[DRBD-user] File corruption

Todd Denniston Todd.Denniston at ssa.crane.navy.mil
Sat Oct 15 00:15:49 CEST 2005


Jeff Buck wrote:
> 

> On Fri, 2005-10-14 at 17:56 +0530, Amod Sutavane wrote:
> > [TD's summary of Amod: like some other recent posts,
> >  he was happily running for a long time and now
> >  randomly some files are getting corrupted/damaged.]

> Are you sure you're not running a kernel with the bio_clone bug? Is your
> kernel version 2.6.11, 2.6.12 or something close? Your problems sound a
> lot like the same corruption others have seen. The bug is in the kernel,
> but you need something like drbd, lvm, or md to poke it for you.
> 

Or could his problems be related to SATA? 

I have been re-reading the recent list messages related to the file
corruptions, to gauge the likely stability of a new DRBD+kernel on my
current hardware[4], because I am getting ready to update my production
servers from Fedora Core 1 (FC1) with DRBD 0.6.13 to FC4 with DRBD 0.7.X.

[If I get something below incorrect, someone PLEASE correct me, I need to
know if it is (or is not) wise to upgrade now.]


If I have been reading this list correctly the folks who have been having
the most problems recently are running a combination of:
Linux-2.6.[9-12] + SMP + SATA + DRBD-0.7.[10-11|13]

Stephan Rattai indicated "The bio_clone bug was introduced in kernel version
2.6.11-rc2 ... and fixed in 2.6.12.4" and Lars Ellenberg let us know 'drbd
uses them [bio clones] all the time', so most problems on systems with the
bio_clone bug can and should be attributed to it.
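
As a rough illustration of the version window Stephan quoted (introduced in
2.6.11-rc2, fixed in 2.6.12.4), the check could be sketched in Python as
below. The function names are made up for this example, and it only compares
version strings: it cannot tell whether a distribution kernel carries a
backported fix, so treat a "True" as "worth checking", not as proof.

```python
import re

# Window quoted on the list: introduced in 2.6.11-rc2, fixed in 2.6.12.4.
FIRST_BAD = "2.6.11-rc2"
FIRST_FIXED = "2.6.12.4"


def parse_kernel_version(version):
    """Turn '2.6.11-rc2' or '2.6.12.4' into a tuple that sorts correctly.

    Hypothetical helper for this sketch; any distro suffix after the
    numeric part (e.g. '-1.1369_FC4') is ignored.
    """
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)(?:\.(\d+))?(?:-rc(\d+))?", version)
    if match is None:
        raise ValueError("unrecognised kernel version: %r" % version)
    major, minor, patch, stable, rc = match.groups()
    base = (int(major), int(minor), int(patch))
    if rc is not None:
        # Release candidates sort before the final release (flag 0 < 1).
        return base + (0, int(rc))
    return base + (1, int(stable or 0))


def may_have_bio_clone_bug(version):
    """True if the version string falls inside the quoted window."""
    parsed = parse_kernel_version(version)
    return (parse_kernel_version(FIRST_BAD) <= parsed
            < parse_kernel_version(FIRST_FIXED))


if __name__ == "__main__":
    # Versions mentioned in this thread.
    for v in ("2.6.10", "2.6.11.11", "2.6.11.12", "2.6.12.5", "2.6.13.2"):
        print(v, may_have_bio_clone_bug(v))
```

On a running machine the version string could be taken from `uname -r` (or
platform.release() in Python) and passed to may_have_bio_clone_bug().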

We know the problems found on Eugene Crosser's 2.6.10 system are not related
to bio_clone, since 2.6.10 predates the bug; on another system they may be,
as it was running 2.6.11.12.

bro <zxr at lanparty.lv> started out on 2.6.11.11, so his problems could have
been bio_clone related, but he upgraded to 2.6.12.5 and then to 2.6.13.2 and
was still seeing problems even after fsck'ing (I have not heard from him
since 3 Oct, though).

Even Lars indicated that they (Linbit) were seeing a problem with a machine
which used SATA drives as SATA, but it worked ok if they switched to a card
that presented the same SATA drives on a 'real' SCSI interface.

Could it be that DRBD causes the block device 'driver' in the kernel[1] to
issue some SCSI command that either the SCSI "emulation"->SATA layer[2] or
the SATA device does not handle the same way a real SCSI device would?

So the summary question:
Should I be ok on a system with only PATA and SCSI drives using kernel
2.6.13 and DRBD 0.7.13?



[1] when the block device driver thinks it is dealing with SCSI, as is the
case now with SATA.

[2] I am saying "emulation" because I am not sure whether the SCSI commands
are translated or passed directly over the SATA bus. The linuxmafia.com SATA
page[3] seems to confirm the translated-commands view.

[3] http://linuxmafia.com/faq/Hardware/sata.html
'' "libata": This is the newer ATA driver set for selected SATA chipsets
only, maintained by Jeff Garzik, leveraging the kernel's well-tested SCSI
layer. ''

[4] 
Dell-PowerEdge 650
1GB Ram.
cpu model name      : Intel(R) Pentium(R) 4 CPU 2.40GHz

SCSI storage controller: Adaptec AHA-3960D / AIC-7899A U160/m (rev 01)

Promise UltraTrack RM8000 SCSI U160 array
[The RM8000, I know, has its own problems, but I know how to handle those]

-- 
Todd Denniston
Crane Division, Naval Surface Warfare Center (NSWC Crane) 
Harnessing the Power of Technology for the Warfighter


