[DRBD-user] strange checksum error

Csurai Akos akos.csurai at ericsson.com
Wed Aug 1 10:02:15 CEST 2012


Hi,

We have been seeing a strange replication problem since we started using
protocol B. The scenario is the following:

Some binary files are saved to the replicated IO pair (kernel 3.0.13,
drbd-8.3.12, protocol B, ext3).
Later they are copied to another directory, which is also replicated.
They remain consistent, and there is no problem until io1 (the current
Primary) is rebooted.
Strangely, it takes a reboot: a forced role change does not trigger the
symptom.
io2 takes over the Primary role, and when the cluster starts using the
binary files, they show checksum errors.
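
One way to narrow down when the files go bad would be to record checksums
of the replicated directory on io1 before the reboot and compare them on
io2 after it takes over. A minimal Python sketch, assuming SHA-256 is an
acceptable stand-in for whatever checksum the application itself uses; the
function names are hypothetical helpers and not part of DRBD:

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = sha256_of(full)
    return manifest


def changed_files(before, after):
    """List paths from 'before' that are missing or differ in 'after'."""
    return sorted(p for p in before if after.get(p) != before[p])
```

The manifest built on io1 would have to be stored somewhere outside the
replicated device (for example as JSON on a third host), so that it cannot
be affected by the same problem.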

We turned off the write cache on the SAS disks (sdparam --set WCE=0
/dev/sda)
and the symptom seemed to disappear, but later it surfaced again.
The corrupted binary files each have a hole of some 40 kbytes filled with
zeros.
Yes, it could be a HW issue, but we did not see it with protocol C
(which unfortunately is extremely slow on our system).
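
To locate such zero-filled holes in a suspect file, something like the
following Python sketch could be used. It is only an illustration: the
function name and the 4096-byte threshold are assumptions, and a legitimate
binary may contain long zero runs of its own, so the result should be
compared against a known-good copy.

```python
def find_zero_runs(path, min_len=4096, chunk_size=1 << 20):
    """Return (offset, length) for every run of zero bytes >= min_len."""
    runs = []
    run_start = None  # file offset where the current zero run began
    offset = 0        # file offset of the start of the current chunk
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            for i, byte in enumerate(chunk):
                if byte == 0:
                    if run_start is None:
                        run_start = offset + i
                elif run_start is not None:
                    # A non-zero byte ends the current run.
                    length = offset + i - run_start
                    if length >= min_len:
                        runs.append((run_start, length))
                    run_start = None
            offset += len(chunk)
    # A run may extend to the very end of the file.
    if run_start is not None and offset - run_start >= min_len:
        runs.append((run_start, offset - run_start))
    return runs
```

The byte-by-byte loop is slow on large files, but it keeps runs that cross
a chunk boundary intact. The offsets it reports may help to check whether
the holes line up with a fixed block or request size.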

Has anyone seen something similar?

Thanks,
Akos





