[DRBD-user] md5 sums mismatch

Łukasz Oleś lukaszoles at gmail.com
Mon Jul 25 10:38:53 CEST 2011



On 18 July 2011 12:08, Łukasz Oleś
<lukaszoles at gmail.com> wrote:
> On 18 July 2011 10:38, Felix Frank <ff at mpexnet.de> wrote:
>> Hi,
>>
>> you should keep stuff on-list :-)
>
> yeah, wrong button :)
>
>> On 07/18/2011 10:29 AM, Łukasz Oleś wrote:
>>> On 11 July 2011 10:45, Felix Frank <ff at mpexnet.de> wrote:
>>>> On 07/11/2011 10:32 AM, Łukasz Oleś wrote:
>>>>> Online verify found 88 4k block out of sync!
>>>>
>>>> 350k out of 3TB?
>>>>
>>>> I'd venture to say this is what can result from unfortunate bit-flips
>>>> somewhere along the way.
>>>>
>>>> Yes, these are bad to have. Yet, as far as I know, few systems can even
>>>> protect you from data corruption happening on the way from RAM to HDD,
>>>> so I doubt you're facing a huge problem.
>>>
>>> Over the last week we ran more tests. First, we repeated the
>>> test and again there were mismatches. More interestingly, the
>>> corrupted files were on the source volume.
>>
>> Like I wrote - data can be corrupted between your CPU/Mem and HDD. It's
>> really hard to guard against this.
>> I've read that ZFS does have defences, but using it atop DRBD will prove
>> difficult ;-)
>>
>>> Then we repeated the test again, but the files were copied locally (not
>>> via iSCSI) and DRBD was disconnected. The files were OK.
>>>
>>> Any ideas?
>>
>> Not really. Your iSCSI target is unlikely to be at fault, since all it
>> does is feed data to DRBD. If your Secondary received sound data,
>> that means that the iSCSI target fed sound data to DRBD.
>>
>> DRBD cannot be blamed either, because it does little more than hand down
>> data to your HDD controller.
>>
>> I have to admit that it's strange, however. My "misfortune theory"
>> doesn't really hold if only one machine is affected. How did you make
>> sure you were writing sound data during your final test?
>
> I'm copying files with known md5 sums

After another week of testing, it looks like the problem is solved :)

Between the initiator and the target I have Chelsio 10Gb Ethernet adapters,
and I was probably hitting the error described here:
http://www-947.ibm.com/support/entry/portal/docdisplay?lndocid=MIGR-5086853&brandind=5000008
After updating the drivers, everything seems to work.

Regards,

-- 
Łukasz Oleś


