[DRBD-user] Uncatchable DRBD out-of-sync issue

Stanislav German-Evtushenko ginermail at gmail.com
Tue Apr 2 08:51:27 CEST 2013



Hello,

I have investigated further.

1. All hardware firmware is now up to date, but nothing has changed.
All TCP offload features are disabled on all four Ethernet controllers.
2. I wrote a small script that compares the out-of-sync blocks on both nodes:
------------------------------------------------------------------------
#!/bin/bash
# Compare, on both nodes, every range that DRBD verify reported as out of sync.
# Input on stdin is kernel log lines such as:
# Mar 31 10:24:04 virt1 kernel: block drbd0: Out of sync: start=1036171232, size=8 (sectors)

node1=10.1.2.1
node2=10.1.2.2

# read_blocks HOST START SIZE: read SIZE 512-byte sectors from /dev/drbd0
# on HOST, starting at sector START. "< /dev/null" keeps ssh from
# swallowing the rest of the while loop's stdin.
read_blocks() {
        ssh "$1" dd iflag=direct if=/dev/drbd0 bs=512 skip="$2" count="$3" 2>/dev/null < /dev/null
}

while read -r line; do
        if [[ $line =~ Out\ of\ sync:\ start=([0-9]+),\ size=([0-9]+) ]]; then
                start=${BASH_REMATCH[1]}
                size=${BASH_REMATCH[2]}
                echo "$start - $size"
                sum1=$(read_blocks "$node1" "$start" "$size" | md5sum | awk '{print $1}')
                sum2=$(read_blocks "$node2" "$start" "$size" | md5sum | awk '{print $1}')
                if [[ $sum1 == "$sum2" ]]; then
                        echo "OK: $sum1 - $sum2"
                else
                        echo "ERR: $sum1 - $sum2"
                        # Save both copies of the differing range for inspection.
                        read_blocks "$node1" "$start" "$size" > "/tmp/${start}_${size}_1"
                        read_blocks "$node2" "$start" "$size" > "/tmp/${start}_${size}_2"
                fi
        fi
done
------------------------------------------------------------------------
The comparison found only a couple of matching ranges and a lot of differing ones.
3. Today's out-of-sync blocks belong to VM 109. I did the following:
- shut down the VM
- copied the logical volume to a file:
dd if=/dev/drbd-lvm-0/vm-109-disk-1 of=/tmp/vm-109-disk-1 bs=1M
- copied the logical volume back from the file:
dd if=/tmp/vm-109-disk-1 of=/dev/drbd-lvm-0/vm-109-disk-1 bs=1M
4. I ran the comparison script again, and it now reports that all blocks
match. Because the data was written back through DRBD, both nodes received
the same blocks.
(That is very good, because I did not have to stop either of the
dual-primary nodes or risk a sync in the wrong direction. In the worst case,
if both nodes had VMs with out-of-sync blocks, I could not resync in either
direction without losing data.)
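The rewrite in step 3 can be wrapped in a small function. This is a minimal sketch, assuming the VM is shut down and the scratch location has room for the whole volume; the function name `rewrite_volume` is hypothetical, and the `cmp` check before writing back is an added safeguard that was not part of the procedure above.

```bash
#!/bin/bash
# Hypothetical helper: copy a volume out to a scratch file and write it back,
# so that (for an LV on top of DRBD) every block is replicated to the peer again.
# Assumes the VM using the volume is shut down.
rewrite_volume() {
        local dev=$1 scratch=$2
        # Copy the volume out.
        dd if="$dev" of="$scratch" bs=1M 2>/dev/null || return 1
        # Added safeguard: refuse to write back unless the copy is identical.
        cmp -s "$dev" "$scratch" || return 1
        # Write it back. notrunc avoids truncating a regular-file target;
        # fsync flushes the written data before dd exits.
        dd if="$scratch" of="$dev" bs=1M conv=notrunc,fsync 2>/dev/null || return 1
        rm -f "$scratch"
}
```

For VM 109 this would be `rewrite_volume /dev/drbd-lvm-0/vm-109-disk-1 /tmp/vm-109-disk-1`, followed by another run of the comparison script over the same log lines.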

Next step: I will physically remove one of the two links from my
round-robin bond, leaving only one, and then wait for the results of the
next verify run.
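Before the next verify run, it may help to record the network-side state the test depends on: the offload settings from item 1 and the mode and per-slave link status of the round-robin bond. A minimal sketch, assuming the input text comes from `ethtool -k <iface>` and `/proc/net/bonding/<bond>`; the function names are hypothetical, and which feature names count as "offload-related" is an assumption of the filter.

```bash
#!/bin/bash
# Hypothetical helper: print every offload-related feature that `ethtool -k`
# (read on stdin) reports as "on". Feature lines look like
# "tcp-segmentation-offload: on" or "<TAB>tx-checksum-ipv4: off [fixed]".
# The name filter below is an assumption, not an exhaustive list.
enabled_offloads() {
        awk -F': ' '$2 ~ /^on/ && $1 ~ /offload|segmentation|checksum|scatter-gather/ {
                sub(/^[ \t]+/, "", $1)
                print $1
        }'
}

# Hypothetical helper: summarize /proc/net/bonding/<bond> (read on stdin)
# as the bonding mode plus each slave with its MII link status.
bond_summary() {
        awk -F': ' '
                /^Bonding Mode:/    { print "mode: " $2 }
                /^Slave Interface:/ { slave = $2 }
                /^MII Status:/ && slave != "" { print "slave: " slave " (" $2 ")"; slave = "" }
        '
}
```

For example, `ethtool -k eth0 | enabled_offloads` prints nothing when every matched offload is off, and `bond_summary < /proc/net/bonding/bond0` shows the bond mode and each slave's link status; `eth0` and `bond0` are placeholder names.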

Any ideas so far?

Regards,
Stanislav

