Hello,

I did some further investigation.

1. All hardware firmware is now up to date, but nothing has changed. All TCP offload features are disabled on all 4 Ethernet controllers.

2. I have created a small script for comparing out-of-sync blocks:

------------------------------------------------------------------------
#!/bin/bash
# Reads kernel log lines on stdin, for example:
# Mar 31 10:24:04 virt1 kernel: block drbd0: Out of sync: start=1036171232, size=8 (sectors)
# and compares each reported sector range on both nodes.

re='Out of sync: start=([0-9]+), size=([0-9]+)'

# Read $3 sectors starting at sector $2 from /dev/drbd0 on host $1.
# stdin comes from /dev/null so that ssh does not swallow the log lines.
read_range() {
    ssh "$1" dd iflag=direct if=/dev/drbd0 bs=512 skip="$2" count="$3" 2>/dev/null < /dev/null
}

while read -r line; do
    if [[ $line =~ $re ]]; then
        start=${BASH_REMATCH[1]}
        size=${BASH_REMATCH[2]}
        echo "$start - $size"
        sum1=$(read_range 10.1.2.1 "$start" "$size" | md5sum | awk '{print $1}')
        sum2=$(read_range 10.1.2.2 "$start" "$size" | md5sum | awk '{print $1}')
        if [[ $sum1 == "$sum2" ]]; then
            echo "OK: $sum1 - $sum2"
        else
            echo "ERR: $sum1 - $sum2"
            # Keep both copies of the mismatching range for later inspection.
            read_range 10.1.2.1 "$start" "$size" > "/tmp/${start}_${size}_1"
            read_range 10.1.2.2 "$start" "$size" > "/tmp/${start}_${size}_2"
        fi
    fi
done
------------------------------------------------------------------------

The comparison found only a couple of matching blocks and a lot of differing ones.

3. Today's out-of-sync blocks are related to VM number 109. I did the following:
- turned off this VM
- copied the logical volume to a file:
  dd if=/dev/drbd-lvm-0/vm-109-disk-1 of=/tmp/vm-109-disk-1 bs=1M
- copied the logical volume back from the file:
  dd if=/tmp/vm-109-disk-1 of=/dev/drbd-lvm-0/vm-109-disk-1 bs=1M

4. Ran the comparison script again, and it now shows that all blocks match. That is very good, because I don't need to stop either of the dual-master nodes and don't need to risk syncing in the wrong direction. In the worst case (if both nodes have VMs with out-of-sync blocks) I could not even do that without losing data.

Next step: I'll physically remove one connection from my RR bonding and leave only one link. Then I will wait for new verify results.

Any ideas so far?

Regards,
Stanislav
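
The log-parsing step of the script above can be tried on its own, without access to a DRBD device. This is a minimal sketch: the function name parse_oos_line is hypothetical and not part of the original script, and the byte conversion assumes 512-byte sectors, matching the bs=512 used in the dd calls.

```bash
#!/bin/bash
# Hypothetical helper (not part of the original script): extract the sector
# range from one "Out of sync" kernel log line and convert it to bytes,
# assuming 512-byte sectors as in the dd calls (bs=512).
parse_oos_line() {
    local re='Out of sync: start=([0-9]+), size=([0-9]+)'
    if [[ $1 =~ $re ]]; then
        local start=${BASH_REMATCH[1]} size=${BASH_REMATCH[2]}
        # Prints: start sector, size in sectors, byte offset, byte length.
        echo "$start $size $((start * 512)) $((size * 512))"
    else
        # Not an "Out of sync" line: print nothing and signal failure.
        return 1
    fi
}

# Sample line taken from the comment in the original script.
parse_oos_line 'Mar 31 10:24:04 virt1 kernel: block drbd0: Out of sync: start=1036171232, size=8 (sectors)'
```

The byte offset may help when relating a reported range to a particular logical volume, but that mapping depends on the LVM layout and is not shown here.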