<html dir=""><head><style id="axi-htmleditor-style" type="text/css">p { margin: 0px; }</style></head><body style="font-size: 10pt; font-family: Arial; background-image: none; background-repeat: repeat; background-attachment: fixed;"><div><style id="axi-htmleditor-style" type="text/css">p { margin: 0px; }</style><font face="Arial"><span style="font-size: 10pt;">Dear all!</span></font><div style="font-family: Arial; font-size: 10pt;"><br></div><div style="font-family: Arial; font-size: 10pt;">I configured two nodes with corosync and DRBD 8.4 on Ubuntu Server 14.04.02 LTS. After I updated the kernel on both nodes, they can no longer sync. I tried downgrading to the last kernel that still worked, without success. The sync process starts and then ends with a message like "BAD! BarrierACK received #432, expected #431". I looked up the error, but it seems to be one that not many people run into. Now I really don't know what else to do. Maybe one of you can help me. I should add that the two nodes now run in a production environment, so starting from scratch is not an option. 
Backups of the data on the DRBD device are being made on a daily basis though.</div><div style="font-family: Arial; font-size: 10pt;"><br></div><div style="font-family: Arial; font-size: 10pt;">My setup:</div><div style="font-family: Arial; font-size: 10pt;">Kernel: 3.13.0-53-generic #89-Ubuntu SMP Wed May 20 10:34:39 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux</div><div style="font-family: Arial; font-size: 10pt;">DRBD:</div><div><font face="Arial"><span style="font-size: 13.3333330154419px;">DRBDADM_BUILDTAG=GIT-hash:\ 599f286440bd633d15d5ff985204aff4bccffadd\ build\ by\ phil@fat-tyre\,\ 2013-10-11\ 16:42:48</span></font></div><div><font face="Arial"><span style="font-size: 13.3333330154419px;">DRBDADM_API_VERSION=1</span></font></div><div><font face="Arial"><span style="font-size: 13.3333330154419px;">DRBD_KERNEL_VERSION_CODE=0x080403</span></font></div><div><font face="Arial"><span style="font-size: 13.3333330154419px;">DRBDADM_VERSION_CODE=0x080404</span></font></div><div><font face="Arial"><span style="font-size: 13.3333330154419px;">DRBDADM_VERSION=8.4.4</span></font></div></div><div><font face="Arial"><span style="font-size: 13.3333330154419px;"><br></span></font></div><div><font face="Arial"><span style="font-size: 13.3333330154419px;">I use bonding for the network interfaces (on each node) so that I have failover.</span></font></div><div><font face="Arial"><span style="font-size: 13.3333330154419px;">The DRBD devices sit on software RAID-5 block devices.</span></font></div><div><font face="Arial"><span style="font-size: 13.3333330154419px;">As of now, my sync regularly fails at around 4-6% into the sync process.</span></font></div><div><font face="Arial"><span style="font-size: 13.3333330154419px;"><br></span></font></div><div><font face="Arial"><span style="font-size: 13.3333330154419px;">dmesg tells me:</span></font></div><div><font face="Arial"><span style="font-size: 13.3333330154419px;"><div>[317758.655502] d-con AxigenData: receiver 
terminated</div><div>[317758.655504] d-con AxigenData: Restarting receiver thread</div><div>[317758.655506] d-con AxigenData: receiver (re)started</div><div>[317758.655521] d-con AxigenData: conn( Unconnected -> WFConnection ) </div><div>[317759.154195] d-con AxigenData: Handshake successful: Agreed network protocol version 101</div><div>[317759.154495] d-con AxigenData: Peer authenticated using 20 bytes HMAC</div><div>[317759.154669] d-con AxigenData: conn( WFConnection -> WFReportParams ) </div><div>[317759.154673] d-con AxigenData: Starting asender thread (from drbd_r_AxigenDa [2421])</div><div>[317759.191027] block drbd0: drbd_sync_handshake:</div><div>[317759.191033] block drbd0: self 10FC6F75510DED77:D6EA27F99D183685:D6E927F99D183685:D6E827F99D183685 bits:13231595 flags:0</div><div>[317759.191037] block drbd0: peer D6EA27F99D183684:0000000000000000:26E9BD2F810F6A44:26E8BD2F810F6A45 bits:13231245 flags:0</div><div>[317759.191041] block drbd0: uuid_compare()=1 by rule 70</div><div>[317759.191043] block drbd0: Becoming sync source due to disk states.</div><div>[317759.191052] block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) </div><div>[317759.242168] block drbd0: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 120(1), total 120; compression: 100.0%</div><div>[317759.290871] block drbd0: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 120(1), total 120; compression: 100.0%</div><div>[317759.290880] block drbd0: helper command: /sbin/drbdadm before-resync-source minor-0</div><div>[317759.292274] block drbd0: helper command: /sbin/drbdadm before-resync-source minor-0 exit code 0 (0x0)</div><div>[317759.292296] block drbd0: conn( WFBitMapS -> SyncSource ) </div><div>[317759.292306] block drbd0: Began resync as SyncSource (will sync 52926408 KB [13231602 bits set]).</div><div>[317759.292367] block drbd0: updated sync UUID 10FC6F75510DED77:D6EB27F99D183685:D6EA27F99D183685:D6E927F99D183685</div><div>[317893.125154] d-con 
AxigenData: BAD! BarrierAck #519879 received, expected #519878!</div><div>[317893.143636] d-con AxigenData: peer( Secondary -> Unknown ) conn( SyncSource -> ProtocolError ) </div><div>[317893.143670] d-con AxigenData: asender terminated</div><div>[317893.143674] d-con AxigenData: Terminating drbd_a_AxigenDa</div><div>[317893.278891] d-con AxigenData: Connection closed</div><div>[317893.278928] d-con AxigenData: conn( ProtocolError -> Unconnected ) </div><div>[317893.278930] d-con AxigenData: receiver terminated</div><div><br></div><div>The thing is, the counters in the "BAD! ..." error message change every time the sync process is terminated. I ran fsck on the ext4 partition on my UpToDate node and it is clean, and smartctl reports that my disks are OK as well. If you need more information, please let me know. I am more than happy to provide any logs and details you might need in order to help me.</div><div><br></div><div>Thank you very much</div><div><br></div><div>Paolo</div></span></font></div></body></html>