[DRBD-user] BAD! BarrierAck #... received, #... expected

Paolo Quinci quinci at blautoene.at
Sun Jun 7 17:19:41 CEST 2015

Dear all!
I configured two nodes with corosync and DRBD 8.4 on Ubuntu Server 14.04.2 LTS. After I updated the kernel on both nodes, they can no longer sync. I tried to downgrade to the last kernel that still worked, without success. The sync process starts and then ends with a message like "BAD! BarrierACK received #432, expected #431". I looked up the error, but it seems to be one that not many people run into, and I really don't know what to do anymore. Maybe one of you can help me. I should add that the two nodes are now running in a production environment, so starting from scratch is not an option. Backups of the data on the DRBD device are made daily, though.
My setup:
Kernel: 3.13.0-53-generic #89-Ubuntu SMP Wed May 20 10:34:39 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
DRBD:
DRBDADM_BUILDTAG=GIT-hash:\ 599f286440bd633d15d5ff985204aff4bccffadd\ build\ by\ phil at fat-tyre\,\ 2013-10-11\ 16:42:48
DRBDADM_API_VERSION=1
DRBD_KERNEL_VERSION_CODE=0x080403
DRBDADM_VERSION_CODE=0x080404
DRBDADM_VERSION=8.4.4
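
For readers comparing the two version codes above: they appear to pack major, minor and patch level into one byte each. That layout is an assumption, but it is consistent with DRBDADM_VERSION_CODE=0x080404 being listed next to DRBDADM_VERSION=8.4.4. A minimal Python sketch under that assumption (the function name is hypothetical):

```python
def decode_version_code(code: int) -> str:
    """Split a packed version code into major.minor.patch.

    Assumes one byte per component, which matches 0x080404 <-> 8.4.4
    in the output above; this is not taken from DRBD documentation.
    """
    major = (code >> 16) & 0xFF
    minor = (code >> 8) & 0xFF
    patch = code & 0xFF
    return f"{major}.{minor}.{patch}"


# Values copied from the setup listing above.
print("kernel module:", decode_version_code(0x080403))
print("drbdadm:      ", decode_version_code(0x080404))
```

Read this way, the kernel module and the userland tools report slightly different patch levels.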
I use bonding for the network interfaces (on each node), so I have fail-over.
The DRBD devices sit on RAID-5 software RAID block devices.
At the moment, my sync regularly fails at around 4-6% into the sync process.
dmesg tells me:
[317758.655502] d-con AxigenData: receiver terminated
[317758.655504] d-con AxigenData: Restarting receiver thread
[317758.655506] d-con AxigenData: receiver (re)started
[317758.655521] d-con AxigenData: conn( Unconnected -> WFConnection )
[317759.154195] d-con AxigenData: Handshake successful: Agreed network protocol version 101
[317759.154495] d-con AxigenData: Peer authenticated using 20 bytes HMAC
[317759.154669] d-con AxigenData: conn( WFConnection -> WFReportParams )
[317759.154673] d-con AxigenData: Starting asender thread (from drbd_r_AxigenDa [2421])
[317759.191027] block drbd0: drbd_sync_handshake:
[317759.191033] block drbd0: self 10FC6F75510DED77:D6EA27F99D183685:D6E927F99D183685:D6E827F99D183685 bits:13231595 flags:0
[317759.191037] block drbd0: peer D6EA27F99D183684:0000000000000000:26E9BD2F810F6A44:26E8BD2F810F6A45 bits:13231245 flags:0
[317759.191041] block drbd0: uuid_compare()=1 by rule 70
[317759.191043] block drbd0: Becoming sync source due to disk states.
[317759.191052] block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS )
[317759.242168] block drbd0: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 120(1), total 120; compression: 100.0%
[317759.290871] block drbd0: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 120(1), total 120; compression: 100.0%
[317759.290880] block drbd0: helper command: /sbin/drbdadm before-resync-source minor-0
[317759.292274] block drbd0: helper command: /sbin/drbdadm before-resync-source minor-0 exit code 0 (0x0)
[317759.292296] block drbd0: conn( WFBitMapS -> SyncSource )
[317759.292306] block drbd0: Began resync as SyncSource (will sync 52926408 KB [13231602 bits set]).
[317759.292367] block drbd0: updated sync UUID 10FC6F75510DED77:D6EB27F99D183685:D6EA27F99D183685:D6E927F99D183685
[317893.125154] d-con AxigenData: BAD! BarrierAck #519879 received, expected #519878!
[317893.143636] d-con AxigenData: peer( Secondary -> Unknown ) conn( SyncSource -> ProtocolError )
[317893.143670] d-con AxigenData: asender terminated
[317893.143674] d-con AxigenData: Terminating drbd_a_AxigenDa
[317893.278891] d-con AxigenData: Connection closed
[317893.278928] d-con AxigenData: conn( ProtocolError -> Unconnected )
[317893.278930] d-con AxigenData: receiver terminated
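
To collect the mismatching counters from a longer dmesg capture covering several failed sync attempts, something along these lines may help. It is only a sketch: the regular expression is an assumption based on the single message format shown in the log above, and the function name is hypothetical.

```python
import re

# Assumed message format, taken from the one "BAD! BarrierAck" line in the log.
# \s+ after "BAD!" tolerates a line wrap inside the message.
BARRIER_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+d-con (?P<res>\S+): BAD!\s+"
    r"BarrierAck #(?P<got>\d+) received, expected #(?P<want>\d+)!"
)


def find_barrier_mismatches(log_text):
    """Return (timestamp, resource, received, expected) for each mismatch."""
    return [
        (float(m["ts"]), m["res"], int(m["got"]), int(m["want"]))
        for m in BARRIER_RE.finditer(log_text)
    ]


# Sample line copied from the dmesg output above.
sample = (
    "[317893.125154] d-con AxigenData: BAD! "
    "BarrierAck #519879 received, expected #519878!"
)

for ts, res, got, want in find_barrier_mismatches(sample):
    print(f"{res} at {ts}: received #{got}, expected #{want}, off by {got - want}")
```

Feeding it the full output of dmesg would show whether the received counter is always exactly one ahead of the expected one.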
The thing is, the counters in the "BAD! ..." error message change every time the sync process is terminated. I ran fsck on the ext4 partition on my UpToDate node and it is clean; smartctl tells me my disks are OK as well. If you need more information, please let me know; I am more than happy to provide any logs and details you might need in order to help me.
Thank you very much