[DRBD-user] Bad Barrier Ack with 2.6.37

Holger Kiehl Holger.Kiehl at dwd.de
Fri Mar 25 10:07:32 CET 2011


Hello,

On Mon, 10 Jan 2011, Holger Kiehl wrote:

> Hello,
>
> Upgrading the kernel on the secondary from 2.6.36.2 to 2.6.37 gives me the
> following error on the primary:
>
>    Jan 10 12:41:57 obelix kernel: block drbd0: BAD! BarrierAck #2350363662 received, expected #2350363661!
>    Jan 10 12:41:57 obelix kernel: block drbd0: peer( Secondary -> Unknown ) conn( Connected -> ProtocolError ) pdsk( UpToDate -> DUnknown )
>    Jan 10 12:41:57 obelix kernel: block drbd0: short read expecting header on sock: r=-512
>    Jan 10 12:41:57 obelix kernel: block drbd0: Creating new current UUID
>    Jan 10 12:41:57 obelix kernel: block drbd0: asender terminated
>    Jan 10 12:41:57 obelix kernel: block drbd0: Terminating drbd0_asender
>    Jan 10 12:41:57 obelix kernel: block drbd0: Connection closed
>    Jan 10 12:41:57 obelix kernel: block drbd0: conn( ProtocolError -> Unconnected )
>    Jan 10 12:41:57 obelix kernel: block drbd0: receiver terminated
>    Jan 10 12:41:57 obelix kernel: block drbd0: Restarting drbd0_receiver
>    Jan 10 12:41:57 obelix kernel: block drbd0: receiver (re)started
>    Jan 10 12:41:57 obelix kernel: block drbd0: conn( Unconnected -> WFConnection )
>    Jan 10 12:41:57 obelix kernel: block drbd0: Handshake successful: Agreed network protocol version 95
>    Jan 10 12:41:57 obelix kernel: block drbd0: conn( WFConnection -> WFReportParams )
>    Jan 10 12:41:57 obelix kernel: block drbd0: Starting asender thread (from drbd0_receiver [3233])
>    Jan 10 12:41:57 obelix kernel: block drbd0: data-integrity-alg: <not-used>
>    Jan 10 12:41:57 obelix kernel: block drbd0: max_segment_size ( = BIO size ) = 65536
>    Jan 10 12:41:57 obelix kernel: block drbd0: drbd_sync_handshake:
>    Jan 10 12:41:57 obelix kernel: block drbd0: self 28DDE63A9DEC9869:19CC15BDDB81CF01:8C9904DC3E8DFFD7:F46F8C2F00547891 bits:500 flags:0
>    Jan 10 12:41:57 obelix kernel: block drbd0: peer 19CC15BDDB81CF00:0000000000000000:8C9904DC3E8DFFD6:F46F8C2F00547891 bits:0 flags:0
>    Jan 10 12:41:57 obelix kernel: block drbd0: uuid_compare()=1 by rule 70
>    Jan 10 12:41:57 obelix kernel: block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
>
> Upgrading the primary to 2.6.37 did not help either; it produces
> the same errors. I tried this on two different clusters, and the
> error above always appears whenever the secondary runs 2.6.37.
>

The same problem still exists with kernel 2.6.38.1:

    Mar 25 08:54:20 obelix kernel: block drbd0: BAD! BarrierAck #1861867747 received, expected #1861867746!
    Mar 25 08:54:20 obelix kernel: block drbd0: peer( Secondary -> Unknown ) conn( Connected -> ProtocolError ) pdsk( UpToDate -> DUnknown )
    Mar 25 08:54:20 obelix kernel: block drbd0: process_done_ee() = NOT_OK
    Mar 25 08:54:20 obelix kernel: block drbd0: asender terminated
    Mar 25 08:54:20 obelix kernel: block drbd0: Terminating drbd0_asender
    Mar 25 08:54:20 obelix kernel: block drbd0: short read expecting header on sock: r=-512
    Mar 25 08:54:20 obelix kernel: block drbd0: Creating new current UUID
    Mar 25 08:54:20 obelix kernel: block drbd0: Connection closed
    Mar 25 08:54:20 obelix kernel: block drbd0: conn( ProtocolError -> Unconnected )
    Mar 25 08:54:20 obelix kernel: block drbd0: receiver terminated
    Mar 25 08:54:20 obelix kernel: block drbd0: Restarting drbd0_receiver
    Mar 25 08:54:20 obelix kernel: block drbd0: receiver (re)started
    Mar 25 08:54:20 obelix kernel: block drbd0: conn( Unconnected -> WFConnection )
    Mar 25 08:54:20 obelix kernel: block drbd0: Handshake successful: Agreed network protocol version 94
    Mar 25 08:54:20 obelix kernel: block drbd0: conn( WFConnection -> WFReportParams )
    Mar 25 08:54:20 obelix kernel: block drbd0: Starting asender thread (from drbd0_receiver [3220])
    Mar 25 08:54:20 obelix kernel: block drbd0: data-integrity-alg: <not-used>
    Mar 25 08:54:20 obelix kernel: block drbd0: drbd_sync_handshake:
    Mar 25 08:54:20 obelix kernel: block drbd0: self 840572B18801AA3B:F99A9CC7F9DDDB47:916E679DA4726603:830351EC828F2F13 bits:191 flags:0
    Mar 25 08:54:20 obelix kernel: block drbd0: peer F99A9CC7F9DDDB46:0000000000000000:916E679DA4726602:830351EC828F2F13 bits:0 flags:0
    Mar 25 08:54:20 obelix kernel: block drbd0: uuid_compare()=1 by rule 70
    Mar 25 08:54:20 obelix kernel: block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
    Mar 25 08:54:20 obelix kernel: block drbd0: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )
    Mar 25 08:54:20 obelix kernel: block drbd0: Began resync as SyncSource (will sync 764 KB [191 bits set]).
    Mar 25 08:54:21 obelix kernel: block drbd0: Resync done (total 1 sec; paused 0 sec; 764 K/sec)
    Mar 25 08:54:21 obelix kernel: block drbd0: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )

The error then keeps recurring, roughly every one to two minutes:

   Mar 25 08:53:00 obelix kernel: block drbd0: BAD! BarrierAck #1224296926 received, expected #1224296925!
   Mar 25 08:54:20 obelix kernel: block drbd0: BAD! BarrierAck #1861867747 received, expected #1861867746!
   Mar 25 08:54:35 obelix kernel: block drbd0: BAD! BarrierAck #4040326970 received, expected #4040326969!
   Mar 25 08:56:21 obelix kernel: block drbd0: BAD! BarrierAck #1235958129 received, expected #1235958128!
   Mar 25 08:57:31 obelix kernel: block drbd0: BAD! BarrierAck #4096191267 received, expected #4096191266!
   Mar 25 08:58:51 obelix kernel: block drbd0: BAD! BarrierAck #1578973016 received, expected #1578973015!
   Mar 25 08:59:26 obelix kernel: block drbd0: BAD! BarrierAck #4131468500 received, expected #4131468499!
   Mar 25 09:00:08 obelix kernel: block drbd0: BAD! BarrierAck #4013314144 received, expected #4013314143!
   Mar 25 09:01:19 obelix kernel: block drbd0: BAD! BarrierAck #2538005992 received, expected #2538005991!
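
In every one of these messages the received BarrierAck number is exactly one higher than the expected one. A minimal Python sketch to check that pattern, assuming the message wording shown above (barrier_offsets and BARRIER_RE are hypothetical helpers, not part of DRBD):

```python
import re

# Matches the "BAD! BarrierAck" kernel message; the wording is assumed
# from the log lines above (hypothetical helper, not a DRBD tool).
BARRIER_RE = re.compile(r"BarrierAck #(\d+) received, expected #(\d+)!")

# The 2.6.38.1 log lines quoted above.
LOG_LINES = [
    "Mar 25 08:53:00 obelix kernel: block drbd0: BAD! BarrierAck #1224296926 received, expected #1224296925!",
    "Mar 25 08:54:20 obelix kernel: block drbd0: BAD! BarrierAck #1861867747 received, expected #1861867746!",
    "Mar 25 08:54:35 obelix kernel: block drbd0: BAD! BarrierAck #4040326970 received, expected #4040326969!",
    "Mar 25 08:56:21 obelix kernel: block drbd0: BAD! BarrierAck #1235958129 received, expected #1235958128!",
    "Mar 25 08:57:31 obelix kernel: block drbd0: BAD! BarrierAck #4096191267 received, expected #4096191266!",
    "Mar 25 08:58:51 obelix kernel: block drbd0: BAD! BarrierAck #1578973016 received, expected #1578973015!",
    "Mar 25 08:59:26 obelix kernel: block drbd0: BAD! BarrierAck #4131468500 received, expected #4131468499!",
    "Mar 25 09:00:08 obelix kernel: block drbd0: BAD! BarrierAck #4013314144 received, expected #4013314143!",
    "Mar 25 09:01:19 obelix kernel: block drbd0: BAD! BarrierAck #2538005992 received, expected #2538005991!",
]


def barrier_offsets(lines):
    """Return (received - expected) for every BAD! BarrierAck line."""
    offsets = []
    for line in lines:
        match = BARRIER_RE.search(line)
        if match:
            received, expected = int(match.group(1)), int(match.group(2))
            offsets.append(received - expected)
    return offsets


offsets = barrier_offsets(LOG_LINES)
print(offsets)  # [1, 1, 1, 1, 1, 1, 1, 1, 1]
```

Since every offset is 1, the secondary appears to acknowledge the barrier one ahead of the one the primary is waiting for, rather than sending an arbitrary number.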

Kernel 2.6.36.x works without this problem. Any idea what is causing
this? What other information is needed to solve this issue?

Regards,
Holger