Hello,
On Mon, 10 Jan 2011, Holger Kiehl wrote:
> Hello,
>
> upgrading kernel on secondary from 2.6.36.2 to 2.6.37 gives me the
> following error on primary:
>
> Jan 10 12:41:57 obelix kernel: block drbd0: BAD! BarrierAck #2350363662 received, expected #2350363661!
> Jan 10 12:41:57 obelix kernel: block drbd0: peer( Secondary -> Unknown ) conn( Connected -> ProtocolError ) pdsk( UpToDate -> DUnknown )
> Jan 10 12:41:57 obelix kernel: block drbd0: short read expecting header on sock: r=-512
> Jan 10 12:41:57 obelix kernel: block drbd0: Creating new current UUID
> Jan 10 12:41:57 obelix kernel: block drbd0: asender terminated
> Jan 10 12:41:57 obelix kernel: block drbd0: Terminating drbd0_asender
> Jan 10 12:41:57 obelix kernel: block drbd0: Connection closed
> Jan 10 12:41:57 obelix kernel: block drbd0: conn( ProtocolError -> Unconnected )
> Jan 10 12:41:57 obelix kernel: block drbd0: receiver terminated
> Jan 10 12:41:57 obelix kernel: block drbd0: Restarting drbd0_receiver
> Jan 10 12:41:57 obelix kernel: block drbd0: receiver (re)started
> Jan 10 12:41:57 obelix kernel: block drbd0: conn( Unconnected -> WFConnection )
> Jan 10 12:41:57 obelix kernel: block drbd0: Handshake successful: Agreed network protocol version 95
> Jan 10 12:41:57 obelix kernel: block drbd0: conn( WFConnection -> WFReportParams )
> Jan 10 12:41:57 obelix kernel: block drbd0: Starting asender thread (from drbd0_receiver [3233])
> Jan 10 12:41:57 obelix kernel: block drbd0: data-integrity-alg: <not-used>
> Jan 10 12:41:57 obelix kernel: block drbd0: max_segment_size ( = BIO size ) = 65536
> Jan 10 12:41:57 obelix kernel: block drbd0: drbd_sync_handshake:
> Jan 10 12:41:57 obelix kernel: block drbd0: self 28DDE63A9DEC9869:19CC15BDDB81CF01:8C9904DC3E8DFFD7:F46F8C2F00547891 bits:500 flags:0
> Jan 10 12:41:57 obelix kernel: block drbd0: peer 19CC15BDDB81CF00:0000000000000000:8C9904DC3E8DFFD6:F46F8C2F00547891 bits:0 flags:0
> Jan 10 12:41:57 obelix kernel: block drbd0: uuid_compare()=1 by rule 70
> Jan 10 12:41:57 obelix kernel: block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
>
> Upgrading the primary to 2.6.37 did not help either; it produces
> the same errors. I tried this on two different clusters, and the
> above error always appears whenever the secondary runs 2.6.37.
>
The same problem still exists when using kernel 2.6.38.1:
Mar 25 08:54:20 obelix kernel: block drbd0: BAD! BarrierAck #1861867747 received, expected #1861867746!
Mar 25 08:54:20 obelix kernel: block drbd0: peer( Secondary -> Unknown ) conn( Connected -> ProtocolError ) pdsk( UpToDate -> DUnknown )
Mar 25 08:54:20 obelix kernel: block drbd0: process_done_ee() = NOT_OK
Mar 25 08:54:20 obelix kernel: block drbd0: asender terminated
Mar 25 08:54:20 obelix kernel: block drbd0: Terminating drbd0_asender
Mar 25 08:54:20 obelix kernel: block drbd0: short read expecting header on sock: r=-512
Mar 25 08:54:20 obelix kernel: block drbd0: Creating new current UUID
Mar 25 08:54:20 obelix kernel: block drbd0: Connection closed
Mar 25 08:54:20 obelix kernel: block drbd0: conn( ProtocolError -> Unconnected )
Mar 25 08:54:20 obelix kernel: block drbd0: receiver terminated
Mar 25 08:54:20 obelix kernel: block drbd0: Restarting drbd0_receiver
Mar 25 08:54:20 obelix kernel: block drbd0: receiver (re)started
Mar 25 08:54:20 obelix kernel: block drbd0: conn( Unconnected -> WFConnection )
Mar 25 08:54:20 obelix kernel: block drbd0: Handshake successful: Agreed network protocol version 94
Mar 25 08:54:20 obelix kernel: block drbd0: conn( WFConnection -> WFReportParams )
Mar 25 08:54:20 obelix kernel: block drbd0: Starting asender thread (from drbd0_receiver [3220])
Mar 25 08:54:20 obelix kernel: block drbd0: data-integrity-alg: <not-used>
Mar 25 08:54:20 obelix kernel: block drbd0: drbd_sync_handshake:
Mar 25 08:54:20 obelix kernel: block drbd0: self 840572B18801AA3B:F99A9CC7F9DDDB47:916E679DA4726603:830351EC828F2F13 bits:191 flags:0
Mar 25 08:54:20 obelix kernel: block drbd0: peer F99A9CC7F9DDDB46:0000000000000000:916E679DA4726602:830351EC828F2F13 bits:0 flags:0
Mar 25 08:54:20 obelix kernel: block drbd0: uuid_compare()=1 by rule 70
Mar 25 08:54:20 obelix kernel: block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
Mar 25 08:54:20 obelix kernel: block drbd0: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )
Mar 25 08:54:20 obelix kernel: block drbd0: Began resync as SyncSource (will sync 764 KB [191 bits set]).
Mar 25 08:54:21 obelix kernel: block drbd0: Resync done (total 1 sec; paused 0 sec; 764 K/sec)
Mar 25 08:54:21 obelix kernel: block drbd0: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
From then on, the error recurs frequently:
Mar 25 08:53:00 obelix kernel: block drbd0: BAD! BarrierAck #1224296926 received, expected #1224296925!
Mar 25 08:54:20 obelix kernel: block drbd0: BAD! BarrierAck #1861867747 received, expected #1861867746!
Mar 25 08:54:35 obelix kernel: block drbd0: BAD! BarrierAck #4040326970 received, expected #4040326969!
Mar 25 08:56:21 obelix kernel: block drbd0: BAD! BarrierAck #1235958129 received, expected #1235958128!
Mar 25 08:57:31 obelix kernel: block drbd0: BAD! BarrierAck #4096191267 received, expected #4096191266!
Mar 25 08:58:51 obelix kernel: block drbd0: BAD! BarrierAck #1578973016 received, expected #1578973015!
Mar 25 08:59:26 obelix kernel: block drbd0: BAD! BarrierAck #4131468500 received, expected #4131468499!
Mar 25 09:00:08 obelix kernel: block drbd0: BAD! BarrierAck #4013314144 received, expected #4013314143!
Mar 25 09:01:19 obelix kernel: block drbd0: BAD! BarrierAck #2538005992 received, expected #2538005991!
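Every BarrierAck mismatch above has the same shape: the received barrier number is exactly one higher than the expected one. The minimal Python sketch below extracts the pairs from the nine quoted lines and prints the difference. It assumes the message text always has the exact format shown here, and it treats barrier numbers as 32-bit unsigned counters (an assumption based only on the observed values, all of which fit in 32 bits), so the difference is taken modulo 2**32. The function name barrier_ack_offsets is hypothetical.

```python
import re

# The nine BarrierAck lines quoted above, as logged on the primary (obelix).
LOG = """\
Mar 25 08:53:00 obelix kernel: block drbd0: BAD! BarrierAck #1224296926 received, expected #1224296925!
Mar 25 08:54:20 obelix kernel: block drbd0: BAD! BarrierAck #1861867747 received, expected #1861867746!
Mar 25 08:54:35 obelix kernel: block drbd0: BAD! BarrierAck #4040326970 received, expected #4040326969!
Mar 25 08:56:21 obelix kernel: block drbd0: BAD! BarrierAck #1235958129 received, expected #1235958128!
Mar 25 08:57:31 obelix kernel: block drbd0: BAD! BarrierAck #4096191267 received, expected #4096191266!
Mar 25 08:58:51 obelix kernel: block drbd0: BAD! BarrierAck #1578973016 received, expected #1578973015!
Mar 25 08:59:26 obelix kernel: block drbd0: BAD! BarrierAck #4131468500 received, expected #4131468499!
Mar 25 09:00:08 obelix kernel: block drbd0: BAD! BarrierAck #4013314144 received, expected #4013314143!
Mar 25 09:01:19 obelix kernel: block drbd0: BAD! BarrierAck #2538005992 received, expected #2538005991!
"""

# Matches the "BarrierAck #<received> received, expected #<expected>!" text.
PATTERN = re.compile(r"BarrierAck #(\d+) received, expected #(\d+)!")


def barrier_ack_offsets(text):
    """Return (timestamp, received - expected) for each BarrierAck mismatch.

    The difference is taken modulo 2**32 on the (assumed) basis that the
    barrier numbers are 32-bit unsigned counters that may wrap around.
    """
    result = []
    for line in text.splitlines():
        m = PATTERN.search(line)
        if m:
            received, expected = int(m.group(1)), int(m.group(2))
            # The first 15 characters are the syslog timestamp, e.g. "Mar 25 08:53:00".
            result.append((line[:15], (received - expected) % 2**32))
    return result


if __name__ == "__main__":
    for stamp, delta in barrier_ack_offsets(LOG):
        print(stamp, delta)
```

For all nine lines the printed difference is 1, so the peer seems to acknowledge one barrier ahead of what the primary expects rather than sending random numbers.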
Kernel 2.6.36.x works without this problem. Any idea what is causing
this? What other information do you need to track down this issue?
Regards,
Holger