Hello,

On Mon, 10 Jan 2011, Holger Kiehl wrote:

> Hello,
>
> upgrading kernel on secondary from 2.6.36.2 to 2.6.37 gives me the
> following error on primary:
>
> Jan 10 12:41:57 obelix kernel: block drbd0: BAD! BarrierAck #2350363662 received, expected #2350363661!
> Jan 10 12:41:57 obelix kernel: block drbd0: peer( Secondary -> Unknown ) conn( Connected -> ProtocolError ) pdsk( UpToDate -> DUnknown )
> Jan 10 12:41:57 obelix kernel: block drbd0: short read expecting header on sock: r=-512
> Jan 10 12:41:57 obelix kernel: block drbd0: Creating new current UUID
> Jan 10 12:41:57 obelix kernel: block drbd0: asender terminated
> Jan 10 12:41:57 obelix kernel: block drbd0: Terminating drbd0_asender
> Jan 10 12:41:57 obelix kernel: block drbd0: Connection closed
> Jan 10 12:41:57 obelix kernel: block drbd0: conn( ProtocolError -> Unconnected )
> Jan 10 12:41:57 obelix kernel: block drbd0: receiver terminated
> Jan 10 12:41:57 obelix kernel: block drbd0: Restarting drbd0_receiver
> Jan 10 12:41:57 obelix kernel: block drbd0: receiver (re)started
> Jan 10 12:41:57 obelix kernel: block drbd0: conn( Unconnected -> WFConnection )
> Jan 10 12:41:57 obelix kernel: block drbd0: Handshake successful: Agreed network protocol version 95
> Jan 10 12:41:57 obelix kernel: block drbd0: conn( WFConnection -> WFReportParams )
> Jan 10 12:41:57 obelix kernel: block drbd0: Starting asender thread (from drbd0_receiver [3233])
> Jan 10 12:41:57 obelix kernel: block drbd0: data-integrity-alg: <not-used>
> Jan 10 12:41:57 obelix kernel: block drbd0: max_segment_size ( = BIO size ) = 65536
> Jan 10 12:41:57 obelix kernel: block drbd0: drbd_sync_handshake:
> Jan 10 12:41:57 obelix kernel: block drbd0: self 28DDE63A9DEC9869:19CC15BDDB81CF01:8C9904DC3E8DFFD7:F46F8C2F00547891 bits:500 flags:0
> Jan 10 12:41:57 obelix kernel: block drbd0: peer 19CC15BDDB81CF00:0000000000000000:8C9904DC3E8DFFD6:F46F8C2F00547891 bits:0 flags:0
> Jan 10 12:41:57 obelix kernel: block drbd0: uuid_compare()=1 by rule 70
> Jan 10 12:41:57 obelix kernel: block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
>
> Upgrading the primary to 2.6.37 did also not help, it produces
> the same errors. I tried this on two different clusters and
> always the above error pops up if secondary is 2.6.37.
>

The same problem still exists when using kernel 2.6.38.1:

Mar 25 08:54:20 obelix kernel: block drbd0: BAD! BarrierAck #1861867747 received, expected #1861867746!
Mar 25 08:54:20 obelix kernel: block drbd0: peer( Secondary -> Unknown ) conn( Connected -> ProtocolError ) pdsk( UpToDate -> DUnknown )
Mar 25 08:54:20 obelix kernel: block drbd0: process_done_ee() = NOT_OK
Mar 25 08:54:20 obelix kernel: block drbd0: asender terminated
Mar 25 08:54:20 obelix kernel: block drbd0: Terminating drbd0_asender
Mar 25 08:54:20 obelix kernel: block drbd0: short read expecting header on sock: r=-512
Mar 25 08:54:20 obelix kernel: block drbd0: Creating new current UUID
Mar 25 08:54:20 obelix kernel: block drbd0: Connection closed
Mar 25 08:54:20 obelix kernel: block drbd0: conn( ProtocolError -> Unconnected )
Mar 25 08:54:20 obelix kernel: block drbd0: receiver terminated
Mar 25 08:54:20 obelix kernel: block drbd0: Restarting drbd0_receiver
Mar 25 08:54:20 obelix kernel: block drbd0: receiver (re)started
Mar 25 08:54:20 obelix kernel: block drbd0: conn( Unconnected -> WFConnection )
Mar 25 08:54:20 obelix kernel: block drbd0: Handshake successful: Agreed network protocol version 94
Mar 25 08:54:20 obelix kernel: block drbd0: conn( WFConnection -> WFReportParams )
Mar 25 08:54:20 obelix kernel: block drbd0: Starting asender thread (from drbd0_receiver [3220])
Mar 25 08:54:20 obelix kernel: block drbd0: data-integrity-alg: <not-used>
Mar 25 08:54:20 obelix kernel: block drbd0: drbd_sync_handshake:
Mar 25 08:54:20 obelix kernel: block drbd0: self 840572B18801AA3B:F99A9CC7F9DDDB47:916E679DA4726603:830351EC828F2F13 bits:191 flags:0
Mar 25 08:54:20 obelix kernel: block drbd0: peer F99A9CC7F9DDDB46:0000000000000000:916E679DA4726602:830351EC828F2F13 bits:0 flags:0
Mar 25 08:54:20 obelix kernel: block drbd0: uuid_compare()=1 by rule 70
Mar 25 08:54:20 obelix kernel: block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
Mar 25 08:54:20 obelix kernel: block drbd0: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )
Mar 25 08:54:20 obelix kernel: block drbd0: Began resync as SyncSource (will sync 764 KB [191 bits set]).
Mar 25 08:54:21 obelix kernel: block drbd0: Resync done (total 1 sec; paused 0 sec; 764 K/sec)
Mar 25 08:54:21 obelix kernel: block drbd0: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )

And this then continues frequently:

Mar 25 08:53:00 obelix kernel: block drbd0: BAD! BarrierAck #1224296926 received, expected #1224296925!
Mar 25 08:54:20 obelix kernel: block drbd0: BAD! BarrierAck #1861867747 received, expected #1861867746!
Mar 25 08:54:35 obelix kernel: block drbd0: BAD! BarrierAck #4040326970 received, expected #4040326969!
Mar 25 08:56:21 obelix kernel: block drbd0: BAD! BarrierAck #1235958129 received, expected #1235958128!
Mar 25 08:57:31 obelix kernel: block drbd0: BAD! BarrierAck #4096191267 received, expected #4096191266!
Mar 25 08:58:51 obelix kernel: block drbd0: BAD! BarrierAck #1578973016 received, expected #1578973015!
Mar 25 08:59:26 obelix kernel: block drbd0: BAD! BarrierAck #4131468500 received, expected #4131468499!
Mar 25 09:00:08 obelix kernel: block drbd0: BAD! BarrierAck #4013314144 received, expected #4013314143!
Mar 25 09:01:19 obelix kernel: block drbd0: BAD! BarrierAck #2538005992 received, expected #2538005991!

Kernel 2.6.36.x is working without this problem. Any idea what is
causing this? What other information is required to solve this issue?

Regards,
Holger
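
For anyone triaging similar logs, here is a minimal sketch (not part of the original report) that pulls the received and expected barrier numbers out of the "BAD! BarrierAck" lines and prints their difference. The function name `barrier_mismatches` and the regular expression are illustrative assumptions based only on the message format quoted above. The sample lines are copied from the log excerpt in this message.

```python
import re

# Pattern for the message format quoted in this thread (an assumption:
# other DRBD versions may word the message differently).
BARRIER_RE = re.compile(r"BAD! BarrierAck #(\d+) received, expected #(\d+)!")


def barrier_mismatches(lines):
    """Yield (received, expected, received - expected) per mismatch line."""
    for line in lines:
        match = BARRIER_RE.search(line)
        if match:
            received = int(match.group(1))
            expected = int(match.group(2))
            yield received, expected, received - expected


# Three lines copied verbatim from the log excerpt in this message.
sample = [
    "Mar 25 08:53:00 obelix kernel: block drbd0: BAD! BarrierAck "
    "#1224296926 received, expected #1224296925!",
    "Mar 25 08:54:20 obelix kernel: block drbd0: BAD! BarrierAck "
    "#1861867747 received, expected #1861867746!",
    "Mar 25 08:54:35 obelix kernel: block drbd0: BAD! BarrierAck "
    "#4040326970 received, expected #4040326969!",
]

for received, expected, delta in barrier_mismatches(sample):
    print(f"received={received} expected={expected} delta={delta}")
```

In every "BAD! BarrierAck" line quoted in this message, the received number is exactly one higher than the expected one. The function accepts any iterable of lines, so an open log file can be passed in directly.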