[DRBD-user] State switches, flaky connection?

Jarno Elonen jarno.elonen at housemarque.fi
Fri Aug 17 12:27:17 CEST 2012


> I'm running DRBD 8.3.13 on Debian Wheezy, Linux 3.2.20 and
> every now and then my DRBD resources spontaneously switch from
> cs:Connected to cs:WFConnection or the various syncing states and back
> (according to "watch cat /proc/drbd").
>
> I've sometimes seen "broken pipe" or even "protocol error"(!?) flashing
> by briefly.

No luck debugging this so far. I've tried changing network cards,
switching between bonding modes, reverting to plain ethX (instead
of bonding), various MTU and txqueuelen values, and running with and
without resource-only fencing (corosync). Nothing has helped so far;
this connection instability just seems to come and go.

Any better debugging ideas? Or maybe this is not a network issue at all?

Excerpt from DRBD configuration:

         net {
                 timeout 20;
                 max-epoch-size  8192;
                 max-buffers     128k;
                 connect-int     2;
                 ping-int        2;
                 sndbuf-size     10M;
                 rcvbuf-size     10M;
                 ko-count        5;
                 after-sb-0pri   discard-zero-changes;
                 after-sb-1pri   discard-secondary;
                 ping-timeout    2;
         }

         syncer {
                 rate    100M;
                 al-extents      3389;
                 csums-alg       crc32c;
                 verify-alg      crc32c;
         }
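
The net options above mix two units, which is easy to misread. As a rough sketch (not part of the original configuration), the Python snippet below converts them to seconds. It assumes the units documented in the drbd.conf man page for DRBD 8.3: timeout and ping-timeout in tenths of a second, connect-int and ping-int in seconds. The helper name to_seconds is hypothetical.

```python
# Units assumed from the drbd.conf(5) man page for DRBD 8.3:
# timeout and ping-timeout are given in tenths of a second,
# connect-int and ping-int are given in seconds.
UNIT_SECONDS = {
    "timeout": 0.1,
    "ping-timeout": 0.1,
    "connect-int": 1.0,
    "ping-int": 1.0,
}


def to_seconds(net_options):
    """Return the timing-related net options converted to seconds."""
    return {
        name: value * UNIT_SECONDS[name]
        for name, value in net_options.items()
        if name in UNIT_SECONDS
    }


# Values copied from the net section above.
net = {"timeout": 20, "connect-int": 2, "ping-int": 2, "ping-timeout": 2}

for name, seconds in to_seconds(net).items():
    print(f"{name}: {seconds:.1f} s")
```

Under those assumptions, timeout is 2 seconds and ping-timeout is 0.2 seconds, while connect-int and ping-int are 2 seconds each.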


Here's a syslog snippet demonstrating one whole cycle of this behavior:

kernel: [ 9827.966027] block drbd6: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
kernel: [ 9828.199039] block drbd6: helper command: /sbin/drbdadm after-resync-target minor-6
crm-unfence-peer.sh[24132]: invoked for drbd-serv-mail
crm-unfence-peer.sh[24132]: WARNING drbd-fencing could not determine the master id of drbd resource drbd-serv-mail
kernel: [ 9828.238394] block drbd6: helper command: /sbin/drbdadm after-resync-target minor-6 exit code 1 (0x100)
kernel: [ 9828.298906] block drbd6: bitmap WRITE of 83 pages took 15 jiffies
kernel: [ 9828.503024] block drbd6: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
kernel: [ 9831.788745] block drbd6: magic?? on data m: 0xa0816800 c: 5120 l: 0
kernel: [ 9831.788790] block drbd6: peer( Primary -> Unknown ) conn( Connected -> ProtocolError ) pdsk( UpToDate -> DUnknown )
kernel: [ 9831.789573] block drbd6: asender terminated
kernel: [ 9831.789576] block drbd6: Terminating drbd6_asender
kernel: [ 9832.041526] block drbd6: Connection closed
kernel: [ 9832.041531] block drbd6: conn( ProtocolError -> Unconnected )
kernel: [ 9832.041535] block drbd6: receiver terminated
kernel: [ 9832.041537] block drbd6: Restarting drbd6_receiver
kernel: [ 9832.041539] block drbd6: receiver (re)started
kernel: [ 9832.041542] block drbd6: conn( Unconnected -> WFConnection )
kernel: [ 9832.457266] block drbd6: Handshake successful: Agreed network protocol version 96
kernel: [ 9832.457276] block drbd6: conn( WFConnection -> WFReportParams )
kernel: [ 9832.457357] block drbd6: Starting asender thread (from drbd6_receiver [29943])
kernel: [ 9832.457733] block drbd6: data-integrity-alg: <not-used>
kernel: [ 9832.457745] block drbd6: drbd_sync_handshake:
kernel: [ 9832.457748] block drbd6: self E8E3BDC352C4C580:0000000000000000:71C7A5DE96C51226:71C6A5DE96C51227 bits:0 flags:0
kernel: [ 9832.457751] block drbd6: peer E915DF859DCA76C9:E8E3BDC352C4C581:71C7A5DE96C51227:71C6A5DE96C51227 bits:12 flags:0
kernel: [ 9832.457754] block drbd6: uuid_compare()=-1 by rule 50
kernel: [ 9832.457758] block drbd6: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) disk( UpToDate -> Outdated ) pdsk( DUnknown -> UpToDate )
kernel: [ 9832.883300] block drbd6: conn( WFBitMapT -> WFSyncUUID )
kernel: [ 9832.987097] block drbd6: updated sync uuid E8E4BDC352C4C580:0000000000000000:71C7A5DE96C51226:71C6A5DE96C51227
kernel: [ 9833.141291] block drbd6: helper command: /sbin/drbdadm before-resync-target minor-6
kernel: [ 9833.158129] block drbd6: helper command: /sbin/drbdadm before-resync-target minor-6 exit code 0 (0x0)
kernel: [ 9833.158135] block drbd6: conn( WFSyncUUID -> SyncTarget ) disk( Outdated -> Inconsistent )
kernel: [ 9833.158141] block drbd6: Began resync as SyncTarget (will sync 52 KB [13 bits set]).
kernel: [ 9833.415551] block drbd6: Resync done (total 1 sec; paused 0 sec; 52 K/sec)
kernel: [ 9833.415554] block drbd6: 23 % had equal checksums, eliminated: 12K; transferred 40K total 52K
kernel: [ 9833.415558] block drbd6: updated UUIDs E915DF859DCA76C8:0000000000000000:E8E4BDC352C4C580:E8E3BDC352C4C581
kernel: [ 9833.415563] block drbd6: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
kernel: [ 9833.575311] block drbd6: helper command: /sbin/drbdadm after-resync-target minor-6
crm-unfence-peer.sh[24433]: invoked for drbd-serv-mail
crm-unfence-peer.sh[24433]: WARNING drbd-fencing could not determine the master id of drbd resource drbd-serv-mail
kernel: [ 9833.615746] block drbd6: helper command: /sbin/drbdadm after-resync-target minor-6 exit code 1 (0x100)
kernel: [ 9833.661043] block drbd6: bitmap WRITE of 84 pages took 11 jiffies
kernel: [ 9833.772319] block drbd6: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
kernel: [ 9851.333540] block drbd6: magic?? on data m: 0x80816700 c: 19201 l: 0
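
Since the cycle repeats, it may help to tally how often each device leaves cs:Connected and which state it falls into. The following is a minimal Python sketch for that, not a DRBD tool. It assumes each log entry sits on a single line, as in the raw syslog file (the mail client wrapped the lines above), and the function names parse_transitions and connection_drops are made up for this example.

```python
import re
from collections import Counter

# Kernel timestamp and DRBD device, e.g. "[ 9831.788790] block drbd6:".
PREFIX = re.compile(r"\[\s*(\d+\.\d+)\]\s+block\s+(drbd\d+):")
# State transitions such as "conn( Connected -> ProtocolError )".
TRANSITION = re.compile(r"\b(peer|conn|disk|pdsk)\(\s*(\w+)\s*->\s*(\w+)\s*\)")


def parse_transitions(lines):
    """Yield (timestamp, device, field, old, new) for every state change."""
    for line in lines:
        prefix = PREFIX.search(line)
        if prefix is None:
            continue  # not a DRBD kernel message (e.g. crm-unfence-peer.sh)
        timestamp, device = float(prefix.group(1)), prefix.group(2)
        for field, old, new in TRANSITION.findall(line):
            yield timestamp, device, field, old, new


def connection_drops(lines):
    """Count, per device, which state follows conn( Connected -> ... )."""
    counts = Counter()
    for _, device, field, old, new in parse_transitions(lines):
        if field == "conn" and old == "Connected":
            counts[(device, new)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python drbd_drops.py /var/log/syslog
    with open(sys.argv[1], errors="replace") as log:
        for (device, state), n in sorted(connection_drops(log).items()):
            print(f"{device}: Connected -> {state} x{n}")
```

Comparing the timestamps of these drops across resources would show whether all devices lose the connection at the same moment or independently.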