> I'm running DRBD 8.3.13 on Debian Wheezy, Linux 3.2.20 and
> every now and then my DRBD resources spontaneously switch from
> cs:Connected to cs:WFConnection or the various syncing states and back
> (according to "watch cat /proc/drbd").
>
> I've sometimes seen "broken pipe" or even "protocol error"(!?) flashing
> by briefly.

No luck debugging this so far. I've tried changing network cards, switching between bonding modes, reverting back to regular ethX (instead of bonding), various MTU and txqueuelen values, using resource-only-fencing (corosync) and not. Nothing has helped so far - this connection instability just seems to come and go.

Any better debugging ideas? Or maybe this is not a network issue at all?

Excerpt from DRBD configuration:

    net {
        timeout 20;
        max-epoch-size 8192;
        max-buffers 128k;
        connect-int 2;
        ping-int 2;
        sndbuf-size 10M;
        rcvbuf-size 10M;
        ko-count 5;
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        ping-timeout 2;
    }
    syncer {
        rate 100M;
        al-extents 3389;
        csums-alg crc32c;
        verify-alg crc32c;
    }

Here's a syslog snippet demonstrating one whole cycle of this behavior:

kernel: [ 9827.966027] block drbd6: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
kernel: [ 9828.199039] block drbd6: helper command: /sbin/drbdadm after-resync-target minor-6
crm-unfence-peer.sh[24132]: invoked for drbd-serv-mail
crm-unfence-peer.sh[24132]: WARNING drbd-fencing could not determine the master id of drbd resource drbd-serv-mail
kernel: [ 9828.238394] block drbd6: helper command: /sbin/drbdadm after-resync-target minor-6 exit code 1 (0x100)
kernel: [ 9828.298906] block drbd6: bitmap WRITE of 83 pages took 15 jiffies
kernel: [ 9828.503024] block drbd6: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
kernel: [ 9831.788745] block drbd6: magic?? on data m: 0xa0816800 c: 5120 l: 0
kernel: [ 9831.788790] block drbd6: peer( Primary -> Unknown ) conn( Connected -> ProtocolError ) pdsk( UpToDate -> DUnknown )
kernel: [ 9831.789573] block drbd6: asender terminated
kernel: [ 9831.789576] block drbd6: Terminating drbd6_asender
kernel: [ 9832.041526] block drbd6: Connection closed
kernel: [ 9832.041531] block drbd6: conn( ProtocolError -> Unconnected )
kernel: [ 9832.041535] block drbd6: receiver terminated
kernel: [ 9832.041537] block drbd6: Restarting drbd6_receiver
kernel: [ 9832.041539] block drbd6: receiver (re)started
kernel: [ 9832.041542] block drbd6: conn( Unconnected -> WFConnection )
kernel: [ 9832.457266] block drbd6: Handshake successful: Agreed network protocol version 96
kernel: [ 9832.457276] block drbd6: conn( WFConnection -> WFReportParams )
kernel: [ 9832.457357] block drbd6: Starting asender thread (from drbd6_receiver [29943])
kernel: [ 9832.457733] block drbd6: data-integrity-alg: <not-used>
kernel: [ 9832.457745] block drbd6: drbd_sync_handshake:
kernel: [ 9832.457748] block drbd6: self E8E3BDC352C4C580:0000000000000000:71C7A5DE96C51226:71C6A5DE96C51227 bits:0 flags:0
kernel: [ 9832.457751] block drbd6: peer E915DF859DCA76C9:E8E3BDC352C4C581:71C7A5DE96C51227:71C6A5DE96C51227 bits:12 flags:0
kernel: [ 9832.457754] block drbd6: uuid_compare()=-1 by rule 50
kernel: [ 9832.457758] block drbd6: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) disk( UpToDate -> Outdated ) pdsk( DUnknown -> UpToDate )
kernel: [ 9832.883300] block drbd6: conn( WFBitMapT -> WFSyncUUID )
kernel: [ 9832.987097] block drbd6: updated sync uuid E8E4BDC352C4C580:0000000000000000:71C7A5DE96C51226:71C6A5DE96C51227
kernel: [ 9833.141291] block drbd6: helper command: /sbin/drbdadm before-resync-target minor-6
kernel: [ 9833.158129] block drbd6: helper command: /sbin/drbdadm before-resync-target minor-6 exit code 0 (0x0)
kernel: [ 9833.158135] block drbd6: conn( WFSyncUUID -> SyncTarget ) disk( Outdated -> Inconsistent )
kernel: [ 9833.158141] block drbd6: Began resync as SyncTarget (will sync 52 KB [13 bits set]).
kernel: [ 9833.415551] block drbd6: Resync done (total 1 sec; paused 0 sec; 52 K/sec)
kernel: [ 9833.415554] block drbd6: 23 % had equal checksums, eliminated: 12K; transferred 40K total 52K
kernel: [ 9833.415558] block drbd6: updated UUIDs E915DF859DCA76C8:0000000000000000:E8E4BDC352C4C580:E8E3BDC352C4C581
kernel: [ 9833.415563] block drbd6: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
kernel: [ 9833.575311] block drbd6: helper command: /sbin/drbdadm after-resync-target minor-6
crm-unfence-peer.sh[24433]: invoked for drbd-serv-mail
crm-unfence-peer.sh[24433]: WARNING drbd-fencing could not determine the master id of drbd resource drbd-serv-mail
kernel: [ 9833.615746] block drbd6: helper command: /sbin/drbdadm after-resync-target minor-6 exit code 1 (0x100)
kernel: [ 9833.661043] block drbd6: bitmap WRITE of 84 pages took 11 jiffies
kernel: [ 9833.772319] block drbd6: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
kernel: [ 9851.333540] block drbd6: magic?? on data m: 0x80816700 c: 19201 l: 0
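
To see how often the connection flaps and through which states, the conn( ... ) transitions can be pulled out of syslog. This is only a rough sketch, assuming the log lines look like the ones in the snippet above. The function name conn_transitions and the regular expression are made up for illustration and are not part of DRBD or its tools.

```python
import re

# Matches kernel log lines that contain a DRBD connection-state change, e.g.
#   kernel: [ 9832.041531] block drbd6: conn( ProtocolError -> Unconnected )
# Assumption: the "[ <uptime>] block drbdN: ... conn( A -> B )" layout seen in
# the syslog excerpt; other DRBD versions may log differently.
LINE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+block\s+(?P<dev>drbd\d+):.*?"
    r"conn\(\s*(?P<old>\w+)\s*->\s*(?P<new>\w+)\s*\)"
)


def conn_transitions(lines):
    """Yield (timestamp, device, old_state, new_state) for each conn() change."""
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            yield float(m.group("ts")), m.group("dev"), m.group("old"), m.group("new")


# Three lines copied from the syslog excerpt; the last one has no conn() change.
sample = [
    "kernel: [ 9831.788790] block drbd6: peer( Primary -> Unknown ) "
    "conn( Connected -> ProtocolError ) pdsk( UpToDate -> DUnknown )",
    "kernel: [ 9832.041531] block drbd6: conn( ProtocolError -> Unconnected )",
    "kernel: [ 9832.041535] block drbd6: receiver terminated",
]

for ts, dev, old, new in conn_transitions(sample):
    print(f"{ts:.6f} {dev}: {old} -> {new}")
```

Feeding it a whole log file (for example the lines of /var/log/syslog, if that is where kernel messages end up on the system in question) gives a compact per-device timeline that can be compared against network events on both nodes.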