<div dir="ltr"><div>Hi all,</div><div><br></div><div>The following error occurs several times an hour. I'm using DRBD-Utils 8.3.11 with a DRBD 8.3.13 kernel module (yes, these are the versions in the current Ubuntu repository) on Ubuntu Server 12.04 LTS (3.5.0-40-generic). The two-node cluster is connected directly from NIC to NIC. I have never had any network problems with DRBD before, so I hope somebody can help me.</div>
<div><br></div><div>I even tried replacing the direct NIC-to-NIC link with a normal network connection, but the problem is the same.</div><div><br></div><div><br></div><div>Here are my configs and the "PingAck did not arrive in time." error from syslog.</div>
<div><br></div><div><br></div><div>Syslog:</div><div><br></div><div>Sep 23 14:29:30 server2 kernel: [433433.205076] block drbd0: PingAck did not arrive in time.</div><div>Sep 23 14:29:30 server2 kernel: [433433.224170] block drbd0: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )</div>
<div>Sep 23 14:29:30 server2 kernel: [433433.224271] block drbd0: new current UUID 18159322F4E2DA03:64D5FAA4F99072B3:86AA2CBE74BC037D:86A92CBE74BC037D</div><div>Sep 23 14:29:30 server2 kernel: [433433.224382] block drbd0: asender terminated</div>
<div>Sep 23 14:29:30 server2 kernel: [433433.224390] block drbd0: Terminating drbd0_asender</div><div>Sep 23 14:29:30 server2 kernel: [433433.224500] block drbd0: Connection closed</div><div>Sep 23 14:29:30 server2 kernel: [433433.224512] block drbd0: conn( NetworkFailure -> Unconnected )</div>
<div>Sep 23 14:29:30 server2 kernel: [433433.224520] block drbd0: receiver terminated</div><div>Sep 23 14:29:30 server2 kernel: [433433.224524] block drbd0: Restarting drbd0_receiver</div><div>Sep 23 14:29:30 server2 kernel: [433433.224528] block drbd0: receiver (re)started</div>
<div>Sep 23 14:29:30 server2 kernel: [433433.224534] block drbd0: conn( Unconnected -> WFConnection )</div><div>Sep 23 14:29:31 server2 kernel: [433434.407261] block drbd0: Handshake successful: Agreed network protocol version 96</div>
<div>Sep 23 14:29:31 server2 kernel: [433434.407592] block drbd0: Peer authenticated using 20 bytes of 'sha1' HMAC</div><div>Sep 23 14:29:31 server2 kernel: [433434.407628] block drbd0: conn( WFConnection -> WFReportParams )</div>
<div>Sep 23 14:29:31 server2 kernel: [433434.407637] block drbd0: Starting asender thread (from drbd0_receiver [16816])</div><div>Sep 23 14:29:31 server2 kernel: [433434.407837] block drbd0: data-integrity-alg: &lt;not-used&gt;</div>
<div>Sep 23 14:29:31 server2 kernel: [433434.407868] block drbd0: drbd_sync_handshake:</div><div>Sep 23 14:29:31 server2 kernel: [433434.407875] block drbd0: self 18159322F4E2DA03:64D5FAA4F99072B3:86AA2CBE74BC037D:86A92CBE74BC037D bits:16 flags:0</div>
<div>Sep 23 14:29:31 server2 kernel: [433434.407880] block drbd0: peer 64D5FAA4F99072B2:0000000000000000:86AA2CBE74BC037C:86A92CBE74BC037D bits:0 flags:0</div><div>Sep 23 14:29:31 server2 kernel: [433434.407884] block drbd0: uuid_compare()=1 by rule 70</div>
<div>Sep 23 14:29:31 server2 kernel: [433434.407894] block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> Consistent )</div><div>Sep 23 14:29:31 server2 kernel: [433434.973757] block drbd0: helper command: /sbin/drbdadm before-resync-source minor-0</div>
<div>Sep 23 14:29:31 server2 kernel: [433434.976878] block drbd0: helper command: /sbin/drbdadm before-resync-source minor-0 exit code 0 (0x0)</div><div>Sep 23 14:29:31 server2 kernel: [433434.976891] block drbd0: conn( WFBitMapS -> SyncSource ) pdsk( Consistent -> Inconsistent )</div>
<div>Sep 23 14:29:31 server2 kernel: [433434.976902] block drbd0: Began resync as SyncSource (will sync 64 KB [16 bits set]).</div><div>Sep 23 14:29:31 server2 kernel: [433434.976917] block drbd0: updated sync UUID 18159322F4E2DA03:64D6FAA4F99072B3:64D5FAA4F99072B3:86AA2CBE74BC037D</div>
<div>Sep 23 14:29:31 server2 kernel: [433434.991168] block drbd0: Resync done (total 1 sec; paused 0 sec; 64 K/sec)</div><div>Sep 23 14:29:31 server2 kernel: [433434.991178] block drbd0: updated UUIDs 18159322F4E2DA03:0000000000000000:64D6FAA4F99072B3:64D5FAA4F99072B3</div>
<div>Sep 23 14:29:31 server2 kernel: [433434.991187] block drbd0: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )</div><div>Sep 23 14:29:31 server2 kernel: [433435.072506] block drbd0: bitmap WRITE of 8049 pages took 20 jiffies</div>
<div>Sep 23 14:29:31 server2 kernel: [433435.072540] block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.</div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div>DRBD - global_common.conf:</div>
<div><br></div><div>global {</div><div> usage-count no;</div><div> # minor-count dialog-refresh disable-ip-verification</div><div>}</div><div><br></div><div>common {</div><div> protocol C;</div><div><br>
</div><div> handlers {</div><div> pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";</div>
<div> pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";</div><div> local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";</div>
<div> # fence-peer "/usr/lib/drbd/crm-fence-peer.sh";</div><div> # split-brain "/usr/lib/drbd/notify-split-brain.sh root";</div><div> # out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";</div>
<div> # before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";</div><div> # after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;</div><div>
}</div><div><br></div><div> startup {</div><div> # wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb</div><div> }</div><div><br></div><div> disk {</div><div> # on-io-error fencing use-bmbv no-disk-barrier no-disk-flushes</div>
<div> # no-disk-drain no-md-flushes max-bio-bvecs</div><div> }</div><div><br></div><div> net {</div><div> # sndbuf-size rcvbuf-size timeout connect-int ping-int ping-timeout max-buffers</div>
<div> # max-epoch-size ko-count allow-two-primaries cram-hmac-alg shared-secret</div><div> # after-sb-0pri after-sb-1pri after-sb-2pri data-integrity-alg no-tcp-cork</div><div> }</div>
<div><br></div><div> syncer {</div><div> # rate after al-extents use-rle cpu-mask verify-alg csums-alg</div><div> rate 110M;</div><div> }</div><div>}</div><div><br></div><div><br>
</div><div><br></div><div><br></div><div>Resource config:</div><div><br></div><div>resource r0_opt {</div><div> protocol C;</div><div> startup {</div><div> wfc-timeout 15;</div><div> degr-wfc-timeout 60;</div>
<div> }</div><div> net {</div><div> cram-hmac-alg sha1;</div><div> shared-secret "xxxxxxxxxxx";</div><div> }</div><div> on server1 {</div><div> device /dev/drbd0;</div>
<div> disk /dev/vg01/opt-lv;</div><div> address 172.22.122.1:7789;</div><div> meta-disk internal;</div><div> }</div><div> on server2 {</div>
<div> device /dev/drbd0;</div><div> disk /dev/vg01/opt-lv;</div><div> address 172.22.122.2:7789;</div><div> meta-disk internal;</div>
<div> }</div><div>}</div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div style>Thx,</div><div style><br></div><div style>-Chris</div></div>