Hi all,

the following error occurs several times an hour. I'm using DRBD-Utils 8.3.11 with a DRBD 8.3.13 kernel module (yes, these are the versions in the current Ubuntu repository) on Ubuntu Server 12.04 LTS (3.5.0-40-generic). The two-node cluster is connected directly from NIC to NIC. I have never had any network problems with DRBD, so I hope somebody can help me. I even tried replacing the direct NIC-to-NIC link with a normal network connection, but the problem is the same. Here are my configs and the "PingAck did not arrive in time." error from syslog.

Syslog:

Sep 23 14:29:30 server2 kernel: [433433.205076] block drbd0: PingAck did not arrive in time.
Sep 23 14:29:30 server2 kernel: [433433.224170] block drbd0: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Sep 23 14:29:30 server2 kernel: [433433.224271] block drbd0: new current UUID 18159322F4E2DA03:64D5FAA4F99072B3:86AA2CBE74BC037D:86A92CBE74BC037D
Sep 23 14:29:30 server2 kernel: [433433.224382] block drbd0: asender terminated
Sep 23 14:29:30 server2 kernel: [433433.224390] block drbd0: Terminating drbd0_asender
Sep 23 14:29:30 server2 kernel: [433433.224500] block drbd0: Connection closed
Sep 23 14:29:30 server2 kernel: [433433.224512] block drbd0: conn( NetworkFailure -> Unconnected )
Sep 23 14:29:30 server2 kernel: [433433.224520] block drbd0: receiver terminated
Sep 23 14:29:30 server2 kernel: [433433.224524] block drbd0: Restarting drbd0_receiver
Sep 23 14:29:30 server2 kernel: [433433.224528] block drbd0: receiver (re)started
Sep 23 14:29:30 server2 kernel: [433433.224534] block drbd0: conn( Unconnected -> WFConnection )
Sep 23 14:29:31 server2 kernel: [433434.407261] block drbd0: Handshake successful: Agreed network protocol version 96
Sep 23 14:29:31 server2 kernel: [433434.407592] block drbd0: Peer authenticated using 20 bytes of 'sha1' HMAC
Sep 23 14:29:31 server2 kernel: [433434.407628] block drbd0: conn( WFConnection -> WFReportParams )
Sep 23 14:29:31 server2 kernel: [433434.407637] block drbd0: Starting asender thread (from drbd0_receiver [16816])
Sep 23 14:29:31 server2 kernel: [433434.407837] block drbd0: data-integrity-alg: <not-used>
Sep 23 14:29:31 server2 kernel: [433434.407868] block drbd0: drbd_sync_handshake:
Sep 23 14:29:31 server2 kernel: [433434.407875] block drbd0: self 18159322F4E2DA03:64D5FAA4F99072B3:86AA2CBE74BC037D:86A92CBE74BC037D bits:16 flags:0
Sep 23 14:29:31 server2 kernel: [433434.407880] block drbd0: peer 64D5FAA4F99072B2:0000000000000000:86AA2CBE74BC037C:86A92CBE74BC037D bits:0 flags:0
Sep 23 14:29:31 server2 kernel: [433434.407884] block drbd0: uuid_compare()=1 by rule 70
Sep 23 14:29:31 server2 kernel: [433434.407894] block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> Consistent )
Sep 23 14:29:31 server2 kernel: [433434.973757] block drbd0: helper command: /sbin/drbdadm before-resync-source minor-0
Sep 23 14:29:31 server2 kernel: [433434.976878] block drbd0: helper command: /sbin/drbdadm before-resync-source minor-0 exit code 0 (0x0)
Sep 23 14:29:31 server2 kernel: [433434.976891] block drbd0: conn( WFBitMapS -> SyncSource ) pdsk( Consistent -> Inconsistent )
Sep 23 14:29:31 server2 kernel: [433434.976902] block drbd0: Began resync as SyncSource (will sync 64 KB [16 bits set]).
Sep 23 14:29:31 server2 kernel: [433434.976917] block drbd0: updated sync UUID 18159322F4E2DA03:64D6FAA4F99072B3:64D5FAA4F99072B3:86AA2CBE74BC037D
Sep 23 14:29:31 server2 kernel: [433434.991168] block drbd0: Resync done (total 1 sec; paused 0 sec; 64 K/sec)
Sep 23 14:29:31 server2 kernel: [433434.991178] block drbd0: updated UUIDs 18159322F4E2DA03:0000000000000000:64D6FAA4F99072B3:64D5FAA4F99072B3
Sep 23 14:29:31 server2 kernel: [433434.991187] block drbd0: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
Sep 23 14:29:31 server2 kernel: [433435.072506] block drbd0: bitmap WRITE of 8049 pages took 20 jiffies
Sep 23 14:29:31 server2 kernel: [433435.072540] block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.

DRBD - global_common.conf:

global {
        usage-count no;
        # minor-count dialog-refresh disable-ip-verification
}

common {
        protocol C;

        handlers {
                pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
                pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
                local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
                # fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
                # split-brain "/usr/lib/drbd/notify-split-brain.sh root";
                # out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
                # before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
                # after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
        }

        startup {
                # wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb
        }

        disk {
                # on-io-error fencing use-bmbv no-disk-barrier no-disk-flushes
                # no-disk-drain no-md-flushes max-bio-bvecs
        }

        net {
                # sndbuf-size rcvbuf-size timeout connect-int ping-int ping-timeout max-buffers
                # max-epoch-size ko-count allow-two-primaries cram-hmac-alg shared-secret
                # after-sb-0pri after-sb-1pri after-sb-2pri data-integrity-alg no-tcp-cork
        }

        syncer {
                # rate after al-extents use-rle cpu-mask verify-alg csums-alg
                rate 110M;
        }
}

Resource config:

resource r0_opt {
        protocol C;

        startup {
                wfc-timeout 15;
                degr-wfc-timeout 60;
        }

        net {
                cram-hmac-alg sha1;
                shared-secret "xxxxxxxxxxx";
        }

        on server1 {
                device /dev/drbd0;
                disk /dev/vg01/opt-lv;
                address 172.22.122.1:7789;
                meta-disk internal;
        }

        on server2 {
                device /dev/drbd0;
                disk /dev/vg01/opt-lv;
                address 172.22.122.2:7789;
                meta-disk internal;
        }
}

Thx,
-Chris
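
To put a number on "several times an hour", a small script along these lines could count the PingAck timeouts per hour in syslog. This is only a sketch and not part of DRBD: the function name `count_pingack_timeouts` and the regular expression are assumptions derived from the log format shown above, and the log path has to be passed on the command line.

```python
import re
import sys
from collections import Counter

# Assumed line format, taken from the syslog excerpt in this post:
# Sep 23 14:29:30 server2 kernel: [433433.205076] block drbd0: PingAck did not arrive in time.
PINGACK_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<hour>\d{2}):\d{2}:\d{2}\s+"
    r"\S+\s+kernel:.*block (?P<dev>drbd\d+): PingAck did not arrive in time"
)


def count_pingack_timeouts(lines):
    """Count PingAck timeouts, keyed by (month, day, hour, device)."""
    counts = Counter()
    for line in lines:
        m = PINGACK_RE.match(line)
        if m:
            key = (m.group("month"), int(m.group("day")),
                   int(m.group("hour")), m.group("dev"))
            counts[key] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python3 pingack_count.py /var/log/syslog
    with open(sys.argv[1], errors="replace") as fh:
        for (month, day, hour, dev), n in sorted(count_pingack_timeouts(fh).items()):
            print(f"{month} {day:2d} {hour:02d}:00 {dev}: {n} timeout(s)")
```

Comparing the per-hour counts from both nodes may help to tell whether the timeouts cluster around particular times, for example periods of heavy disk or network load.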