[DRBD-user] PingAck did not arrive in time.
Oleksiy Evin
o.evin at onefc.com
Sun Jun 16 16:37:08 CEST 2019
Hi All,
Can anyone help me with the repeated "PingAck did not arrive in time." error during the initial bitmap synchronization? It first happened after I updated our cluster with the latest CentOS and DRBD updates. I'm using a basic DRBD configuration on a 527TB LVM volume, replicated across 2 nodes over a cross-over 100Gbps connection. The same connection is used by Pacemaker without any problems. I don't see any network adapter errors in the logs, and there are no reconnects or dropped packets when the error occurs in DRBD. I've also tried another adapter with a 10Gbps direct cable connection and got the same error.
# rpm -q centos-release
centos-release-7-6.1810.2.el7.centos.x86_64
# yum list installed | grep drbd
drbd90-utils.x86_64    9.6.0-1.el7.elrepo       @elrepo
kmod-drbd90.x86_64     9.0.16-1.el7_6.elrepo    @elrepo
# ifconfig
ens2: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.16.1.1  netmask 255.255.255.255  broadcast 172.16.1.1
        ether b8:83:03:67:3f:d4  txqueuelen 1000  (Ethernet)
        RX packets 63547  bytes 11147564 (10.6 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 265307  bytes 33045583 (31.5 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eno8: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.20.1.1  netmask 255.255.0.0  broadcast 172.20.255.255
        ether 20:67:7c:1c:42:c6  txqueuelen 1000  (Ethernet)
        RX packets 484  bytes 49086 (47.9 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 504  bytes 56974 (55.6 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device interrupt 116  memory 0xe3000000-e37fffff
# drbdadm dump all
# /etc/drbd.conf
global {
    usage-count no;
}

common {
    options {
        auto-promote yes;
    }
    net {
        protocol C;
    }
}

# resource r0 on sgpplhan01: not ignored, not stacked
# defined at /etc/drbd.d/r0.res:1
resource r0 {
    volume 0 {
        device /dev/drbd0 minor 0;
        disk /dev/storage/data;
        meta-disk internal;
    }
    on sgpplhan01 {
        node-id 0;
        address ipv4 172.16.1.1:7788;
    }
    on sgpplhan02 {
        node-id 1;
        address ipv4 172.16.2.1:7788;
    }
    net {
        after-sb-0pri discard-zero-changes;
        after-sb-1pri consensus;
        after-sb-2pri disconnect;
    }
}
# dmesg | grep drbd
[37259.335235] drbd r0/0 drbd0 sgpplhan02: drbd_sync_handshake:
[37259.335245] drbd r0/0 drbd0 sgpplhan02: self 0000000000000004:0000000000000000:0000000000000000:0000000000000000 bits:141608532581 flags:24
[37259.335254] drbd r0/0 drbd0 sgpplhan02: peer B7DA5A657F09CD92:45B54292B9CBC0CF:0000000000000000:0000000000000000 bits:141608532581 flags:20
[37259.335260] drbd r0/0 drbd0 sgpplhan02: uuid_compare()=-3 by rule 20
[37259.335265] drbd r0/0 drbd0 sgpplhan02: Writing the whole bitmap, full sync required after drbd_sync_handshake.
[37265.754528] drbd r0/0 drbd0 sgpplhan02: pdsk( DUnknown -> UpToDate ) repl( Off -> WFBitMapT )
[37265.754546] drbd r0/0 drbd0: Resumed AL updates
[37279.780140] drbd r0 sgpplhan02: PingAck did not arrive in time.
[37279.781303] drbd r0 sgpplhan02: conn( Connected -> NetworkFailure ) peer( Primary -> Unknown )
[37279.781313] drbd r0/0 drbd0 sgpplhan02: pdsk( UpToDate -> DUnknown ) repl( WFBitMapT -> Off )
[37279.781371] drbd r0 sgpplhan02: ack_receiver terminated
[37279.781376] drbd r0 sgpplhan02: Terminating ack_recv thread
[37279.833051] drbd r0 sgpplhan02: Connection closed
[37279.833069] drbd r0 sgpplhan02: conn( NetworkFailure -> Unconnected )
[37279.833086] drbd r0 sgpplhan02: Restarting receiver thread
[37279.833098] drbd r0 sgpplhan02: conn( Unconnected -> Connecting )
[37308.171618] drbd r0 sgpplhan02: Handshake to peer 1 successful: Agreed network protocol version 114
[37308.171628] drbd r0 sgpplhan02: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
[37308.171666] drbd r0 sgpplhan02: Starting ack_recv thread (from drbd_r_r0 [28699])
[37308.217846] drbd r0: Preparing cluster-wide state change 686534516 (0->1 499/146)
[37308.218242] drbd r0: State change 686534516: primary_nodes=2, weak_nodes=FFFFFFFFFFFFFFFC
[37308.218253] drbd r0: Committing cluster-wide state change 686534516 (0ms)
[37308.218296] drbd r0 sgpplhan02: conn( Connecting -> Connected ) peer( Unknown -> Primary )
[37308.222753] drbd r0/0 drbd0 sgpplhan02: drbd_sync_handshake:
[37308.222763] drbd r0/0 drbd0 sgpplhan02: self 0000000000000004:0000000000000000:0000000000000000:0000000000000000 bits:141608532581 flags:124
[37308.222771] drbd r0/0 drbd0 sgpplhan02: peer B7DA5A657F09CD92:45B54292B9CBC0CF:0000000000000000:0000000000000000 bits:141608532581 flags:120
[37308.222777] drbd r0/0 drbd0 sgpplhan02: uuid_compare()=-3 by rule 20
[37308.222782] drbd r0/0 drbd0 sgpplhan02: Writing the whole bitmap, full sync required after drbd_sync_handshake.
[37314.890717] drbd r0/0 drbd0 sgpplhan02: pdsk( DUnknown -> UpToDate ) repl( Off -> WFBitMapT )
[37328.669598] drbd r0 sgpplhan02: PingAck did not arrive in time.
[37328.670759] drbd r0 sgpplhan02: conn( Connected -> NetworkFailure ) peer( Primary -> Unknown )
[37328.670770] drbd r0/0 drbd0 sgpplhan02: pdsk( UpToDate -> DUnknown ) repl( WFBitMapT -> Off )
[37328.670823] drbd r0 sgpplhan02: ack_receiver terminated
[37328.670828] drbd r0 sgpplhan02: Terminating ack_recv thread
[37328.718096] drbd r0 sgpplhan02: Connection closed
[37328.718112] drbd r0 sgpplhan02: conn( NetworkFailure -> Unconnected )
[37328.718127] drbd r0 sgpplhan02: Restarting receiver thread
[37328.718138] drbd r0 sgpplhan02: conn( Unconnected -> Connecting )
[37351.755553] drbd r0 sgpplhan02: conn( Connecting -> Disconnecting )
[37351.794081] drbd r0 sgpplhan02: Connection closed
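
To put the numbers in the dmesg output above in perspective, here is a minimal Python sketch (not part of the original post). It measures how long each attempt stayed in WFBitMapT before the PingAck timeout, and estimates the uncompressed bitmap size from the "bits:" value. The helper names are hypothetical, and the 4 KiB-per-bit granularity is an assumption, although it does reproduce the 527TB volume size mentioned in the message. It does not claim to identify the cause of the timeout.

```python
import re

# Excerpt of the dmesg output above (timestamps are seconds since boot).
LOG = """\
[37265.754528] drbd r0/0 drbd0 sgpplhan02: pdsk( DUnknown -> UpToDate ) repl( Off -> WFBitMapT )
[37279.780140] drbd r0 sgpplhan02: PingAck did not arrive in time.
[37314.890717] drbd r0/0 drbd0 sgpplhan02: pdsk( DUnknown -> UpToDate ) repl( Off -> WFBitMapT )
[37328.669598] drbd r0 sgpplhan02: PingAck did not arrive in time.
"""

LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def bitmap_to_timeout_gaps(log):
    """Seconds from each 'Off -> WFBitMapT' to the next PingAck timeout."""
    gaps = []
    start = None
    for raw in log.splitlines():
        m = LINE.match(raw)
        if not m:
            continue
        ts, msg = float(m.group(1)), m.group(2)
        if "Off -> WFBitMapT" in msg:
            start = ts
        elif "PingAck did not arrive in time" in msg and start is not None:
            gaps.append(round(ts - start, 3))
            start = None
    return gaps


def bitmap_size_bytes(bits):
    """Uncompressed bitmap size: one bit per tracked block, rounded up to bytes."""
    return (bits + 7) // 8


BITS = 141608532581  # "bits:" value from drbd_sync_handshake above
BLOCK = 4096         # assumption: each bitmap bit covers 4 KiB of storage

if __name__ == "__main__":
    print("seconds in WFBitMapT before timeout:", bitmap_to_timeout_gaps(LOG))
    print("bitmap size (GiB):", bitmap_size_bytes(BITS) / 2**30)
    print("volume size (TiB):", BITS * BLOCK / 2**40)
```

Under these assumptions, both attempts fail roughly 14 seconds after entering WFBitMapT, the uncompressed bitmap is about 16.5 GiB, and the implied volume size is about 527.5 TiB.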