[DRBD-user] PingAck did not arrive in time.

Nelson Hicks nelsonh at socket.net
Wed May 23 21:22:27 CEST 2018


Is there any chance this could be an MTU mismatch between the two nodes? 
If you use ping with varying packet sizes from one node to the other, do 
they stop working above a specific size? Does ifconfig report the same 
MTU size for the interface on both nodes?

Examples:

ifconfig | grep -i mtu

ping -s 500 <other_ip>

ping -s 1400 <other_ip>

ping -s 1472 <other_ip>

ping -s 2000 <other_ip>
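
A note on the sizes above: 1472 is not arbitrary. Assuming a standard
1500-byte MTU and an IPv4 header without options, it is the largest ICMP
payload that fits in a single packet (1500 - 20 - 8). Plain ping may
fragment larger packets instead of failing, so it can help to forbid
fragmentation while testing. The sketch below only does that arithmetic
and prints the matching command; the function names are made up for
illustration, 172.16.11.21 is data1's address taken from the quoted
config, and -M do assumes the Linux iputils ping.

```shell
#!/bin/sh
# Largest ICMP echo payload that fits in one unfragmented IPv4 packet:
# MTU minus 20 bytes of IPv4 header (no options) minus 8 bytes of ICMP header.
max_icmp_payload() {
    echo $(( $1 - 20 - 8 ))
}

# Print (do not run) a ping command that forbids fragmentation, so a packet
# that does not fit the path MTU fails instead of being split up.
# $1 = peer IP, $2 = MTU to test
mtu_probe_cmd() {
    echo "ping -c 3 -M do -s $(max_icmp_payload "$2") $1"
}

mtu_probe_cmd 172.16.11.21 1500   # standard Ethernet MTU
mtu_probe_cmd 172.16.11.21 9000   # jumbo frames
```

If the 1500 probe works but the 9000 one fails on a link that is supposed
to carry jumbo frames, something on the path is likely still at 1500.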

Thanks,

- Nelson Hicks




On 05/23/2018 02:07 PM, Dirk Bonenkamp - ProActive wrote:
> Hi,
>
> Thank you for your reply.
>
> I am / was under the impression that DRBD9 is the new and improved
> DRBD, so I decided to use this version. Is this not the case?
> Could somebody enlighten me a bit?
>
> I have already disabled all bonding and other fancy network stuff, so
> I'm currently using 1 NIC. Unfortunately, this doesn't solve anything.
>
> Kind regards,
>
> Dirk
>
> On 23-05-18 14:20, Yannis Milios wrote:
>> Two things:
>>
>> - I would use drbd8 instead of drbd9 for a 2 node setup.
>> - I would first test with 1 nic instead of 2.
>>
>> On Wed, May 23, 2018 at 11:01 AM, Dirk Bonenkamp - ProActive 
>> <dirk at proactive.nl> wrote:
>>
>>     Hi List,
>>
>>     I'm struggling with a new DRBD9 setup. It's a simple Master/Slave
>>     setup. I'm running Ubuntu 16.04 LTS with the DRBD9 packages from
>>     the Launchpad PPA.
>>
>>     I've been running some DRBD8 systems in production for quite some
>>     years, so I have some experience. This setup is very similar; the
>>     only major difference is that this is DRBD9 and I use LUKS
>>     encrypted partitions as backend.
>>
>>     I keep running into this 'PingAck did not arrive in time.' error,
>>     which points to network issues if I am correct (see complete log
>>     snippet below). This error occurs when I try to reattach the
>>     secondary node after a reboot. Initial sync works fine.
>>
>>     The servers are interconnected with 2 10Gb NICs. I had bonding &
>>     jumbo frames configured, but deactivated all this, to no avail.
>>     I've also stripped the DRBD configuration to the bare minimum
>>     (see below).
>>
>>     I've tested the connection with iperf and some other tools and it
>>     seems just fine.
>>
>>     Could somebody point me in the right direction?
>>
>>     Thank you in advance, regards,
>>
>>     Dirk Bonenkamp
>>
>>     syslog messages:
>>
>>     May 23 11:31:56 data2 kernel: [  704.111755] drbd: loading out-of-tree module taints kernel.
>>     May 23 11:31:56 data2 kernel: [  704.112290] drbd: module verification failed: signature and/or required key missing - tainting kernel
>>     May 23 11:31:56 data2 kernel: [  704.127677] drbd: initialized. Version: 9.0.14-1 (api:2/proto:86-113)
>>     May 23 11:31:56 data2 kernel: [  704.127680] drbd: GIT-hash: 62f906cf44ef02a30ce0c148fec223b40c51c533 build by root at data2, 2018-05-23 09:19:54
>>     May 23 11:31:56 data2 kernel: [  704.127683] drbd: registered as block device major 147
>>     May 23 11:31:56 data2 kernel: [  704.153565] drbd r0: Starting worker thread (from drbdsetup [4495])
>>     May 23 11:31:56 data2 kernel: [  704.183031] drbd r0/0 drbd0: disk( Diskless -> Attaching )
>>     May 23 11:31:56 data2 kernel: [  704.183066] drbd r0/0 drbd0: Maximum number of peer devices = 1
>>     May 23 11:31:56 data2 kernel: [  704.183293] drbd r0: Method to ensure write ordering: flush
>>     May 23 11:31:56 data2 kernel: [  704.183308] drbd r0/0 drbd0: drbd_bm_resize called with capacity == 273437203064
>>     May 23 11:31:58 data2 kernel: [  706.508228] drbd r0/0 drbd0: resync bitmap: bits=34179650383 words=534057038 pages=1043081
>>     May 23 11:31:58 data2 kernel: [  706.508234] drbd r0/0 drbd0: size = 127 TB (136718601532 KB)
>>     May 23 11:31:58 data2 kernel: [  706.508236] drbd r0/0 drbd0: size = 127 TB (136718601532 KB)
>>     May 23 11:32:10 data2 kernel: [  717.890420] drbd r0/0 drbd0: recounting of set bits took additional 1256ms
>>     May 23 11:32:10 data2 kernel: [  717.890435] drbd r0/0 drbd0: disk( Attaching -> Outdated )
>>     May 23 11:32:10 data2 kernel: [  717.890439] drbd r0/0 drbd0: attached to current UUID: 244DD61D2781DF44
>>     May 23 11:32:10 data2 kernel: [  717.918473] drbd r0 data1: Starting sender thread (from drbdsetup [4544])
>>     May 23 11:32:10 data2 kernel: [  717.922534] drbd r0 data1: conn( StandAlone -> Unconnected )
>>     May 23 11:32:10 data2 kernel: [  717.922820] drbd r0 data1: Starting receiver thread (from drbd_w_r0 [4498])
>>     May 23 11:32:10 data2 kernel: [  717.922973] drbd r0 data1: conn( Unconnected -> Connecting )
>>     May 23 11:32:10 data2 kernel: [  718.421219] drbd r0 data1: Handshake to peer 1 successful: Agreed network protocol version 113
>>     May 23 11:32:10 data2 kernel: [  718.421229] drbd r0 data1: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
>>     May 23 11:32:10 data2 kernel: [  718.421259] drbd r0 data1: Starting ack_recv thread (from drbd_r_r0 [4550])
>>     May 23 11:32:10 data2 kernel: [  718.424095] drbd r0: Preparing cluster-wide state change 1205605755 (0->1 499/146)
>>     May 23 11:32:10 data2 kernel: [  718.437172] drbd r0: State change 1205605755: primary_nodes=2, weak_nodes=FFFFFFFFFFFFFFFC
>>     May 23 11:32:10 data2 kernel: [  718.437185] drbd r0: Aborting cluster-wide state change 1205605755 (12ms) rv = -22
>>     May 23 11:32:12 data2 kernel: [  719.896223] drbd r0: Preparing cluster-wide state change 445952355 (0->1 499/146)
>>     May 23 11:32:12 data2 kernel: [  719.896498] drbd r0: State change 445952355: primary_nodes=2, weak_nodes=FFFFFFFFFFFFFFFC
>>     May 23 11:32:12 data2 kernel: [  719.896508] drbd r0: Committing cluster-wide state change 445952355 (0ms)
>>     May 23 11:32:12 data2 kernel: [  719.896541] drbd r0 data1: conn( Connecting -> Connected ) peer( Unknown -> Primary )
>>     May 23 11:32:12 data2 kernel: [  719.912186] drbd r0/0 drbd0 data1: drbd_sync_handshake:
>>     May 23 11:32:12 data2 kernel: [  719.912198] drbd r0/0 drbd0 data1: self 244DD61D2781DF44:0000000000000000:0000000000000000:0000000000000000 bits:52035 flags:20
>>     May 23 11:32:12 data2 kernel: [  719.912207] drbd r0/0 drbd0 data1: peer E38BE51FE782EAE0:244DD61D2781DF44:934CAB8662DF0410:E555BDC58E528356 bits:53162 flags:20
>>     May 23 11:32:12 data2 kernel: [  719.912214] drbd r0/0 drbd0 data1: uuid_compare()=-2 by rule 50
>>     May 23 11:32:12 data2 kernel: [  719.912248] drbd r0/0 drbd0 data1: pdsk( DUnknown -> UpToDate ) repl( Off -> WFBitMapT )
>>     May 23 11:32:32 data2 kernel: [  740.397026] drbd r0 data1: PingAck did not arrive in time.
>>     May 23 11:32:32 data2 kernel: [  740.397121] drbd r0 data1: conn( Connected -> NetworkFailure ) peer( Primary -> Unknown )
>>     May 23 11:32:32 data2 kernel: [  740.397131] drbd r0/0 drbd0 data1: pdsk( UpToDate -> DUnknown ) repl( WFBitMapT -> Off )
>>     May 23 11:32:32 data2 kernel: [  740.397176] drbd r0 data1: ack_receiver terminated
>>     May 23 11:32:32 data2 kernel: [  740.397182] drbd r0 data1: Terminating ack_recv thread
>>     May 23 11:32:32 data2 kernel: [  740.458608] drbd r0 data1: Connection closed
>>     May 23 11:32:32 data2 kernel: [  740.458650] drbd r0 data1: conn( NetworkFailure -> Unconnected )
>>     May 23 11:32:32 data2 kernel: [  740.458688] drbd r0 data1: Restarting receiver thread
>>     May 23 11:32:32 data2 kernel: [  740.458723] drbd r0 data1: conn( Unconnected -> Connecting )
>>
>>     resources:
>>
>>     resource r0 {
>>             on data1 {
>>                     device    /dev/drbd0;
>>                     disk      /dev/mapper/mapper_secure;
>>                     address   172.16.11.21:7789;
>>                     meta-disk internal;
>>             }
>>             on data2 {
>>                     device    /dev/drbd0;
>>                     disk      /dev/mapper/mapper_secure;
>>                     address   172.16.11.22:7789;
>>                     meta-disk internal;
>>             }
>>     }
>>
>>     drbd configuration:
>>
>>     global {
>>             usage-count yes;
>>     }
>>
>>     common {
>>             #handlers {
>>             #        fence-peer "/usr/lib/drbd/crm-fence-peer.9.sh";
>>             #        after-resync-target "/usr/lib/drbd/crm-unfence-peer.9.sh";
>>             #}
>>             #disk {
>>             #        on-io-error detach;
>>             #       disk-barrier no;
>>             #       disk-flushes no;
>>             #       al-extents 3833;
>>             #        c-plan-ahead 7;
>>             #        c-fill-target 2M;
>>             #        c-min-rate 80M;
>>             #        c-max-rate 720M;
>>             #}
>>             net {
>>                     protocol C;
>>                     #fencing resource-only;
>>                     #cram-hmac-alg sha1;
>>                     #verify-alg sha1;
>>                     #shared-secret 1e69dc721fd2e65368ae3ba1e5929979;
>>                     #after-sb-0pri disconnect;
>>                     #after-sb-1pri disconnect;
>>                     #after-sb-2pri disconnect;
>>                     #max-buffers    8000;
>>                     #max-epoch-size 8000;
>>                     #sndbuf-size 0;
>>                     #rcvbuf-size 2048k;
>>             }
>>     }
>>
>>
>>
>>     _______________________________________________
>>     drbd-user mailing list
>>     drbd-user at lists.linbit.com
>>     http://lists.linbit.com/mailman/listinfo/drbd-user
>>
>>
>
> -- 
> ProActive Software
> Dirk Bonenkamp
> CTO
> <https://www.proactive-software.com>
> Phone: +31 (0)23 54 222 99
> Mobile: +31 (0)6 250 787 93
> Richard Holkade 9
> 2033 PZ Haarlem
> LinkedIn <http://linkd.in/1V6egnk>, Facebook <http://bit.ly/FBProActive>,
> YouTube <http://bit.ly/1Mc23L9>
> www.proactive.nl


