[DRBD-user] PingAck did not arrive in time.

Yannis Milios yannis.milios at gmail.com
Wed May 23 14:20:55 CEST 2018


Two things:

- I would use DRBD8 instead of DRBD9 for a two-node setup.
- I would first test with one NIC instead of two.
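
While narrowing this down, it may also help to measure how long the connection sat in WFBitMapT before the timeout fired, so that runs with one NIC and two NICs can be compared. Below is a minimal sketch in Python, assuming unwrapped kernel log lines in the format of the syslog snippet quoted below. The function name seconds_between is hypothetical and not part of any DRBD tooling.

```python
import re

# Kernel timestamps look like "[  740.397026]" (seconds since boot).
STAMP = re.compile(r"\[\s*(\d+\.\d+)\]")


def seconds_between(log_text, start_marker, end_marker):
    """Return seconds from the first line containing start_marker to the
    first later line containing end_marker, or None if either is missing."""
    start = None
    for line in log_text.splitlines():
        match = STAMP.search(line)
        if not match:
            continue  # skip lines without a kernel timestamp
        stamp = float(match.group(1))
        if start is None and start_marker in line:
            start = stamp
        elif start is not None and end_marker in line:
            return stamp - start
    return None


# Two lines taken from the quoted log, rejoined onto single lines.
sample = """\
May 23 11:32:12 data2 kernel: [  719.912248] drbd r0/0 drbd0 data1: pdsk( DUnknown -> UpToDate ) repl( Off -> WFBitMapT )
May 23 11:32:32 data2 kernel: [  740.397026] drbd r0 data1: PingAck did not arrive in time.
"""

gap = seconds_between(sample, "-> WFBitMapT", "PingAck did not arrive")
print(f"{gap:.1f} s")  # prints "20.5 s"
```

For the quoted log this gives about 20.5 seconds between entering WFBitMapT and the PingAck message. The number by itself is not a diagnosis, only a value to compare between test runs.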

On Wed, May 23, 2018 at 11:01 AM, Dirk Bonenkamp - ProActive <
dirk at proactive.nl> wrote:

> Hi List,
>
> I'm struggling with a new DRBD9 setup. It's a simple Master/Slave setup.
> I'm running Ubuntu 16.04 LTS with the DRBD9 packages from the Launchpad
> PPA.
>
> I've been running some DRBD8 systems in production for quite a few
> years, so I have some experience. This setup is very similar; the only
> major difference is that this is DRBD9 and I use LUKS-encrypted
> partitions as the backend.
>
> I keep running into this 'PingAck did not arrive in time.' error, which
> points to network issues if I am correct (see complete log snippet
> below). This error occurs when I try to reattach the secondary node
> after a reboot. Initial sync works fine.
>
> The servers are interconnected with 2 10Gb NICs. I had bonding & jumbo
> frames configured, but deactivated all this, to no avail. I've also
> stripped the DRBD configuration to the bare minimum (see below).
>
> I've tested the connection with iperf and some other tools and it seems
> just fine.
>
> Could somebody point me in the right direction?
>
> Thank you in advance, regards,
>
> Dirk Bonenkamp
>
> syslog messages:
>
> May 23 11:31:56 data2 kernel: [  704.111755] drbd: loading out-of-tree
> module taints kernel.
> May 23 11:31:56 data2 kernel: [  704.112290] drbd: module verification
> failed: signature and/or required key missing - tainting kernel
> May 23 11:31:56 data2 kernel: [  704.127677] drbd: initialized. Version:
> 9.0.14-1 (api:2/proto:86-113)
> May 23 11:31:56 data2 kernel: [  704.127680] drbd: GIT-hash:
> 62f906cf44ef02a30ce0c148fec223b40c51c533 build by root at data2, 2018-05-23
> 09:19:54
> May 23 11:31:56 data2 kernel: [  704.127683] drbd: registered as block
> device major 147
> May 23 11:31:56 data2 kernel: [  704.153565] drbd r0: Starting worker
> thread (from drbdsetup [4495])
> May 23 11:31:56 data2 kernel: [  704.183031] drbd r0/0 drbd0: disk(
> Diskless -> Attaching )
> May 23 11:31:56 data2 kernel: [  704.183066] drbd r0/0 drbd0: Maximum
> number of peer devices = 1
> May 23 11:31:56 data2 kernel: [  704.183293] drbd r0: Method to ensure
> write ordering: flush
> May 23 11:31:56 data2 kernel: [  704.183308] drbd r0/0 drbd0:
> drbd_bm_resize called with capacity == 273437203064
> May 23 11:31:58 data2 kernel: [  706.508228] drbd r0/0 drbd0: resync
> bitmap: bits=34179650383 words=534057038 pages=1043081
> May 23 11:31:58 data2 kernel: [  706.508234] drbd r0/0 drbd0: size = 127
> TB (136718601532 KB)
> May 23 11:31:58 data2 kernel: [  706.508236] drbd r0/0 drbd0: size = 127
> TB (136718601532 KB)
> May 23 11:32:10 data2 kernel: [  717.890420] drbd r0/0 drbd0: recounting
> of set bits took additional 1256ms
> May 23 11:32:10 data2 kernel: [  717.890435] drbd r0/0 drbd0: disk(
> Attaching -> Outdated )
> May 23 11:32:10 data2 kernel: [  717.890439] drbd r0/0 drbd0: attached
> to current UUID: 244DD61D2781DF44
> May 23 11:32:10 data2 kernel: [  717.918473] drbd r0 data1: Starting
> sender thread (from drbdsetup [4544])
> May 23 11:32:10 data2 kernel: [  717.922534] drbd r0 data1: conn(
> StandAlone -> Unconnected )
> May 23 11:32:10 data2 kernel: [  717.922820] drbd r0 data1: Starting
> receiver thread (from drbd_w_r0 [4498])
> May 23 11:32:10 data2 kernel: [  717.922973] drbd r0 data1: conn(
> Unconnected -> Connecting )
> May 23 11:32:10 data2 kernel: [  718.421219] drbd r0 data1: Handshake to
> peer 1 successful: Agreed network protocol version 113
> May 23 11:32:10 data2 kernel: [  718.421229] drbd r0 data1: Feature
> flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME
> WRITE_ZEROES.
> May 23 11:32:10 data2 kernel: [  718.421259] drbd r0 data1: Starting
> ack_recv thread (from drbd_r_r0 [4550])
> May 23 11:32:10 data2 kernel: [  718.424095] drbd r0: Preparing
> cluster-wide state change 1205605755 (0->1 499/146)
> May 23 11:32:10 data2 kernel: [  718.437172] drbd r0: State change
> 1205605755: primary_nodes=2, weak_nodes=FFFFFFFFFFFFFFFC
> May 23 11:32:10 data2 kernel: [  718.437185] drbd r0: Aborting
> cluster-wide state change 1205605755 (12ms) rv = -22
> May 23 11:32:12 data2 kernel: [  719.896223] drbd r0: Preparing
> cluster-wide state change 445952355 (0->1 499/146)
> May 23 11:32:12 data2 kernel: [  719.896498] drbd r0: State change
> 445952355: primary_nodes=2, weak_nodes=FFFFFFFFFFFFFFFC
> May 23 11:32:12 data2 kernel: [  719.896508] drbd r0: Committing
> cluster-wide state change 445952355 (0ms)
> May 23 11:32:12 data2 kernel: [  719.896541] drbd r0 data1: conn(
> Connecting -> Connected ) peer( Unknown -> Primary )
> May 23 11:32:12 data2 kernel: [  719.912186] drbd r0/0 drbd0 data1:
> drbd_sync_handshake:
> May 23 11:32:12 data2 kernel: [  719.912198] drbd r0/0 drbd0 data1: self
> 244DD61D2781DF44:0000000000000000:0000000000000000:0000000000000000
> bits:52035 flags:20
> May 23 11:32:12 data2 kernel: [  719.912207] drbd r0/0 drbd0 data1: peer
> E38BE51FE782EAE0:244DD61D2781DF44:934CAB8662DF0410:E555BDC58E528356
> bits:53162 flags:20
> May 23 11:32:12 data2 kernel: [  719.912214] drbd r0/0 drbd0 data1:
> uuid_compare()=-2 by rule 50
> May 23 11:32:12 data2 kernel: [  719.912248] drbd r0/0 drbd0 data1:
> pdsk( DUnknown -> UpToDate ) repl( Off -> WFBitMapT )
> May 23 11:32:32 data2 kernel: [  740.397026] drbd r0 data1: PingAck did
> not arrive in time.
> May 23 11:32:32 data2 kernel: [  740.397121] drbd r0 data1: conn(
> Connected -> NetworkFailure ) peer( Primary -> Unknown )
> May 23 11:32:32 data2 kernel: [  740.397131] drbd r0/0 drbd0 data1:
> pdsk( UpToDate -> DUnknown ) repl( WFBitMapT -> Off )
> May 23 11:32:32 data2 kernel: [  740.397176] drbd r0 data1: ack_receiver
> terminated
> May 23 11:32:32 data2 kernel: [  740.397182] drbd r0 data1: Terminating
> ack_recv thread
> May 23 11:32:32 data2 kernel: [  740.458608] drbd r0 data1: Connection
> closed
> May 23 11:32:32 data2 kernel: [  740.458650] drbd r0 data1: conn(
> NetworkFailure -> Unconnected )
> May 23 11:32:32 data2 kernel: [  740.458688] drbd r0 data1: Restarting
> receiver thread
> May 23 11:32:32 data2 kernel: [  740.458723] drbd r0 data1: conn(
> Unconnected -> Connecting )
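
As a side note on the log above: the figures in the drbd_bm_resize and "resync bitmap" lines can be reproduced with a little arithmetic, which gives a feel for how large the bitmap on a 127 TB device is. This is a sketch under stated assumptions (one bitmap bit per 4 KiB of storage, 4 KiB pages, capacity reported in 512-byte sectors). The function name bitmap_geometry is hypothetical. It only re-derives the numbers in the log and does not show that the bitmap is the cause of the timeout.

```python
SECTOR_BYTES = 512     # capacity in the log is in 512-byte sectors
BYTES_PER_BIT = 4096   # assumption: one bitmap bit covers 4 KiB of storage
PAGE_BYTES = 4096      # assumption: 4 KiB memory pages


def bitmap_geometry(capacity_sectors):
    """Derive bitmap bits, 64-bit words, and pages from a device capacity."""
    sectors_per_bit = BYTES_PER_BIT // SECTOR_BYTES
    bits = -(-capacity_sectors // sectors_per_bit)  # ceiling division
    words = -(-bits // 64)                          # 64 bits per word
    pages = -(-(words * 8) // PAGE_BYTES)           # 8 bytes per word
    return bits, words, pages


# Capacity taken from "drbd_bm_resize called with capacity == 273437203064".
bits, words, pages = bitmap_geometry(273437203064)
print(bits, words, pages)  # prints "34179650383 534057038 1043081"
print(f"bitmap size: {pages * PAGE_BYTES / 2**30:.2f} GiB")
```

The three values match bits, words, and pages in the log, and the in-memory bitmap comes to roughly 4 GiB.
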
>
> resources:
>
> resource r0 {
>         on data1 {
>                 device    /dev/drbd0;
>                 disk      /dev/mapper/mapper_secure;
>                 address   172.16.11.21:7789;
>                 meta-disk internal;
>         }
>         on data2 {
>                 device    /dev/drbd0;
>                 disk      /dev/mapper/mapper_secure;
>                 address   172.16.11.22:7789;
>                 meta-disk internal;
>         }
> }
>
> drbd configuration:
>
> global {
>         usage-count yes;
> }
>
> common {
>         #handlers {
>         #        fence-peer "/usr/lib/drbd/crm-fence-peer.9.sh";
>         #        after-resync-target "/usr/lib/drbd/crm-unfence-peer.9.sh";
>         #}
>         #disk {
>         #        on-io-error detach;
>         #        disk-barrier no;
>         #        disk-flushes no;
>         #        al-extents 3833;
>         #        c-plan-ahead 7;
>         #        c-fill-target 2M;
>         #        c-min-rate 80M;
>         #        c-max-rate 720M;
>         #}
>         net {
>                 protocol C;
>                 #fencing resource-only;
>                 #cram-hmac-alg sha1;
>                 #verify-alg sha1;
>                 #shared-secret 1e69dc721fd2e65368ae3ba1e5929979;
>                 #after-sb-0pri disconnect;
>                 #after-sb-1pri disconnect;
>                 #after-sb-2pri disconnect;
>                 #max-buffers    8000;
>                 #max-epoch-size 8000;
>                 #sndbuf-size 0;
>                 #rcvbuf-size 2048k;
>         }
> }