[DRBD-user] PingAck did not arrive in time.

Dirk Bonenkamp - ProActive dirk at proactive.nl
Wed May 23 12:01:53 CEST 2018


Hi List,

I'm struggling with a new DRBD9 setup. It's a simple Master/Slave setup.
I'm running Ubuntu 16.04 LTS with the DRBD9 packages from the Launchpad PPA.

I have been running some DRBD8 systems in production for quite a few
years, so I have some experience. This setup is very similar; the only
major differences are that it runs DRBD9 and that I use LUKS-encrypted
partitions as the backing devices.

I keep running into this 'PingAck did not arrive in time.' error, which
points to network issues if I am correct (see the complete log snippet
below). The error occurs when I try to reattach the secondary node
after a reboot. The initial sync works fine.
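
To make the timing of the failure concrete, the gap between the peer
entering WFBitMapT and the PingAck message can be read off the kernel
timestamps in the log snippet below. This is only a minimal sketch: the
two log lines are copied from that snippet with the mail line-wrapping
undone, and the helper names (kernel_seconds, gap_between) are
hypothetical, not part of any DRBD tooling.

```python
import re

# Two lines copied from the syslog snippet, with the mail wrapping undone.
LOG = """\
May 23 11:32:12 data2 kernel: [  719.912248] drbd r0/0 drbd0 data1: pdsk( DUnknown -> UpToDate ) repl( Off -> WFBitMapT )
May 23 11:32:32 data2 kernel: [  740.397026] drbd r0 data1: PingAck did not arrive in time.
"""

# Kernel uptime stamp as printed by syslog, e.g. "kernel: [  719.912248]".
STAMP = re.compile(r"kernel: \[\s*(\d+\.\d+)\]")


def kernel_seconds(line):
    """Return the kernel uptime stamp of one syslog line, in seconds."""
    match = STAMP.search(line)
    if match is None:
        raise ValueError("no kernel timestamp in line: %r" % line)
    return float(match.group(1))


def gap_between(log, start_marker, end_marker):
    """Seconds between the first line containing start_marker and the
    first later line containing end_marker."""
    start = None
    for line in log.splitlines():
        if start is None:
            if start_marker in line:
                start = kernel_seconds(line)
        elif end_marker in line:
            return kernel_seconds(line) - start
    raise ValueError("markers not found in order")


gap = gap_between(LOG, "-> WFBitMapT", "PingAck did not arrive")
print("%.3f s" % gap)  # prints "20.485 s"
```

So the connection stays up for roughly 20 seconds in WFBitMapT before
the ping is declared lost.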

The servers are interconnected via two 10 Gb NICs. I had bonding and
jumbo frames configured, but disabling both made no difference. I've
also stripped the DRBD configuration down to the bare minimum (see below).

I've tested the connection with iperf and some other tools and it seems
just fine.

Could somebody point me in the right direction?

Thank you in advance, regards,

Dirk Bonenkamp

syslog messages:

May 23 11:31:56 data2 kernel: [  704.111755] drbd: loading out-of-tree module taints kernel.
May 23 11:31:56 data2 kernel: [  704.112290] drbd: module verification failed: signature and/or required key missing - tainting kernel
May 23 11:31:56 data2 kernel: [  704.127677] drbd: initialized. Version: 9.0.14-1 (api:2/proto:86-113)
May 23 11:31:56 data2 kernel: [  704.127680] drbd: GIT-hash: 62f906cf44ef02a30ce0c148fec223b40c51c533 build by root@data2, 2018-05-23 09:19:54
May 23 11:31:56 data2 kernel: [  704.127683] drbd: registered as block device major 147
May 23 11:31:56 data2 kernel: [  704.153565] drbd r0: Starting worker thread (from drbdsetup [4495])
May 23 11:31:56 data2 kernel: [  704.183031] drbd r0/0 drbd0: disk( Diskless -> Attaching )
May 23 11:31:56 data2 kernel: [  704.183066] drbd r0/0 drbd0: Maximum number of peer devices = 1
May 23 11:31:56 data2 kernel: [  704.183293] drbd r0: Method to ensure write ordering: flush
May 23 11:31:56 data2 kernel: [  704.183308] drbd r0/0 drbd0: drbd_bm_resize called with capacity == 273437203064
May 23 11:31:58 data2 kernel: [  706.508228] drbd r0/0 drbd0: resync bitmap: bits=34179650383 words=534057038 pages=1043081
May 23 11:31:58 data2 kernel: [  706.508234] drbd r0/0 drbd0: size = 127 TB (136718601532 KB)
May 23 11:31:58 data2 kernel: [  706.508236] drbd r0/0 drbd0: size = 127 TB (136718601532 KB)
May 23 11:32:10 data2 kernel: [  717.890420] drbd r0/0 drbd0: recounting of set bits took additional 1256ms
May 23 11:32:10 data2 kernel: [  717.890435] drbd r0/0 drbd0: disk( Attaching -> Outdated )
May 23 11:32:10 data2 kernel: [  717.890439] drbd r0/0 drbd0: attached to current UUID: 244DD61D2781DF44
May 23 11:32:10 data2 kernel: [  717.918473] drbd r0 data1: Starting sender thread (from drbdsetup [4544])
May 23 11:32:10 data2 kernel: [  717.922534] drbd r0 data1: conn( StandAlone -> Unconnected )
May 23 11:32:10 data2 kernel: [  717.922820] drbd r0 data1: Starting receiver thread (from drbd_w_r0 [4498])
May 23 11:32:10 data2 kernel: [  717.922973] drbd r0 data1: conn( Unconnected -> Connecting )
May 23 11:32:10 data2 kernel: [  718.421219] drbd r0 data1: Handshake to peer 1 successful: Agreed network protocol version 113
May 23 11:32:10 data2 kernel: [  718.421229] drbd r0 data1: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
May 23 11:32:10 data2 kernel: [  718.421259] drbd r0 data1: Starting ack_recv thread (from drbd_r_r0 [4550])
May 23 11:32:10 data2 kernel: [  718.424095] drbd r0: Preparing cluster-wide state change 1205605755 (0->1 499/146)
May 23 11:32:10 data2 kernel: [  718.437172] drbd r0: State change 1205605755: primary_nodes=2, weak_nodes=FFFFFFFFFFFFFFFC
May 23 11:32:10 data2 kernel: [  718.437185] drbd r0: Aborting cluster-wide state change 1205605755 (12ms) rv = -22
May 23 11:32:12 data2 kernel: [  719.896223] drbd r0: Preparing cluster-wide state change 445952355 (0->1 499/146)
May 23 11:32:12 data2 kernel: [  719.896498] drbd r0: State change 445952355: primary_nodes=2, weak_nodes=FFFFFFFFFFFFFFFC
May 23 11:32:12 data2 kernel: [  719.896508] drbd r0: Committing cluster-wide state change 445952355 (0ms)
May 23 11:32:12 data2 kernel: [  719.896541] drbd r0 data1: conn( Connecting -> Connected ) peer( Unknown -> Primary )
May 23 11:32:12 data2 kernel: [  719.912186] drbd r0/0 drbd0 data1: drbd_sync_handshake:
May 23 11:32:12 data2 kernel: [  719.912198] drbd r0/0 drbd0 data1: self 244DD61D2781DF44:0000000000000000:0000000000000000:0000000000000000 bits:52035 flags:20
May 23 11:32:12 data2 kernel: [  719.912207] drbd r0/0 drbd0 data1: peer E38BE51FE782EAE0:244DD61D2781DF44:934CAB8662DF0410:E555BDC58E528356 bits:53162 flags:20
May 23 11:32:12 data2 kernel: [  719.912214] drbd r0/0 drbd0 data1: uuid_compare()=-2 by rule 50
May 23 11:32:12 data2 kernel: [  719.912248] drbd r0/0 drbd0 data1: pdsk( DUnknown -> UpToDate ) repl( Off -> WFBitMapT )
May 23 11:32:32 data2 kernel: [  740.397026] drbd r0 data1: PingAck did not arrive in time.
May 23 11:32:32 data2 kernel: [  740.397121] drbd r0 data1: conn( Connected -> NetworkFailure ) peer( Primary -> Unknown )
May 23 11:32:32 data2 kernel: [  740.397131] drbd r0/0 drbd0 data1: pdsk( UpToDate -> DUnknown ) repl( WFBitMapT -> Off )
May 23 11:32:32 data2 kernel: [  740.397176] drbd r0 data1: ack_receiver terminated
May 23 11:32:32 data2 kernel: [  740.397182] drbd r0 data1: Terminating ack_recv thread
May 23 11:32:32 data2 kernel: [  740.458608] drbd r0 data1: Connection closed
May 23 11:32:32 data2 kernel: [  740.458650] drbd r0 data1: conn( NetworkFailure -> Unconnected )
May 23 11:32:32 data2 kernel: [  740.458688] drbd r0 data1: Restarting receiver thread
May 23 11:32:32 data2 kernel: [  740.458723] drbd r0 data1: conn( Unconnected -> Connecting )
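
As a sanity check on the sizes in the log above, the resync bitmap
figures can be reproduced from the capacity that drbd_bm_resize reports.
This is a sketch under stated assumptions: the capacity is taken to be
in 512-byte sectors (consistent with the KB size in the log), one
bitmap bit is assumed to cover 4 KiB, and words and pages are assumed to
be 64 bits and 4 KiB. The function name bitmap_geometry is hypothetical.

```python
SECTOR_BYTES = 512     # assumption: capacity is counted in 512-byte sectors
BYTES_PER_BIT = 4096   # assumption: one bitmap bit tracks 4 KiB of storage
WORD_BITS = 64         # assumption: 64-bit bitmap words
PAGE_BYTES = 4096      # assumption: 4 KiB memory pages


def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)


def bitmap_geometry(capacity_sectors):
    """Return (bits, words, pages) of a bitmap covering the device."""
    bits = ceil_div(capacity_sectors * SECTOR_BYTES, BYTES_PER_BIT)
    words = ceil_div(bits, WORD_BITS)
    pages = ceil_div(words * (WORD_BITS // 8), PAGE_BYTES)
    return bits, words, pages


capacity = 273437203064  # from "drbd_bm_resize called with capacity == ..."
bits, words, pages = bitmap_geometry(capacity)
size_kb = capacity * SECTOR_BYTES // 1024
bitmap_gib = pages * PAGE_BYTES / 2**30

print(bits, words, pages)     # prints "34179650383 534057038 1043081"
print(size_kb)                # prints "136718601532"
print("%.2f GiB" % bitmap_gib)  # prints "3.98 GiB"
```

Under these assumptions the computed values match the bits, words,
pages and KB figures in the log, and the full bitmap for this 127 TB
device comes to a little under 4 GiB.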

resources:

resource r0 {
        on data1 {
                device    /dev/drbd0;
                disk      /dev/mapper/mapper_secure;
                address   172.16.11.21:7789;
                meta-disk internal;
        }
        on data2 {
                device    /dev/drbd0;
                disk      /dev/mapper/mapper_secure;
                address   172.16.11.22:7789;
                meta-disk internal;
        }
}

drbd configuration:

global {
        usage-count yes;
}

common {
        #handlers {
        #        fence-peer "/usr/lib/drbd/crm-fence-peer.9.sh";
        #        after-resync-target "/usr/lib/drbd/crm-unfence-peer.9.sh";
        #}
        #disk {
        #        on-io-error detach;
        #        disk-barrier no;
        #        disk-flushes no;
        #        al-extents 3833;
        #        c-plan-ahead 7;
        #        c-fill-target 2M;
        #        c-min-rate 80M;
        #        c-max-rate 720M;
        #}
        net {
                protocol C;
                #fencing resource-only;
                #cram-hmac-alg sha1;
                #verify-alg sha1;
                #shared-secret 1e69dc721fd2e65368ae3ba1e5929979;
                #after-sb-0pri disconnect;
                #after-sb-1pri disconnect;
                #after-sb-2pri disconnect;
                #max-buffers    8000;
                #max-epoch-size 8000;
                #sndbuf-size 0;
                #rcvbuf-size 2048k;
        }
}
