[DRBD-user] PingAck did not arrive in time.
Dirk Bonenkamp - ProActive
dirk at proactive.nl
Tue Jun 19 08:46:51 CEST 2018
For those who are interested, or those who might stumble on this thread
through Google:
After fiddling around a bit longer without any reliable results, I've
decided to install DRBD8 instead of DRBD9, as Yannis suggested earlier.
It worked straight away, and after a bit of tuning it works very well.
Kind regards,
D.
On 24-05-18 07:54, Dirk Bonenkamp - ProActive wrote:
> Hello,
>
> Thank you for your suggestion. The MTU is 1500 on both nodes. I had it
> at 9000, but reverted everything to 'normal' to debug this problem.
> Pinging as in your example works fine.
>
> Cheers,
>
> Dirk
>
> On 23-05-18 21:22, Nelson Hicks wrote:
>> Is there any chance this could be an MTU mismatch between the two
>> nodes? If you use ping with varying packet sizes from one node to the
>> other, do they stop working above a specific size? Does ifconfig
>> report the same MTU size for the interface on both nodes?
>>
>> Examples:
>>
>> ifconfig | grep -i mtu
>>
>> ping -s 500 <other_ip>
>>
>> ping -s 1400 <other_ip>
>>
>> ping -s 1472 <other_ip>
>>
>> ping -s 2000 <other_ip>
>>
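The 1472 in the examples above is not arbitrary: it is the largest ICMP payload that fits in a single packet at MTU 1500. A minimal sketch of that arithmetic, assuming IPv4 without IP options (20-byte header) and an 8-byte ICMP echo header. The function names are hypothetical, and the `-M do` (don't fragment) flag assumes the Linux iputils version of ping:

```python
IPV4_HEADER = 20  # bytes, assuming IPv4 with no IP options
ICMP_HEADER = 8   # bytes, ICMP echo request header


def icmp_payload_for_mtu(mtu):
    """Largest `ping -s` payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def ping_command(host, payload):
    """Build a one-shot ping command with the don't-fragment bit set.

    -M do is specific to Linux iputils ping (an assumption about the
    ping implementation installed on the nodes).
    """
    return ["ping", "-c", "1", "-M", "do", "-s", str(payload), host]


for mtu in (1500, 9000):
    print(mtu, icmp_payload_for_mtu(mtu))
# 1500 1472
# 9000 8972
```

With the don't-fragment bit set, a payload one byte above the computed value should fail on a path whose MTU really is the assumed one, which makes a mismatch between the two nodes easier to spot than with plain `ping -s`.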
>> Thanks,
>>
>> - Nelson Hicks
>>
>>
>>
>>
>> On 05/23/2018 02:07 PM, Dirk Bonenkamp - ProActive wrote:
>>> Hi,
>>>
>>> Thank you for your reply.
>>>
>>> I am / was under the impression that DRBD9 is the new and improved
>>> DRBD, so I decided to use this version. Is that not the case?
>>> Could somebody enlighten me a bit?
>>>
>>> I already have disabled all bonding and other fancy network stuff,
>>> so I'm using 1 nic currently. This doesn't solve anything
>>> unfortunately.
>>>
>>> Kind regards,
>>>
>>> Dirk
>>>
>>> On 23-05-18 14:20, Yannis Milios wrote:
>>>> Two things:
>>>>
>>>> - I would use drbd8 instead of drbd9 for a 2 node setup.
>>>> - I would first test with 1 nic instead of 2.
>>>>
>>>> On Wed, May 23, 2018 at 11:01 AM, Dirk Bonenkamp - ProActive
>>>> <dirk at proactive.nl> wrote:
>>>>
>>>> Hi List,
>>>>
>>>> I'm struggling with a new DRBD9 setup. It's a simple Master/Slave setup.
>>>> I'm running Ubuntu 16.04 LTS with the DRBD9 packages from the
>>>> Launchpad PPA.
>>>>
>>>> I've been running some DRBD8 systems in production for quite some years,
>>>> so I have some experience. This setup is very similar; the only major
>>>> difference is that this is DRBD9 and I use LUKS-encrypted partitions as
>>>> the backend.
>>>>
>>>> I keep running into this 'PingAck did not arrive in time.' error, which
>>>> points to network issues if I am correct (see complete log snippet
>>>> below). This error occurs when I try to reattach the secondary node
>>>> after a reboot. Initial sync works fine.
>>>>
>>>> The servers are interconnected with 2 10Gb NICs. I had bonding and jumbo
>>>> frames configured, but deactivated all this, to no avail. I've also
>>>> stripped the DRBD configuration to the bare minimum (see below).
>>>>
>>>> I've tested the connection with iperf and some other tools and it seems
>>>> just fine.
>>>>
>>>> Could somebody point me in the right direction?
>>>>
>>>> Thank you in advance, regards,
>>>>
>>>> Dirk Bonenkamp
>>>>
>>>> syslog messages:
>>>>
>>>> May 23 11:31:56 data2 kernel: [ 704.111755] drbd: loading out-of-tree module taints kernel.
>>>> May 23 11:31:56 data2 kernel: [ 704.112290] drbd: module verification failed: signature and/or required key missing - tainting kernel
>>>> May 23 11:31:56 data2 kernel: [ 704.127677] drbd: initialized. Version: 9.0.14-1 (api:2/proto:86-113)
>>>> May 23 11:31:56 data2 kernel: [ 704.127680] drbd: GIT-hash: 62f906cf44ef02a30ce0c148fec223b40c51c533 build by root@data2, 2018-05-23 09:19:54
>>>> May 23 11:31:56 data2 kernel: [ 704.127683] drbd: registered as block device major 147
>>>> May 23 11:31:56 data2 kernel: [ 704.153565] drbd r0: Starting worker thread (from drbdsetup [4495])
>>>> May 23 11:31:56 data2 kernel: [ 704.183031] drbd r0/0 drbd0: disk( Diskless -> Attaching )
>>>> May 23 11:31:56 data2 kernel: [ 704.183066] drbd r0/0 drbd0: Maximum number of peer devices = 1
>>>> May 23 11:31:56 data2 kernel: [ 704.183293] drbd r0: Method to ensure write ordering: flush
>>>> May 23 11:31:56 data2 kernel: [ 704.183308] drbd r0/0 drbd0: drbd_bm_resize called with capacity == 273437203064
>>>> May 23 11:31:58 data2 kernel: [ 706.508228] drbd r0/0 drbd0: resync bitmap: bits=34179650383 words=534057038 pages=1043081
>>>> May 23 11:31:58 data2 kernel: [ 706.508234] drbd r0/0 drbd0: size = 127 TB (136718601532 KB)
>>>> May 23 11:31:58 data2 kernel: [ 706.508236] drbd r0/0 drbd0: size = 127 TB (136718601532 KB)
>>>> May 23 11:32:10 data2 kernel: [ 717.890420] drbd r0/0 drbd0: recounting of set bits took additional 1256ms
>>>> May 23 11:32:10 data2 kernel: [ 717.890435] drbd r0/0 drbd0: disk( Attaching -> Outdated )
>>>> May 23 11:32:10 data2 kernel: [ 717.890439] drbd r0/0 drbd0: attached to current UUID: 244DD61D2781DF44
>>>> May 23 11:32:10 data2 kernel: [ 717.918473] drbd r0 data1: Starting sender thread (from drbdsetup [4544])
>>>> May 23 11:32:10 data2 kernel: [ 717.922534] drbd r0 data1: conn( StandAlone -> Unconnected )
>>>> May 23 11:32:10 data2 kernel: [ 717.922820] drbd r0 data1: Starting receiver thread (from drbd_w_r0 [4498])
>>>> May 23 11:32:10 data2 kernel: [ 717.922973] drbd r0 data1: conn( Unconnected -> Connecting )
>>>> May 23 11:32:10 data2 kernel: [ 718.421219] drbd r0 data1: Handshake to peer 1 successful: Agreed network protocol version 113
>>>> May 23 11:32:10 data2 kernel: [ 718.421229] drbd r0 data1: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
>>>> May 23 11:32:10 data2 kernel: [ 718.421259] drbd r0 data1: Starting ack_recv thread (from drbd_r_r0 [4550])
>>>> May 23 11:32:10 data2 kernel: [ 718.424095] drbd r0: Preparing cluster-wide state change 1205605755 (0->1 499/146)
>>>> May 23 11:32:10 data2 kernel: [ 718.437172] drbd r0: State change 1205605755: primary_nodes=2, weak_nodes=FFFFFFFFFFFFFFFC
>>>> May 23 11:32:10 data2 kernel: [ 718.437185] drbd r0: Aborting cluster-wide state change 1205605755 (12ms) rv = -22
>>>> May 23 11:32:12 data2 kernel: [ 719.896223] drbd r0: Preparing cluster-wide state change 445952355 (0->1 499/146)
>>>> May 23 11:32:12 data2 kernel: [ 719.896498] drbd r0: State change 445952355: primary_nodes=2, weak_nodes=FFFFFFFFFFFFFFFC
>>>> May 23 11:32:12 data2 kernel: [ 719.896508] drbd r0: Committing cluster-wide state change 445952355 (0ms)
>>>> May 23 11:32:12 data2 kernel: [ 719.896541] drbd r0 data1: conn( Connecting -> Connected ) peer( Unknown -> Primary )
>>>> May 23 11:32:12 data2 kernel: [ 719.912186] drbd r0/0 drbd0 data1: drbd_sync_handshake:
>>>> May 23 11:32:12 data2 kernel: [ 719.912198] drbd r0/0 drbd0 data1: self 244DD61D2781DF44:0000000000000000:0000000000000000:0000000000000000 bits:52035 flags:20
>>>> May 23 11:32:12 data2 kernel: [ 719.912207] drbd r0/0 drbd0 data1: peer E38BE51FE782EAE0:244DD61D2781DF44:934CAB8662DF0410:E555BDC58E528356 bits:53162 flags:20
>>>> May 23 11:32:12 data2 kernel: [ 719.912214] drbd r0/0 drbd0 data1: uuid_compare()=-2 by rule 50
>>>> May 23 11:32:12 data2 kernel: [ 719.912248] drbd r0/0 drbd0 data1: pdsk( DUnknown -> UpToDate ) repl( Off -> WFBitMapT )
>>>> May 23 11:32:32 data2 kernel: [ 740.397026] drbd r0 data1: PingAck did not arrive in time.
>>>> May 23 11:32:32 data2 kernel: [ 740.397121] drbd r0 data1: conn( Connected -> NetworkFailure ) peer( Primary -> Unknown )
>>>> May 23 11:32:32 data2 kernel: [ 740.397131] drbd r0/0 drbd0 data1: pdsk( UpToDate -> DUnknown ) repl( WFBitMapT -> Off )
>>>> May 23 11:32:32 data2 kernel: [ 740.397176] drbd r0 data1: ack_receiver terminated
>>>> May 23 11:32:32 data2 kernel: [ 740.397182] drbd r0 data1: Terminating ack_recv thread
>>>> May 23 11:32:32 data2 kernel: [ 740.458608] drbd r0 data1: Connection closed
>>>> May 23 11:32:32 data2 kernel: [ 740.458650] drbd r0 data1: conn( NetworkFailure -> Unconnected )
>>>> May 23 11:32:32 data2 kernel: [ 740.458688] drbd r0 data1: Restarting receiver thread
>>>> May 23 11:32:32 data2 kernel: [ 740.458723] drbd r0 data1: conn( Unconnected -> Connecting )
>>>>
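The bitmap figures in the log above can be cross-checked from the reported capacity. A minimal sketch, assuming the capacity is counted in 512-byte sectors, that DRBD tracks one bitmap bit per 4 KiB of storage, that bitmap words are 64-bit, and that pages are 4 KiB. The function and constant names are hypothetical. It only reproduces the arithmetic and does not by itself explain the PingAck timeout:

```python
SECTOR_BYTES = 512     # capacity in the log is assumed to be in 512-byte sectors
BYTES_PER_BIT = 4096   # assumed: one bitmap bit per 4 KiB of storage
WORD_BITS = 64         # assumed: 64-bit bitmap words
PAGE_BYTES = 4096      # assumed kernel page size


def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)


def bitmap_geometry(capacity_sectors):
    """Derive device size and bitmap dimensions from a sector count."""
    size_bytes = capacity_sectors * SECTOR_BYTES
    bits = ceil_div(size_bytes, BYTES_PER_BIT)
    words = ceil_div(bits, WORD_BITS)
    pages = ceil_div(words * (WORD_BITS // 8), PAGE_BYTES)
    return {
        "size_kb": size_bytes // 1024,
        "bits": bits,
        "words": words,
        "pages": pages,
    }


# Capacity taken from the "drbd_bm_resize called with capacity ==" log line.
geo = bitmap_geometry(273437203064)
print(geo)
print("bitmap size in GiB:", geo["pages"] * PAGE_BYTES / 2**30)
```

Under these assumptions the results match the logged values (bits=34179650383, words=534057038, pages=1043081, 136718601532 KB), and the in-memory bitmap for this 127 TB device comes to roughly 4 GiB.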
>>>> resources:
>>>>
>>>> resource r0 {
>>>>     on data1 {
>>>>         device /dev/drbd0;
>>>>         disk /dev/mapper/mapper_secure;
>>>>         address 172.16.11.21:7789;
>>>>         meta-disk internal;
>>>>     }
>>>>     on data2 {
>>>>         device /dev/drbd0;
>>>>         disk /dev/mapper/mapper_secure;
>>>>         address 172.16.11.22:7789;
>>>>         meta-disk internal;
>>>>     }
>>>> }
>>>>
>>>> drbd configuration:
>>>>
>>>> global {
>>>>     usage-count yes;
>>>> }
>>>>
>>>> common {
>>>>     #handlers {
>>>>     #    fence-peer "/usr/lib/drbd/crm-fence-peer.9.sh";
>>>>     #    after-resync-target "/usr/lib/drbd/crm-unfence-peer.9.sh";
>>>>     #}
>>>>     #disk {
>>>>     #    on-io-error detach;
>>>>     #    disk-barrier no;
>>>>     #    disk-flushes no;
>>>>     #    al-extents 3833;
>>>>     #    c-plan-ahead 7;
>>>>     #    c-fill-target 2M;
>>>>     #    c-min-rate 80M;
>>>>     #    c-max-rate 720M;
>>>>     #}
>>>>     net {
>>>>         protocol C;
>>>>         #fencing resource-only;
>>>>         #cram-hmac-alg sha1;
>>>>         #verify-alg sha1;
>>>>         #shared-secret 1e69dc721fd2e65368ae3ba1e5929979;
>>>>         #after-sb-0pri disconnect;
>>>>         #after-sb-1pri disconnect;
>>>>         #after-sb-2pri disconnect;
>>>>         #max-buffers 8000;
>>>>         #max-epoch-size 8000;
>>>>         #sndbuf-size 0;
>>>>         #rcvbuf-size 2048k;
>>>>     }
>>>> }
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> drbd-user mailing list
>>>> drbd-user at lists.linbit.com
>>>> http://lists.linbit.com/mailman/listinfo/drbd-user
>>>>
>>>>
>>>
>>> --
>>> ProActive Software
>>> Dirk Bonenkamp
>>> CTO <https://www.proactive-software.com>
>>> Phone: +31 (0)23 54 222 99
>>> Mobile: +31 (0)6 250 787 93 Richard Holkade 9
>>> 2033 PZ Haarlem
>>> LinkedIn <http://linkd.in/1V6egnk> Facebook
>>> <http://bit.ly/FBProActive> YouTube <http://bit.ly/1Mc23L9>
>>> www.proactive.nl <https://www.proactive.nl>