[DRBD-user] drbd issue?
Lars Ellenberg
lars.ellenberg at linbit.com
Thu Aug 30 10:46:33 CEST 2018
On Wed, Aug 29, 2018 at 11:33:26AM +0000, Nicolas wrote:
> Hello
>
> Sorry for the misunderstanding of utils version.
>
> I'm using the kernel : 4.9.88-1+deb9u1 (4.9.0-6-amd64 debian).
> And the module version v8.4.7.
> srcversion: 0904DF2CCF7283ACE07D07A
Not that I think it has anything to do with this particular issue,
but I'd suggest you upgrade to 8.4.11 anyways.
> For example when a node says:
>
> [Tue Aug 28 14:32:38 2018] drbd resource10: peer( Primary -> Unknown ) conn( Connected -> Disconnecting ) pdsk( UpToDate -> DUnknown )
> [Tue Aug 28 14:32:38 2018] drbd resource10: ack_receiver terminated
> [Tue Aug 28 14:32:38 2018] drbd resource10: Terminating drbd_a_resource
> [Tue Aug 28 14:32:38 2018] drbd resource10: Connection closed
> [Tue Aug 28 14:32:38 2018] drbd resource10: conn( Disconnecting -> StandAlone )
> [Tue Aug 28 14:32:38 2018] drbd resource10: receiver terminated
> [Tue Aug 28 14:32:38 2018] drbd resource10: Terminating drbd_r_resource
> [Tue Aug 28 14:32:38 2018] block drbd10: disk( UpToDate -> Failed )
> [Tue Aug 28 14:32:38 2018] block drbd10: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
> [Tue Aug 28 14:32:38 2018] block drbd10: disk( Failed -> Diskless )
> [Tue Aug 28 14:32:38 2018] drbd resource10: Terminating drbd_w_resource
> [Tue Aug 28 14:32:40 2018] drbd resource10: Starting worker thread (from drbdsetup-84 [10222])
Okay. So this is "someone or something" doing a "drbdadm down ; drbdadm up"
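
For anyone who wants to scan a longer dmesg for the same pattern, here is a minimal Python sketch. The marker strings are copied from the log lines quoted above and are only a heuristic, not an official DRBD interface. The names `TEARDOWN_MARKERS`, `RESTART` and `looks_like_down_up` are made up for this illustration.

```python
import re

# Heuristic markers taken only from the kernel log lines quoted in this thread.
TEARDOWN_MARKERS = (
    "conn( Disconnecting -> StandAlone )",
    "disk( Failed -> Diskless )",
    "Terminating drbd_w_",
)
# The process that brought the resource back up, e.g. "drbdsetup-84 [10222]".
RESTART = re.compile(r"Starting worker thread \(from (drbdsetup\S*) \[(\d+)\]\)")


def looks_like_down_up(lines):
    """Return (program, pid) if a full teardown is followed by a new worker
    thread started from drbdsetup, else None."""
    seen = set()
    for line in lines:
        for marker in TEARDOWN_MARKERS:
            if marker in line:
                seen.add(marker)
        match = RESTART.search(line)
        if match and len(seen) == len(TEARDOWN_MARKERS):
            return match.group(1), int(match.group(2))
    return None


excerpt = [
    "[Tue Aug 28 14:32:38 2018] drbd resource10: conn( Disconnecting -> StandAlone )",
    "[Tue Aug 28 14:32:38 2018] block drbd10: disk( Failed -> Diskless )",
    "[Tue Aug 28 14:32:38 2018] drbd resource10: Terminating drbd_w_resource",
    "[Tue Aug 28 14:32:40 2018] drbd resource10: Starting worker thread (from drbdsetup-84 [10222])",
]
print(looks_like_down_up(excerpt))  # ('drbdsetup-84', 10222)
```

The pid in the last line is the one thing in this excerpt that points at the "someone or something". Correlating it with process accounting or the cluster manager's own logs is left to the reader.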
> The second says:
>
> [Tue Aug 28 14:35:33 2018] br0: port 8(tap6) entered disabled state
> [Tue Aug 28 14:35:33 2018] device tap6 left promiscuous mode
Uhm, time stamps do not match the excerpt above.
> [Tue Aug 28 14:35:33 2018] br0: port 8(tap6) entered disabled state
> [Tue Aug 28 14:35:37 2018] drbd resource10: peer( Secondary -> Unknown ) conn( Connected -> TearDown ) pdsk( UpToDate -> DUnknown )
> [Tue Aug 28 14:35:37 2018] drbd resource10: ack_receiver terminated
> [Tue Aug 28 14:35:37 2018] drbd resource10: Terminating drbd_a_resource
> [Tue Aug 28 14:35:37 2018] block drbd10: new current UUID 629F1036CD6CA2AF:0748EE11C429D3B5:FDAEFCD2E8D9890B:FDADFCD2E8D9890B
> [Tue Aug 28 14:35:37 2018] drbd resource10: Connection closed
> [Tue Aug 28 14:35:37 2018] drbd resource10: conn( TearDown -> Unconnected )
> [Tue Aug 28 14:35:37 2018] drbd resource10: receiver terminated
> [Tue Aug 28 14:35:37 2018] drbd resource10: Restarting receiver thread
> [Tue Aug 28 14:35:37 2018] drbd resource10: receiver (re)started
> [Tue Aug 28 14:35:37 2018] drbd resource10: conn( Unconnected -> WFConnection )
This is "peer node disconnected for some reason".
> [Tue Aug 28 14:35:38 2018] block drbd10: role( Primary -> Secondary )
> [Tue Aug 28 14:35:38 2018] block drbd10: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
> [Tue Aug 28 14:35:38 2018] drbd resource10: conn( WFConnection -> Disconnecting )
> [Tue Aug 28 14:35:38 2018] drbd resource10: Discarding network configuration.
> [Tue Aug 28 14:35:38 2018] drbd resource10: Connection closed
> [Tue Aug 28 14:35:38 2018] drbd resource10: conn( Disconnecting -> StandAlone )
> [Tue Aug 28 14:35:38 2018] drbd resource10: receiver terminated
> [Tue Aug 28 14:35:38 2018] drbd resource10: Terminating drbd_r_resource
> [Tue Aug 28 14:35:38 2018] block drbd10: disk( UpToDate -> Failed )
> [Tue Aug 28 14:35:38 2018] block drbd10: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
> [Tue Aug 28 14:35:38 2018] block drbd10: disk( Failed -> Diskless )
> [Tue Aug 28 14:35:38 2018] drbd resource10: Terminating drbd_w_resource
And again, this is a "drbdadm down ; drbdadm up"
> And it seems for this example the second node was the origin of this.
> This night I got another error, saying network failure, but I'm sure there was no network issue:
>
> First node:
>
> [Wed Aug 29 01:39:48 2018] drbd resource0: meta connection shut down by peer.
> [Wed Aug 29 01:39:48 2018] drbd resource0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
...
peer node shut down the connection,
and as a result this node goes through a state called NetworkFailure,
then all the motions,
then reconnects,
and syncs up.
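
To follow those motions in a longer log, the `name( old -> new )` fragments can be pulled out mechanically. This is a small Python sketch, with `TRANSITION` and `parse_transitions` being hypothetical names. It relies only on the line format visible in the quoted log, where `peer` is the peer's role, `conn` the connection state and `pdsk` the peer's disk state.

```python
import re

# Matches fragments such as "conn( Connected -> NetworkFailure )".
TRANSITION = re.compile(r"(\w+)\( (\w+) -> (\w+) \)")


def parse_transitions(line):
    """Map each state field in one DRBD log line to its (old, new) pair."""
    return {field: (old, new) for field, old, new in TRANSITION.findall(line)}


line = ("[Wed Aug 29 01:39:48 2018] drbd resource0: peer( Primary -> Unknown ) "
        "conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )")
changes = parse_transitions(line)
print(changes["conn"])  # ('Connected', 'NetworkFailure')
```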
> Second node:
>
> [Wed Aug 29 01:42:48 2018] drbd resource0: PingAck did not arrive in time.
Again, time stamps do not match up.
But there is your reason for this incident: "PingAck did not arrive in time".
Find out why, or simply increase the ping ack timeout.
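
The mismatch in time stamps can be quantified with a few lines of Python. This is only a sketch. Which lines on the two nodes describe the same event is an assumption made here, and `STAMP_FORMAT`, `stamp`, `pairs` and `offsets` are illustrative names. The stamps look like the human-readable ones `dmesg -T` prints, and its man page warns that those can be inaccurate.

```python
from datetime import datetime

# Format of the bracketed stamps in the quoted logs; %a and %b assume an
# English (C) locale.
STAMP_FORMAT = "%a %b %d %H:%M:%S %Y"


def stamp(line):
    """Parse the leading "[...]" time stamp of one log line."""
    return datetime.strptime(line[1:line.index("]")], STAMP_FORMAT)


# (first node, second node) lines that plausibly describe the same event.
# The pairing is an assumption made for this illustration.
pairs = [
    ("[Tue Aug 28 14:32:38 2018] drbd resource10: peer( Primary -> Unknown ) "
     "conn( Connected -> Disconnecting ) pdsk( UpToDate -> DUnknown )",
     "[Tue Aug 28 14:35:37 2018] drbd resource10: peer( Secondary -> Unknown ) "
     "conn( Connected -> TearDown ) pdsk( UpToDate -> DUnknown )"),
    ("[Wed Aug 29 01:39:48 2018] drbd resource0: meta connection shut down by peer.",
     "[Wed Aug 29 01:42:48 2018] drbd resource0: PingAck did not arrive in time."),
]

offsets = [(stamp(second) - stamp(first)).total_seconds() for first, second in pairs]
print(offsets)  # [179.0, 180.0]
```

A gap of about three minutes in both incidents would be consistent with a roughly constant time stamp offset between the two nodes rather than with unrelated events, though nothing in the logs shown here confirms that.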
> -------- Forwarded message -------
> From: "Lars Ellenberg" <lars.ellenberg at linbit.com>
> To: drbd-user at lists.linbit.com
> Sent: 29 August 2018 12:09
> Subject: Re: [DRBD-user] drbd issue?
>
> On Tue, Aug 28, 2018 at 02:43:47PM +0000, Nicolas wrote:
> > Hi
> >
> > I'm using some servers on debian with ganeti and drbd.
> >
> > Since I've upgraded them to debian 9, and drbd 8.9.10-2 (from debian repo).
>
> "drbd 8.9.10" is the *utils* version
> (drbdadm, drbdsetup, drbdmeta, various scripts ...)
>
> drbd utils version is meanwhile at 9.5.0, btw. And no, that has not
> much to do with what DRBD kernel module driver version you are using,
> since we ship the "unified utils" for both "drbd 8" and "drbd 9",
> which started years ago already, the utils version is decoupled from
> the module versions.
>
> What kernel version,
> and what DRBD module version?
>
> Maybe you want to make sure you use the latest 8.4 version (8.4.11
> currently), and not whatever "ships with the debian kernel"?
>
> > I got a lot of issue with my drbd resources, I got randomly on my dmesg some resources disconnected:
> >
> > today for example:
> >
> > [Tue Aug 28 14:32:38 2018] drbd resource10: peer( Primary -> Unknown ) conn( Connected -> Disconnecting ) pdsk( UpToDate -> DUnknown )
>
> Well, what does the other node say?
> Hit some timeouts?
> Some strangeness with the new NIC drivers?
> A bug in the "shipped with the debian kernel" DRBD version?
--
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker
DRBD® and LINBIT® are registered trademarks of LINBIT
__
please don't Cc me, but send to list -- I'm subscribed