[DRBD-user] DRBDmanage (re)initialization

Julien Escario escario at azylog.net
Fri Jun 9 14:24:53 CEST 2017

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


On 09/06/2017 at 09:59, Robert Altnoeder wrote:
> On 06/08/2017 04:14 PM, Julien Escario wrote:
>> Hello,
>> A drbdmanage cluster is currently stuck in this state:
>> .drbdctrl role:Secondary
>>   volume:0 disk:UpToDate
>>   volume:1 disk:UpToDate
>>   vm4 connection:NetworkFailure
>>   vm7 role:Secondary
>>     volume:0 replication:WFBitMapS peer-disk:Inconsistent
>>     volume:1 peer-disk:Outdated
>> [...]
>> Any way to restart this resource without losing all the other resources?
> on vm4 and vm7, try 'drbdadm down .drbdctrl' followed by 'drbdadm up
> .drbdctrl'.
> In most cases, it just reconnects and fixes itself.

Many thanks !
BUT, it seems I have a bigger problem on vm7.

First, vm4 and vm5 are secondary on the .drbdctrl resource.

Running 'drbdadm status' on vm7 times out, as does any drbdsetup command.
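As an aside, on the nodes where 'drbdadm status' still answers, the unhealthy peers in a dump like the one quoted above can be picked out mechanically. A minimal Python sketch, assuming the plain-text status format shown above (stuck_peers is a hypothetical helper, not part of the DRBD tools):

```python
import re

# Sample input: the status dump quoted earlier in this thread.
STATUS = """\
.drbdctrl role:Secondary
  volume:0 disk:UpToDate
  volume:1 disk:UpToDate
  vm4 connection:NetworkFailure
  vm7 role:Secondary
    volume:0 replication:WFBitMapS peer-disk:Inconsistent
    volume:1 peer-disk:Outdated
"""

def stuck_peers(status_text):
    """Return peer names whose connection/replication state looks unhealthy.

    Hypothetical helper: relies on the 2-space (peer) / 4-space (peer volume)
    indentation of 'drbdadm status' plain-text output.
    """
    bad = set()
    peer = None
    for line in status_text.splitlines():
        # Peer lines: two spaces, node name, then 'connection:' or 'role:'.
        m = re.match(r'\s{2}(\S+) (connection|role):(\S+)', line)
        if m:
            peer = m.group(1)
            if m.group(2) == 'connection' and m.group(3) != 'Connected':
                bad.add(peer)
            continue
        # Peer volume lines: flag anything not Established / UpToDate.
        if peer and re.search(r'replication:(?!Established)|peer-disk:(?!UpToDate)', line):
            bad.add(peer)
    return sorted(bad)

print(stuck_peers(STATUS))  # → ['vm4', 'vm7']
```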

In the kernel logs on vm4, I have:
[25578277.437480] drbd .drbdctrl: Preparing cluster-wide state change 3545178393 (0->3 499/146)
[25578277.437809] drbd .drbdctrl: Aborting cluster-wide state change 3545178393 (0ms) rv = -19

Which is probably normal, as drbdmanage can't set up the primary state.

On vm5, the kernel logs are slightly different:
[25574921.080463] drbd .drbdctrl vm7: Ignoring P_TWOPC_ABORT packet 2546904845.

And on vm7, almost the same:
[25396073.307115] drbd .drbdctrl vm4: Rejecting concurrent remote state change 2590742863 because of state change 2272799652
[25396073.307390] drbd .drbdctrl vm4: Ignoring P_TWOPC_ABORT packet 2590742863.
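These messages are DRBD's cluster-wide two-phase commit (TWOPC) at work: a state change is prepared, then aborted or rejected. To correlate the transaction IDs across the three nodes, the relevant events can be grepped out of the kernel logs; a hypothetical Python sketch (twopc_events is my own helper, assuming only the message formats shown above):

```python
import re

# Sample input: the kernel log lines quoted above, gathered from all nodes.
LOG = """\
[25578277.437480] drbd .drbdctrl: Preparing cluster-wide state change 3545178393 (0->3 499/146)
[25578277.437809] drbd .drbdctrl: Aborting cluster-wide state change 3545178393 (0ms) rv = -19
[25574921.080463] drbd .drbdctrl vm7: Ignoring P_TWOPC_ABORT packet 2546904845.
[25396073.307115] drbd .drbdctrl vm4: Rejecting concurrent remote state change 2590742863 because of state change 2272799652
"""

def twopc_events(log_text):
    """Return (event, state-change id) pairs from DRBD kernel log lines."""
    patterns = [
        ('prepare', r'Preparing cluster-wide state change (\d+)'),
        ('abort',   r'Aborting cluster-wide state change (\d+)'),
        ('ignore',  r'Ignoring P_TWOPC_ABORT packet (\d+)'),
        ('reject',  r'Rejecting concurrent remote state change (\d+)'),
    ]
    events = []
    for line in log_text.splitlines():
        for name, pat in patterns:
            m = re.search(pat, line)
            if m:
                events.append((name, int(m.group(1))))
    return events

print(twopc_events(LOG))
# → [('prepare', 3545178393), ('abort', 3545178393),
#    ('ignore', 2546904845), ('reject', 2590742863)]
```

Matching IDs that only ever reach 'prepare'/'abort' on one node, while the others keep ignoring or rejecting them, would confirm that the cluster-wide transaction never completes.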


Listing drbdmanage nodes works fine on vm7:
> # drbdmanage n
> +------+-----------+-----------+-------+
> | Name | Pool Size | Pool Free | State |
> |------+-----------+-----------+-------|
> | vm4  |    921600 |    363755 |    ok |
> | vm5  |    921600 |    329840 |    ok |
> | vm7  |    921600 |    380712 |    ok |
> +------+-----------+-----------+-------+

But it times out on vm4 and vm5, which, as before, is expected if the .drbdctrl
resource can't go primary (correct me if I'm wrong).

So it seems things are screwed up on vm7, but the VMs are still backed up
successfully every night.

Any idea how to get out of this situation without rebooting the whole node?
Being unable to run any drbdsetup command successfully makes me suspect that
only a reboot will do the trick.

Thanks !
Julien Escario



