[DRBD-user] allow-two-primaries problem

Siniša Bandin sinisa at 4net.rs
Fri May 24 19:10:20 CEST 2019


Hello!

My DRBD cluster is back to full power, all three nodes up and running.

I am still able to promote more than one node to Primary, even though 
allow-two-primaries is not set anywhere in my configuration. I was even 
able to promote all three nodes:

Before:
node0:~ # drbdstat md8
md8 node-id:0 role:Primary suspended:no
     write-ordering:flush
   volume:0 minor:8 disk:UpToDate quorum:yes
       size:314255868 read:18959644 written:2048 al-writes:1 bm-writes:0 
upper-pending:0 lower-pending:0 al-suspended:no blocked:no
   node1 node-id:1 connection:Connected role:Secondary congested:no 
ap-in-flight:0 rs-in-flight:18446744073709088768
     volume:0 replication:Established peer-disk:UpToDate 
resync-suspended:no
         received:316872 sent:231964 out-of-sync:0 pending:0 unacked:0
   node2 node-id:2 connection:Connected role:Secondary congested:no 
ap-in-flight:0 rs-in-flight:0
     volume:0 replication:Established peer-disk:UpToDate 
resync-suspended:no
         received:21909399 sent:2048 out-of-sync:0 pending:0 unacked:0


After "drbdadm primary md8" on other two nodes:
node0:~ # drbdstat md8
md8 node-id:0 role:Primary suspended:no
     write-ordering:flush
   volume:0 minor:8 disk:UpToDate quorum:yes
       size:314255868 read:18959644 written:2048 al-writes:1 bm-writes:0 
upper-pending:0 lower-pending:0 al-suspended:no blocked:no
   node1 node-id:1 connection:Connected role:Primary congested:no 
ap-in-flight:0 rs-in-flight:18446744073709088768
     volume:0 replication:Established peer-disk:UpToDate 
resync-suspended:no
         received:316872 sent:231964 out-of-sync:0 pending:0 unacked:0
   node2 node-id:2 connection:Connected role:Primary congested:no 
ap-in-flight:0 rs-in-flight:0
     volume:0 replication:Established peer-disk:UpToDate 
resync-suspended:no
         received:21909399 sent:2048 out-of-sync:0 pending:0 unacked:0


I don't think this is expected or correct behavior.
I have an XFS filesystem there, and it would just hate being mounted on 
more than one node at a time.
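
As a sanity check on my side, this is roughly how I confirm that the option 
is absent from the configuration DRBD actually parses (just a sketch; only 
drbdadm dump and grep are involved, nothing specific to my setup):

node0:~ # drbdadm dump md8 | grep -i allow-two-primaries
# prints nothing, because the option is not set; if dual-primary were 
# intended, the resource's net section would have to contain something like:
#
#     net {
#         allow-two-primaries yes;
#     }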

Have I stumbled across a bug?

---
Srdačan pozdrav/Best regards/Freundliche Grüße/Cordialement,
Siniša Bandin



On 13.05.2019 12:20, Sinisa wrote:
> Thanks for a quick reply. Here are the details of my configuration:
> 
>  # uname -a
> Linux node0 5.1.1-1.g65f0348-default #1 SMP Sat May 11 17:16:51 UTC
> 2019 (65f0348) x86_64 x86_64 x86_64 GNU/Linux
> (latest stable kernel from the openSUSE repository)
> 
> 
> before - node2 is Primary, node0 is Secondary (node1 is currently down
> for repair - can this be the reason why the remaining two don't work as
> expected?)
> 
> node0 # drbdstat md5
> md5 node-id:0 role:Secondary suspended:no
>     write-ordering:flush
>   volume:0 minor:5 disk:UpToDate quorum:yes
>       size:345725692 read:86972 written:8623059 al-writes:2332
> bm-writes:0 upper-pending:0 lower-pending:0 al-suspended:no blocked:no
>   node1 node-id:1 connection:Connecting role:Unknown congested:no
> ap-in-flight:0 rs-in-flight:0
>     volume:0 replication:Off peer-disk:DUnknown resync-suspended:no
>         received:0 sent:0 out-of-sync:318772620 pending:0 unacked:0
>   node2 node-id:2 connection:Connected role:Primary congested:no
> ap-in-flight:0 rs-in-flight:0
>     volume:0 replication:Established peer-disk:UpToDate 
> resync-suspended:no
>         received:8623059 sent:82940 out-of-sync:0 pending:0 unacked:0
> 
> node2# drbdstat md5
> md5 node-id:2 role:Primary suspended:no
>     write-ordering:flush
>   volume:0 minor:5 disk:UpToDate quorum:yes
>       size:345725692 read:681961962 written:665922273 al-writes:19990
> bm-writes:0 upper-pending:0 lower-pending:0 al-suspended:no blocked:no
>   node0 node-id:0 connection:Connected role:Secondary congested:no
> ap-in-flight:0 rs-in-flight:18446744073709544568
>     volume:0 replication:Established peer-disk:UpToDate 
> resync-suspended:no
>         received:668957943 sent:328667978 out-of-sync:0 pending:0 
> unacked:0
>   node1 node-id:1 connection:Connecting role:Unknown congested:no
> ap-in-flight:0 rs-in-flight:0
>     volume:0 replication:Off peer-disk:DUnknown resync-suspended:no
>         received:0 sent:0 out-of-sync:345725692 pending:0 unacked:0
> 
> 
> after
>     node0# drbdadm primary md5
> both node0 and node2 are Primary
> 
> 
> 
> node0# drbdstat md5
> md5 node-id:0 role:Primary suspended:no
>     write-ordering:flush
>   volume:0 minor:5 disk:UpToDate quorum:yes
>       size:345725692 read:86972 written:8633116 al-writes:2332
> bm-writes:0 upper-pending:0 lower-pending:0 al-suspended:no blocked:no
>   node1 node-id:1 connection:Connecting role:Unknown congested:no
> ap-in-flight:0 rs-in-flight:0
>     volume:0 replication:Off peer-disk:DUnknown resync-suspended:no
>         received:0 sent:0 out-of-sync:318772620 pending:0 unacked:0
>   node2 node-id:2 connection:Connected role:Primary congested:no
> ap-in-flight:0 rs-in-flight:0
>     volume:0 replication:Established peer-disk:UpToDate 
> resync-suspended:no
>         received:8633116 sent:82940 out-of-sync:0 pending:0 unacked:0
> 
> 
> 
> dmesg output:
> on node0:
> [Mon May 13 12:09:53 2019] drbd md5 node2: Split-brain handler
> configured, rely on it.
> [Mon May 13 12:09:53 2019] drbd md5: Preparing cluster-wide state
> change 3262357452 (0->-1 3/1)
> [Mon May 13 12:09:53 2019] drbd md5: State change 3262357452:
> primary_nodes=5, weak_nodes=FFFFFFFFFFFFFFFA
> [Mon May 13 12:09:53 2019] drbd md5 node2: Split-brain handler
> configured, rely on it.
> [Mon May 13 12:09:53 2019] drbd md5: Committing cluster-wide state
> change 3262357452 (0ms)
> [Mon May 13 12:09:53 2019] drbd md5 node2: Split-brain handler
> configured, rely on it.
> [Mon May 13 12:09:53 2019] drbd md5: role( Secondary -> Primary )
> 
> 
> on node2:
> [Mon May 13 12:10:04 2019] drbd md5 node0: Preparing remote state
> change 3262357452
> [Mon May 13 12:10:04 2019] drbd md5 node0: Split-brain handler
> configured, rely on it.
> [Mon May 13 12:10:04 2019] drbd md5 node0: Committing remote state
> change 3262357452 (primary_nodes=5)
> [Mon May 13 12:10:04 2019] drbd md5 node0: Split-brain handler
> configured, rely on it.
> [Mon May 13 12:10:04 2019] drbd md5 node0: peer( Secondary -> Primary )
> 
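> (The "Split-brain handler configured, rely on it." lines refer to the 
> split-brain handler I have set for this resource. For illustration only, a 
> typical handlers section looks roughly like this - the script path below is 
> the stock example shipped with drbd-utils, not copied verbatim from my 
> config:)
> 
>     handlers {
>         split-brain "/usr/lib/drbd/notify-split-brain.sh root";
>     }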
> 
> Srdačan pozdrav / Best regards / Freundliche Grüße / Cordialement,
> Siniša Bandin
> 
> On 5/13/19 11:04 AM, Robert Altnoeder wrote:
>> On 5/11/19 3:30 PM, Siniša Bandin wrote:
>>> I have a 3-node DRBD9 (9.0.17) cluster.
>>> 
>>> The problem is: although the option "allow-two-primaries" is NOT set,
>>> I am able to set two nodes as primary and, even worse, to mount the XFS
>>> file system on both.
>> Tested right now, 3 nodes, DRBD 9.0.17-1
>> (b9abab2dd27313922797d026542b399870bfd13e), Linux 4.8.11 amd64
>> I cannot reproduce the problem.
>> 
>> What is the exact status of the resources on those three nodes?
>> 
>> br,
>> Robert
>> 

