[DRBD-user] Confused about shared partitions

Thu Apr 4 20:09:35 CEST 2019

I believe that you are right in your assessment. Unfortunately, after this
I seem to be stuck in a situation in which the two nodes are no longer
connected to each other through DRBD:

In A:

# cat /proc/drbd
version: 8.4.11-1 (api:1/proto:86-101)
GIT-hash: 66145a308421e9c124ec391a7848ac20203bb03c build by mockbuild@, 2018-11-03 01:26:55

 1: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
    ns:0 nr:0 dw:4 dr:2161 al:1 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:336

#  drbdadm status
my-data role:Primary
  disk:UpToDate
  peer connection:Connecting

In B:

# cat /proc/drbd
version: 8.4.11-1 (api:1/proto:86-101)
GIT-hash: 66145a308421e9c124ec391a7848ac20203bb03c build by mockbuild@, 2018-11-03 01:26:55

 1: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:24

# drbdadm status
my-data role:Secondary
  disk:UpToDate
  peer connection:Connecting

I have tried numerous things, including rebooting both VMs on which the
nodes run, to no avail.
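The fields that tell whether the pair is replicating are `cs:` (connection state), `ro:` (local/peer role) and `ds:` (local/peer disk state). A minimal sketch for pulling them out, assuming the DRBD 8.4 `/proc/drbd` layout shown above; the function name `summarize_drbd` is hypothetical and not part of drbd-utils.

```shell
#!/bin/sh
# summarize_drbd (hypothetical helper, not part of drbd-utils):
# reads DRBD 8.4 /proc/drbd text on stdin and prints "minor cs ro ds"
# for each device line. Exits 1 if any device is not Connected.
summarize_drbd() {
    awk '
        # Device lines start with the minor number, e.g.
        # " 1: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----"
        $1 ~ /^[0-9]+:$/ {
            minor = substr($1, 1, length($1) - 1)
            cs = "?"; ro = "?"; ds = "?"
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^cs:/)      cs = substr($i, 4)
                else if ($i ~ /^ro:/) ro = substr($i, 4)
                else if ($i ~ /^ds:/) ds = substr($i, 4)
            }
            print minor, cs, ro, ds
            if (cs != "Connected") not_connected = 1
        }
        # The exit status lets a calling script test whether replication is up.
        END { exit (not_connected ? 1 : 0) }
    '
}

# Example: summarize_drbd < /proc/drbd || echo "DRBD peer not connected"
```

Run on each node, the outputs above would be reported as `WFConnection` on both A and B: each side is waiting for its peer, and no connection has been established in either direction.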

On Thu, Apr 4, 2019 at 2:20 AM Robert Altnoeder <robert.altnoeder at linbit.com> wrote:

> On 4/3/19 8:52 PM, JCA wrote:
> > On both nodes I have a file named my-data.res under /etc/drbd.d, with
> > the following contents (identical in both nodes):
> >
> > [...]
> >             net {
> >              allow-two-primaries;
> >             }
>
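> Since you have an ext4 file system on the DRBD device, allow-two-primaries is
> dangerous and unnecessary: ext4 is not a cluster file system, so it must never
> be mounted on both nodes at the same time.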
>
> > [...]
> >              # pcs cluster stop A
> >
> > [...]
> >
> > Next I started and stopped both A and B (the nodes, not the VMs)
> > [...]
> > In this situation, when I list the contents of /var/lib/my-data in A,
> > I find it to be empty - as expected, for A has been stopped. When I
> > list the contents of /var/lib/my-data in B, however, what I see is
> > files f1 and f2, not f3 and f4.
>
> Most probably, you have created a split brain situation by starting and
> stopping resources or nodes in an incorrect order, e.g.
> standby node A
> standby node B
> online node A
> online node B
>
> What happens in this case is:
> 1. DRBD on node A stops replicating.
> 2. DRBD on node B goes into the Primary role and makes changes to the data.
> 3. DRBD on node B stops replicating.
> 4. DRBD on node A starts, goes into the Primary role and makes changes to
>    the data (without having resynchronized the changes from node B, because
>    node B's DRBD is not online).
> 5. DRBD on node B starts and cannot resynchronize anymore, because the
>    datasets on node A and node B have diverged (aka a "split brain").
>
> Either the nodes must be stopped only after the DRBD resources have stopped
> on both nodes in the "UpToDate" disk state, or they must be stopped and
> started in the correct order, which is:
> standby node A
> standby node B
> online node B
> online node A
> - or -
> standby node B
> standby node A
> online node A
> online node B
>
> There are also move/migrate commands to move resources, which let you
> avoid stopping the replicating (Secondary) side of the DRBD resource.
>
> br,
> Robert
>
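The failure mode in the quoted explanation can be reproduced with a toy model. This is not DRBD's actual mechanism (DRBD itself tracks data generation identifiers); it is a minimal sketch, assuming that a node that is the only one online becomes Primary and writes, and that two nodes can resynchronize only if one node's write history is a prefix of the other's. The function name `simulate_drbd_pair` is hypothetical.

```shell
#!/bin/sh
# Toy model of a two-node DRBD pair (NOT how DRBD is implemented).
# Each node's copy of the data is a history of write IDs such as "1,2,".
simulate_drbd_pair() {
    on_A=1; on_B=1    # both nodes start online, connected and in sync
    hist_A=; hist_B=  # identical (empty) write histories to begin with
    n=0               # counter that gives every write a unique ID
    for ev in "$@"; do
        case $ev in
            "standby A") on_A=0 ;;
            "standby B") on_B=0 ;;
            "online A")  on_A=1 ;;
            "online B")  on_B=1 ;;
            *) echo "unknown event: $ev" >&2; return 2 ;;
        esac
        if [ "$on_A" = 1 ] && [ "$on_B" = 1 ]; then
            # Both online: resync works only if one history extends the other.
            case $hist_B in
                "$hist_A"*) hist_A=$hist_B ;;          # A catches up from B
                *) case $hist_A in
                       "$hist_B"*) hist_B=$hist_A ;;   # B catches up from A
                       *) echo "split brain after: $ev"; return 1 ;;
                   esac ;;
            esac
        elif [ "$on_A" = 1 ]; then
            # A is alone: it becomes Primary and changes the data.
            n=$((n + 1)); hist_A="$hist_A$n,"
        elif [ "$on_B" = 1 ]; then
            # B is alone: it becomes Primary and changes the data.
            n=$((n + 1)); hist_B="$hist_B$n,"
        fi
    done
    echo "ok: no split brain"
}
```

With the incorrect order, `simulate_drbd_pair "standby A" "standby B" "online A" "online B"` reports a split brain when B comes back, while either of the two correct orders (`"standby A" "standby B" "online B" "online A"` or `"standby B" "standby A" "online A" "online B"`) resynchronizes cleanly, because the node that was last to go into standby holds the newest data and comes back first.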