[DRBD-user] DRBD 9: 3-node mirror error (Low.dev. smaller than requested DRBD-dev. size.)

Fri Jul 26 12:47:38 CEST 2019

It seems that some of the size values have changed compared to the
original problem report:

On 7/25/19 5:59 PM, Paul Clements wrote:
>  ~]# size_gb=71680
> [root@sios0 ~]# lvcreate -n $LV -L ${size_gb} $VG
>   Logical volume "my_lv" created.
> [root@sios0 ~]# lvs
>   LV    VG    Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
>   my_lv my_vg -wi-a----- 70.00g

So that's one 71680 MiB LVM logical volume, exactly 70 GiB.

However, your original problem report stated:

> # cat /proc/partitions | grep dm-0
>  253        0   73433088 dm-0

/proc/partitions reports sizes in 1 KiB blocks, so that is 73433088 KiB
= 71712 MiB, or approximately 70.03 GiB.

Then, in your last transcript, we see:
> [1295819.009396] drbd r0/0 drbd0: Current (diskless) capacity
> 146861616, cannot attach smaller (146791608) disk

The "current (diskless) capacity" is 146861616 sectors of 512 bytes
each, which equals 75193147392 bytes.
That is 73430808 KiB, or approximately 71709.77 MiB (about 70.029 GiB),
which is exactly the net size of a DRBD device with a gross size of
71712 MiB and only one peer slot (2 nodes in the cluster) instead of 2
peer slots (3 nodes in the cluster).
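
The unit conversions are easy to double-check. Here is a minimal Python
sketch; the function name sector_sizes is my own and purely
illustrative, and it assumes 512-byte sectors as in the kernel message.

```python
SECTOR_BYTES = 512  # the DRBD kernel message counts 512-byte sectors


def sector_sizes(sectors):
    """Convert a sector count to bytes, KiB, MiB and GiB."""
    size_bytes = sectors * SECTOR_BYTES
    kib = size_bytes / 1024
    return {
        "bytes": size_bytes,
        "KiB": kib,
        "MiB": kib / 1024,
        "GiB": kib / 1024 / 1024,
    }


# The two sector counts quoted in the kernel message.
for label, sectors in (("current capacity", 146861616),
                       ("disk to attach", 146791608)):
    s = sector_sizes(sectors)
    print(f"{label}: {s['bytes']} bytes, {s['KiB']:.0f} KiB, "
          f"{s['MiB']:.2f} MiB, {s['GiB']:.3f} GiB")
```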

My guess as to what is happening here:

DRBD was not stopped on either the local node or some remote node that
the local node can still connect to while initializing the local DRBD,
or the DRBD kernel module on that node did not stop the resource
successfully. Therefore, some node still had size information from the
original 71712 MiB volume with only one peer slot (a 2 node cluster
configuration).

Then you created a new 71680 MiB volume and initialized it with meta
data for 2 peer slots (a 3 node cluster configuration). When you
started the local DRBD, either it had not been stopped/uninitialized
completely (less likely), or it managed to connect to some remote node
where DRBD had not been stopped/uninitialized completely (more likely)
before it attached its local disk. That is where it got the old net
size of the disk from, which is still the ~70.029 GiB of the 2 node
cluster setup.

When it tried to attach the local disk, which is only 146791608 sectors
= 73395804 KiB = ~71675.59 MiB (consistent with the meta data for a
71680 MiB gross-size 3 node cluster setup), the attach failed, because
the disk was smaller than the ~71709.77 MiB peer disk it is supposed to
replicate.
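
Both net sizes can be cross-checked against the internal meta data size
estimate from the DRBD User's Guide. As I recall it (so treat the
formula as an assumption), it is Ms = ceil(Cs / 2^18) * 8 * N + 72,
where Cs is the backing device size in sectors and N is the number of
peers. A Python sketch, with function names of my own choosing:

```python
def metadata_sectors(gross_sectors, peers):
    """Estimated internal meta data size in 512-byte sectors.

    Assumed formula: Ms = ceil(Cs / 2**18) * 8 * N + 72
    """
    bitmap_units = -(-gross_sectors // 2**18)  # ceiling division
    return bitmap_units * 8 * peers + 72


def net_sectors(gross_mib, peers):
    """Usable (net) DRBD device size in sectors for a backing device
    of gross_mib MiB that holds internal meta data for `peers` peers."""
    gross_sectors = gross_mib * 2048  # 2048 sectors of 512 bytes per MiB
    return gross_sectors - metadata_sectors(gross_sectors, peers)


old_net = net_sectors(71712, peers=1)  # original LV, 2 node layout
new_net = net_sectors(71680, peers=2)  # new LV, 3 node layout

print("old net size:", old_net, "sectors")
print("new net size:", new_net, "sectors")
print("new disk smaller than current capacity:", new_net < old_net)
```

Under that estimate, the old 2 node layout comes out at 146861616
sectors and the new 3 node layout at 146791608 sectors, which are the
two numbers in the kernel message.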

I suggest repeating the entire procedure: stop DRBD on all nodes,
destroy the backend storage LVs, recreate them, check the size of the
newly created LVs, zero out the LVs, create meta data, and then start
DRBD, all while monitoring the DRBD state of all three nodes with the
drbdmon utility.
I would also recommend unloading and reloading the DRBD kernel module
on all three nodes to make sure it has actually stopped all resources,
or, even safer, rebooting all three nodes.

Another recommendation would be to upgrade to the most recent DRBD and
drbd-utils versions.

I have tried to reproduce the problem with the exact versions and sizes
you are using, but attaching the disk worked normally in my test.

br,
Robert