[DRBD-user] create-md fails on zeroed device

Dingwall, James James.Dingwall at ncr.com
Tue Feb 1 16:28:41 CET 2022


Hi,

Our nightly environment deployment has started failing at the step that creates the DRBD metadata.  We use a shared metadata volume, and each resource has its own slot.

We are using the LINBIT PPA:

# drbdadm --version
DRBDADM_BUILDTAG=GIT-hash:\ 087ee6b4961ca154d76e4211223b03149373bed8\ build\ by\ buildd@lgw01-amd64-002\,\ 2022-01-31\ 07:21:30
DRBDADM_API_VERSION=2
DRBD_KERNEL_VERSION_CODE=0x09001e
DRBD_KERNEL_VERSION=9.0.30
DRBDADM_VERSION_CODE=0x091402
DRBDADM_VERSION=9.20.2

This environment is being used to support an upgrade from drbd8, which is why the module version is 9.0.x.

If I reset drbd with these steps:

# drbdadm down all
# blkdiscard /dev/zvol/zpool/drbdmetadata
# drbdadm adjust all
No valid meta data found
No valid meta data found
No valid meta data found
No valid meta data found
No valid meta data found
# drbd-overview (we've written a compatibility script)
1000:4d995db7-f653-4a94-b8ac-371b05e518f2-1-cfg/0  StandAlone     Secondary/Unknown Diskless/DUnknown
1001:f50e1c7e-6f8f-4cc5-a824-7bc43a40dd25-1-bin/0  StandAlone     Secondary/Unknown Diskless/DUnknown
1002:f50e1c7e-6f8f-4cc5-a824-7bc43a40dd25-1-vol/0  StandAlone     Secondary/Unknown Diskless/DUnknown
1003:f50e1c7e-6f8f-4cc5-a824-7bc43a40dd25-1-cfg/0  StandAlone     Secondary/Unknown Diskless/DUnknown
1004:f50e1c7e-6f8f-4cc5-a824-7bc43a40dd25-1-dat/0  StandAlone     Secondary/Unknown Diskless/DUnknown
# drbdadm create-md 4d995db7-f653-4a94-b8ac-371b05e518f2-1-cfg
initializing activity log
initializing bitmap (800 KB) to all zero
Writing meta data...
New drbd meta data block successfully created.
# drbdadm create-md f50e1c7e-6f8f-4cc5-a824-7bc43a40dd25-1-bin
md_offset 134217728
al_offset 134221824
bm_offset 134254592

Found some data

 ==> This might destroy existing data! <==

Do you want to proceed?
[need to type 'yes' to confirm] no

Operation canceled.
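
As a sanity check on the offsets printed above, here is a minimal sketch of the arithmetic they appear to follow. It assumes 128 MiB per slot and that this resource uses slot index 1 (the configuration is not shown here); the 4096-byte and 32768-byte gaps are simply read off the printed md_offset/al_offset/bm_offset values, and the function name is made up for illustration.

```shell
#!/bin/sh
# Assumed layout, inferred from the create-md output above:
SLOT_BYTES=$((128 * 1024 * 1024))  # one slot per resource on the shared volume
SUPERBLOCK_BYTES=4096              # al_offset - md_offset in the output
AL_BYTES=32768                     # bm_offset - al_offset in the output

# slot_offsets INDEX: print the three offsets for a given slot index
slot_offsets() {
    index=$1
    md=$(( index * SLOT_BYTES ))
    al=$(( md + SUPERBLOCK_BYTES ))
    bm=$(( al + AL_BYTES ))
    echo "md_offset $md"
    echo "al_offset $al"
    echo "bm_offset $bm"
}

slot_offsets 1
```

With index 1 this reproduces the three values drbdadm printed, so the offsets themselves look like what one would expect for the second slot.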

If I skip the first resource, I can run create-md on the remaining resources without hitting "Found some data".  The underlying volumes for drbd1000 are 150000MB and there are 2 peers, so its metadata should fit well inside the 128MB slot on the metadata disk.  My impression is that either there is an out-of-bounds write, or the check for existing data is reading from the wrong place (slot 0?).  In an strace of the failing drbdadm command, the third read returns non-zero data.  The offset passed to that pread64() is 0, which seems unexpected?
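
A rough check of the "fits inside the slot" claim, as a sketch only: it uses the size estimate from the DRBD user guide, sectors = ceil(Cs / 2^18) * 8 * N + 72, where Cs is the data device size in 512-byte sectors and N the number of peers. It assumes that 150000MB means MiB and that N = 2; the function name is illustrative.

```shell
#!/bin/sh
# md_sectors DATA_BYTES PEERS: estimated metadata size in 512-byte sectors
md_sectors() {
    data_bytes=$1
    peers=$2
    cs=$(( (data_bytes + 511) / 512 ))                    # device size in sectors
    echo $(( (cs + 262143) / 262144 * 8 * peers + 72 ))   # ceil(cs / 2^18) * 8 * N + 72
}

data_bytes=$(( 150000 * 1024 * 1024 ))   # assumption: MB here means MiB
slot_bytes=$(( 128 * 1024 * 1024 ))
need_bytes=$(( $(md_sectors "$data_bytes" 2) * 512 ))
echo "estimated metadata: $need_bytes bytes, slot: $slot_bytes bytes"
```

Under these assumptions the estimate is roughly 9 MiB, well below the 128 MiB slot.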

[pid 1046078] openat(AT_FDCWD, "/dev/zvol/zpool/drbdmetadata", O_RDWR|O_DIRECT) = 5
[pid 1046078] fstat(5, {st_mode=S_IFBLK|0660, st_rdev=makedev(0xe6, 0x60), ...}) = 0
[pid 1046078] ioctl(5, BLKSSZGET, [512]) = 0
[pid 1046078] ioctl(5, BLKGETSIZE64, [26843545600]) = 0
[pid 1046078] pread64(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 134217728) = 4096
[pid 1046078] pread64(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 134217728) = 4096
[pid 1046078] pread64(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\4\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\253\277\322]\203\4\f\220\0\0\0\200\203t\2m\0\4\0\0\0\0\0\10\0\0\1\1\0\0\0H\0\0\20\0\0\0\0\0\0\0\0\1\377\377\377\377\0\0\0\1\0\0\0\10\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 69632, 0) = 69632
[pid 1046078] fstat(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(0x88, 0xc), ...}) = 0
[pid 1046078] write(1, "md_offset 134217728\n", 20md_offset 134217728
) = 20
[pid 1046078] write(1, "al_offset 134221824\n", 20al_offset 134221824
) = 20
[pid 1046078] write(1, "bm_offset 134254592\n", 20bm_offset 134254592
) = 20
[pid 1046078] write(1, "\n", 1
)         = 1
[pid 1046078] write(1, "Found some data\n", 16Found some data
) = 16
[pid 1046078] write(1, "\n", 1
)         = 1
[pid 1046078] write(1, " ==> This might destroy existing data! <==\n", 43 ==> This might destroy existing data! <==
) = 43
[pid 1046078] write(2, "\nDo you want to proceed?\n", 25
Do you want to proceed?
) = 25
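
One way to narrow down which regions of the metadata volume hold non-zero data after the first create-md would be to count non-zero bytes in a given range. The helper below is a hypothetical sketch (the name and the 4096-byte alignment requirement are choices made for this example); it only reads, and the demonstration runs against a temporary file rather than the real device.

```shell
#!/bin/sh
# count_nonzero FILE OFFSET LENGTH
# Print the number of non-zero bytes in [OFFSET, OFFSET+LENGTH).
# OFFSET and LENGTH must be multiples of 4096.
count_nonzero() {
    dev=$1
    offset=$2
    length=$3
    dd if="$dev" bs=4096 skip=$(( offset / 4096 )) count=$(( length / 4096 )) 2>/dev/null \
        | tr -d '\000' | wc -c | tr -d ' '
}

# Demonstration on a zero-filled scratch file with three bytes written at 8192.
tmp=$(mktemp)
dd if=/dev/zero of="$tmp" bs=4096 count=64 2>/dev/null
printf 'abc' | dd of="$tmp" bs=1 seek=8192 conv=notrunc 2>/dev/null
count_nonzero "$tmp" 0 69632       # range covering the written bytes
count_nonzero "$tmp" 131072 4096   # range that is still all zero
rm -f "$tmp"
```

On the real system this could be pointed at the shared volume, for example `count_nonzero /dev/zvol/zpool/drbdmetadata 0 69632` for the range covered by the third pread64() and `count_nonzero /dev/zvol/zpool/drbdmetadata 134217728 4096` for the start of the second slot.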

Thanks,
James

