[DRBD-user] drbd v8.4.10 Diskless/Diskless after reboot - Found meta data is "unclean" - apply-al: open(/dev/nvme0n1p1) failed: Device or resource busy
koltaig at netplan.hu
Wed Mar 13 01:21:44 CET 2019
Hello,
After an unattended power loss, we are facing a major issue:
After every reboot, DRBD comes up with Connected Diskless/Diskless status.
Main problems:
- dump-md responds: Found meta data is "unclean"
- The apply-al command terminates with exit code 20 with this message:
  open(/dev/nvme0n1p1) failed: Device or resource busy
- The backing device referenced in the drbd resource config cannot be opened exclusively.
I have also opened a Stack Exchange question about it:
https://unix.stackexchange.com/questions/505893/drbd-comes-up-after-reboot-with-connected-diskless-diskless
**About the environment:**
----------------------------
This drbd resource is normally used as block storage for LVM, which is
configured as shared LVM storage for a Proxmox VE 5.3-8 cluster. LVM is
configured on top of the drbd block device, but in the LVM config on the
drbd hosts the device below drbd (/dev/nvme0n1p1) is filtered out
(/etc/lvm/lvm.conf shown below).
*The device under drbd is a PCIe NVMe device.*
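For reference, the stacking can be checked with lsblk (just the command I
would use; output omitted because I did not capture it):
lsblk -o NAME,TYPE,SIZE,MOUNTPOINT /dev/nvme0n1   # in the working state this should show drbd0 (and the LVM volumes) stacked on nvme0n1p1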
It shows up with some extra properties in systemctl:
root at pmx0:~# systemctl list-units | grep nvme
sys-devices-pci0000:00-0000:00:01.1-0000:0c:00.0-nvme-nvme0-nvme0n1-nvme0n1p1.device
loaded active plugged
/sys/devices/pci0000:00/0000:00:01.1/0000:0c:00.0/nvme/nvme0/nvme0n1/nvme0n1p1
sys-devices-pci0000:00-0000:00:01.1-0000:0c:00.0-nvme-nvme0-nvme0n1.device
loaded active plugged
/sys/devices/pci0000:00/0000:00:01.1/0000:0c:00.0/nvme/nvme0/nvme0n1
Other storage devices (normal SAS disks) look a little different in the
systemctl listing:
root at pmx0:~# systemctl list-units | grep sdb
sys-devices-pci0000:00-0000:00:01.0-0000:0b:00.0-host0-target0:2:1-0:2:1:0-block-sdb-sdb1.device
loaded active plugged PERC_H710 1
sys-devices-pci0000:00-0000:00:01.0-0000:0b:00.0-host0-target0:2:1-0:2:1:0-block-sdb-sdb2.device
loaded active plugged PERC_H710 2
sys-devices-pci0000:00-0000:00:01.0-0000:0b:00.0-host0-target0:2:1-0:2:1:0-block-sdb.device
loaded active plugged PERC_H710
Listing the NVMe /sys/devices/.. path with ls:
root at pmx0:~# ls
/sys/devices/pci0000:00/0000:00:01.1/0000:0c:00.0/nvme/nvme0/nvme0n1/nvme0n1p1
alignment_offset dev discard_alignment holders inflight
partition power ro size start stat subsystem trace uevent
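Since the listing above shows a holders directory, one more check I can
run (assuming the standard sysfs block layout) is whether any kernel
component currently claims the partition:
ls -l /sys/block/nvme0n1/nvme0n1p1/holders/   # any symlink here names the kernel-side holder of nvme0n1p1
ls -l /sys/class/block/nvme0n1p1/holders/     # same information via the /sys/class/block shortcut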
**Things that did NOT help:**
----------------------------
- Rebooting again does not help
- Restarting the drbd service does not help
- drbdadm detach/disconnect/attach plus a service restart does not help
- The nfs-kernel-server service is not configured on these drbd nodes (so
there is no nfs-server to unconfigure)
**After some investigation:**
-------------------------------
dump-md responds: Found meta data is "unclean", please apply-al first
The apply-al command terminates with exit code 20 with this message:
open(/dev/nvme0n1p1) failed: Device or resource busy
It seems that the problem is that this device (/dev/nvme0n1p1), used as
the backing disk in my drbd resource config, cannot be opened
exclusively.
**Failing DRBD commands:**
----------------------------
root at pmx0:~# drbdadm attach r0
open(/dev/nvme0n1p1) failed: Device or resource busy
Operation canceled.
Command 'drbdmeta 0 v08 /dev/nvme0n1p1 internal apply-al' terminated
with exit code 20
root at pmx0:~# drbdadm apply-al r0
open(/dev/nvme0n1p1) failed: Device or resource busy
Operation canceled.
Command 'drbdmeta 0 v08 /dev/nvme0n1p1 internal apply-al' terminated
with exit code 20
root at pmx0:~# drbdadm dump-md r0
open(/dev/nvme0n1p1) failed: Device or resource busy
Exclusive open failed. Do it anyways?
[need to type 'yes' to confirm] yes
Found meta data is "unclean", please apply-al first
Command 'drbdmeta 0 v08 /dev/nvme0n1p1 internal dump-md' terminated
with exit code 255
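One more thing I could try (my own idea, nothing I have seen suggested
elsewhere): take the resource completely down first, so that drbd itself
certainly does not hold the backing device, and only then retry apply-al:
drbdadm down r0       # disconnect and detach in one step, so the module releases the resource
drbdadm apply-al r0   # retry replaying the activity log against /dev/nvme0n1p1
drbdadm up r0         # bring the resource back up afterwards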
**DRBD service status/commands:**
-----------------------------------
root at pmx0:~# drbd-overview
0:r0/0 Connected Secondary/Secondary Diskless/Diskless
root at pmx0:~# drbdadm dstate r0
Diskless/Diskless
root at pmx0:~# drbdadm disconnect r0
root at pmx0:~# drbd-overview
0:r0/0 . . .
root at pmx0:~# drbdadm detach r0
root at pmx0:~# drbd-overview
0:r0/0 . . .
**Trying to reattach resource r0:**
----------------------------------
root at pmx0:~# drbdadm attach r0
open(/dev/nvme0n1p1) failed: Device or resource busy
Operation canceled.
Command 'drbdmeta 0 v08 /dev/nvme0n1p1 internal apply-al' terminated
with exit code 20
root at pmx0:~# drbdadm apply-al r0
open(/dev/nvme0n1p1) failed: Device or resource busy
Operation canceled.
Command 'drbdmeta 0 v08 /dev/nvme0n1p1 internal apply-al' terminated
with exit code 20
**lsof and fuser show zero output:**
------------------------------
root at pmx0:~# lsof /dev/nvme0n1p1
root at pmx0:~# fuser /dev/nvme0n1p1
root at pmx0:~# fuser /dev/nvme0n1
root at pmx0:~# lsof /dev/nvme0n1
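As far as I know, lsof and fuser only see userspace file descriptors, so
a kernel-side claimant (device-mapper, multipath, another module) would
not show up there. These are the additional checks I can run (multipath
only applies if multipath-tools is installed):
dmsetup ls --tree   # shows whether any device-mapper target (LVM, multipath, ...) sits on top of the NVMe partition
dmsetup table       # the mapping tables; a line referencing the nvme device would explain the busy state
multipath -ll       # only if multipath-tools is installed; a multipath map over the NVMe would also claim it
grep nvme /proc/mounts   # rule out a stray mount of the partition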
**Resource disk partition and LVM config:**
--------------------------------------------
root at pmx0:~# fdisk -l /dev/nvme0n1
Disk /dev/nvme0n1: 1.9 TiB, 2048408248320 bytes, 4000797360 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x59762e31
Device Boot Start End Sectors Size Id Type
/dev/nvme0n1p1 2048 3825207295 3825205248 1.8T 83 Linux
root at pmx0:~# pvs
PV VG Fmt Attr PSize PFree
/dev/sdb2 pve lvm2 a-- 135.62g 16.00g
root at pmx0:~# vgs
VG #PV #LV #SN Attr VSize VFree
pve 1 3 0 wz--n- 135.62g 16.00g
root at pmx0:~# lvs
  LV   VG  Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  data pve twi-a-tz-- 75.87g             0.00   0.04
  root pve -wi-ao---- 33.75g
  swap pve -wi-ao----  8.00g
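Independent of the LVM filter, I can also check what signatures sit
directly on the partition (both commands below are read-only as used
here):
blkid /dev/nvme0n1p1    # reports any filesystem / LVM PV signature found directly on the partition
wipefs /dev/nvme0n1p1   # without options wipefs only lists signatures, it does not erase anything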
root at pmx0:~# vi /etc/lvm/lvm.conf
root at pmx0:~# cat /etc/lvm/lvm.conf | grep nvm
filter = [ "r|/dev/nvme0n1p1|", "a|/dev/sdb|", "a|sd.*|", "a|drbd.*|", "r|.*|" ]
**DRBD resource config:**
----------------------------
root at pmx0:~# cat /etc/drbd.d/r0.res
resource r0 {
        protocol C;
        startup {
                wfc-timeout 0;  # non-zero wfc-timeout can be dangerous
                degr-wfc-timeout 300;
                become-primary-on both;
        }
        net {
                cram-hmac-alg sha1;
                shared-secret "*********";
                allow-two-primaries;
                after-sb-0pri discard-zero-changes;
                after-sb-1pri discard-secondary;
                after-sb-2pri disconnect;
        }
        on pmx0 {
                device /dev/drbd0;
                disk /dev/nvme0n1p1;
                address 10.0.20.15:7788;
                meta-disk internal;
        }
        on pmx1 {
                device /dev/drbd0;
                disk /dev/nvme0n1p1;
                address 10.0.20.16:7788;
                meta-disk internal;
        }
        disk {
                no-disk-barrier;
                no-disk-flushes;
        }
}
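To make sure drbd really parses this file the way it is written, the
effective configuration can be dumped (output omitted, it simply echoes
the resource back):
drbdadm dump r0     # prints resource r0 as drbdadm parsed it, including the disk /dev/nvme0n1p1 line
cat /proc/drbd      # module-level view of the configured resources and their current state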
**OTHER NODE:**
-----------------
root at pmx1:~# drbd-overview
0:r0/0 Connected Secondary/Secondary Diskless/Diskless
and so on; every command response and configuration looks the same as on
node pmx0 above...
**Debian and DRBD versions:**
-------------------------------
root at pmx0:~# uname -a
Linux pmx0 4.15.18-10-pve #1 SMP PVE 4.15.18-32 (Sat, 19 Jan 2019
10:09:37 +0100) x86_64 GNU/Linux
root at pmx0:~# cat /etc/debian_version
9.8
root at pmx0:~# dpkg --list| grep drbd
ii drbd-utils 8.9.10-2
amd64 RAID 1 over TCP/IP for Linux (user utilities)
root at pmx0:~# lsmod | grep drbd
drbd 364544 1
lru_cache 16384 1 drbd
libcrc32c 16384 2 dm_persistent_data,drbd
root at pmx0:~# modinfo drbd
filename:
/lib/modules/4.15.18-10-pve/kernel/drivers/block/drbd/drbd.ko
alias: block-major-147-*
license: GPL
version: 8.4.10
description: drbd - Distributed Replicated Block Device v8.4.10
author: Philipp Reisner <phil at linbit.com>, Lars Ellenberg
<lars at linbit.com>
srcversion: 9A7FB947BDAB6A2C83BA0D4
depends: lru_cache,libcrc32c
retpoline: Y
intree: Y
name: drbd
vermagic: 4.15.18-10-pve SMP mod_unload modversions
parm: allow_oos:DONT USE! (bool)
parm: disable_sendpage:bool
parm: proc_details:int
parm: minor_count:Approximate number of drbd devices
(1-255) (uint)
parm: usermode_helper:string
Thanks,
Gabor Koltai
ulockme kft, Hungary