[DRBD-user] drbd v8.4.10 Diskless/Diskless after reboot - Found meta data is "unclean"- apply-al: open(/dev/nvme0n1p1) failed: Device or resource busy

koltaig at netplan.hu
Wed Mar 13 01:21:44 CET 2019


Hello,

After an unattended power loss, I am facing a major issue:
after every reboot, DRBD comes up in Connected Diskless/Diskless state.

Main problems:
  - dump-md reports: Found meta data is "unclean"
  - apply-al terminates with exit code 20 and this message:
    open(/dev/nvme0n1p1) failed: Device or resource busy
  - the backing device from the DRBD resource config cannot be opened
exclusively.

I have also posted a Stack Exchange question about it:
https://unix.stackexchange.com/questions/505893/drbd-comes-up-after-reboot-with-connected-diskless-diskless

**About the environment:**
----------------------------
This DRBD resource is normally used as block storage for LVM. That LVM
is configured as (shared LVM) storage for a Proxmox VE 5.3-8 cluster.
LVM runs on top of the DRBD block device. In the DRBD host's own LVM
config, the device underneath DRBD (/dev/nvme0n1p1) is filtered out
(/etc/lvm/lvm.conf is shown below).

*The device under DRBD is a PCIe NVMe device.*

It has some extra properties shown by systemctl:

     root at pmx0:~# systemctl list-units | grep nvme
     sys-devices-pci0000:00-0000:00:01.1-0000:0c:00.0-nvme-nvme0-nvme0n1-nvme0n1p1.device loaded active     plugged   /sys/devices/pci0000:00/0000:00:01.1/0000:0c:00.0/nvme/nvme0/nvme0n1/nvme0n1p1
     sys-devices-pci0000:00-0000:00:01.1-0000:0c:00.0-nvme-nvme0-nvme0n1.device loaded active     plugged   /sys/devices/pci0000:00/0000:00:01.1/0000:0c:00.0/nvme/nvme0/nvme0n1

The other storage devices (normal SAS disks) look a little different in
the systemctl listing:

     root at pmx0:~# systemctl list-units | grep sdb
     sys-devices-pci0000:00-0000:00:01.0-0000:0b:00.0-host0-target0:2:1-0:2:1:0-block-sdb-sdb1.device loaded active     plugged   PERC_H710 1
     sys-devices-pci0000:00-0000:00:01.0-0000:0b:00.0-host0-target0:2:1-0:2:1:0-block-sdb-sdb2.device loaded active     plugged   PERC_H710 2
     sys-devices-pci0000:00-0000:00:01.0-0000:0b:00.0-host0-target0:2:1-0:2:1:0-block-sdb.device loaded active     plugged   PERC_H710

Listing the NVMe partition's directory under /sys/devices/.. with ls:

     root at pmx0:~# ls /sys/devices/pci0000:00/0000:00:01.1/0000:0c:00.0/nvme/nvme0/nvme0n1/nvme0n1p1
     alignment_offset  dev  discard_alignment  holders  inflight  partition  power  ro  size  start  stat  subsystem  trace  uevent


**Things that did NOT help:**
----------------------------
  - Rebooting again
  - Restarting the drbd service
  - drbdadm detach/disconnect/attach and restarting the service
  - Unconfiguring nfs-server is not an option, because nfs-kernel-server
is not configured on these DRBD nodes

**After some investigation:**
-------------------------------
dump-md response: Found meta data is "unclean", please apply-al first
apply-al command terminated with exit code 20 with this message:
open(/dev/nvme0n1p1) failed: Device or resource busy

It seems that the problem is that the device used by my DRBD resource
config (/dev/nvme0n1p1) cannot be opened exclusively.


**Failing DRBD commands:**
----------------------------
     root at pmx0:~# drbdadm attach r0
     open(/dev/nvme0n1p1) failed: Device or resource busy
     Operation canceled.
     Command 'drbdmeta 0 v08 /dev/nvme0n1p1 internal apply-al' terminated with exit code 20
     root at pmx0:~# drbdadm apply-al r0
     open(/dev/nvme0n1p1) failed: Device or resource busy
     Operation canceled.
     Command 'drbdmeta 0 v08 /dev/nvme0n1p1 internal apply-al' terminated with exit code 20

     root at pmx0:~# drbdadm dump-md r0
     open(/dev/nvme0n1p1) failed: Device or resource busy

     Exclusive open failed. Do it anyways?
     [need to type 'yes' to confirm] yes

     Found meta data is "unclean", please apply-al first
     Command 'drbdmeta 0 v08 /dev/nvme0n1p1 internal dump-md' terminated with exit code 255


**DRBD service status/commands:**
-----------------------------------
     root at pmx0:~# drbd-overview
      0:r0/0  Connected Secondary/Secondary Diskless/Diskless
     root at pmx0:~# drbdadm dstate r0
     Diskless/Diskless
     root at pmx0:~# drbdadm disconnect r0
     root at pmx0:~# drbd-overview
      0:r0/0  . . .
     root at pmx0:~# drbdadm detach r0
     root at pmx0:~# drbd-overview
      0:r0/0  . . .


**Trying reattach resource r0:**
----------------------------------
     root at pmx0:~# drbdadm attach r0
     open(/dev/nvme0n1p1) failed: Device or resource busy
     Operation canceled.
     Command 'drbdmeta 0 v08 /dev/nvme0n1p1 internal apply-al' terminated with exit code 20
     root at pmx0:~# drbdadm apply-al r0
     open(/dev/nvme0n1p1) failed: Device or resource busy
     Operation canceled.
     Command 'drbdmeta 0 v08 /dev/nvme0n1p1 internal apply-al' terminated with exit code 20


**lsof and fuser give no output:**
------------------------------
     root at pmx0:~# lsof /dev/nvme0n1p1
     root at pmx0:~# fuser /dev/nvme0n1p1
     root at pmx0:~# fuser /dev/nvme0n1
     root at pmx0:~# lsof /dev/nvme0n1
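lsof and fuser only report userspace processes that have the device open. An in-kernel claim does not show up there. One example would be a device-mapper mapping that LVM activated directly on /dev/nvme0n1p1. Such a claim is listed instead in the `holders` directory, which appears in the sysfs listing of nvme0n1p1 above. Here is a minimal bash sketch for checking it. It assumes the usual /sys/class/block layout. The function name `list_holders` and the `SYSFS_ROOT` override are hypothetical names chosen for illustration.

```shell
#!/usr/bin/env bash
# Sketch (assumption: standard sysfs layout): list the in-kernel holders of a
# block device. lsof/fuser only see userspace open files; kernel claimants
# such as device-mapper (LVM) mappings appear under <dev>/holders/ instead.
# list_holders and SYSFS_ROOT are hypothetical names used for illustration.

list_holders() {
    local dev=${1#/dev/}                          # accept "nvme0n1p1" or "/dev/nvme0n1p1"
    local root=${SYSFS_ROOT:-/sys/class/block}    # overridable, e.g. for testing
    local dir="$root/$dev/holders"
    local h name

    if [ ! -d "$dir" ]; then
        echo "no holders directory for $dev under $root" >&2
        return 1
    fi

    for h in "$dir"/*; do
        [ -e "$h" ] || continue                   # unmatched glob: no holders at all
        name=""
        # device-mapper holders expose their mapping name in dm/name
        if [ -r "$h/dm/name" ]; then
            name=$(cat "$h/dm/name")
        fi
        echo "$(basename "$h")${name:+ ($name)}"
    done
    return 0
}
```

Run it as `list_holders /dev/nvme0n1p1`. Empty output means no kernel holder is registered for the partition. A `dm-N` entry names the device-mapper device that has claimed it, followed by its mapping name when sysfs exposes one.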


**Resource disk partition and LVM config:**
--------------------------------------------
     root at pmx0:~# fdisk -l /dev/nvme0n1
     Disk /dev/nvme0n1: 1.9 TiB, 2048408248320 bytes, 4000797360 sectors
     Units: sectors of 1 * 512 = 512 bytes
     Sector size (logical/physical): 512 bytes / 512 bytes
     I/O size (minimum/optimal): 512 bytes / 512 bytes
     Disklabel type: dos
     Disk identifier: 0x59762e31

     Device         Boot Start        End    Sectors  Size Id Type
     /dev/nvme0n1p1       2048 3825207295 3825205248  1.8T 83 Linux
     root at pmx0:~# pvs
       PV             VG           Fmt  Attr PSize   PFree
       /dev/sdb2      pve          lvm2 a--  135.62g  16.00g
     root at pmx0:~# vgs
       VG           #PV #LV #SN Attr   VSize   VFree
       pve            1   3   0 wz--n- 135.62g  16.00g
     root at pmx0:~# lvs
       LV            VG           Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
       data          pve          twi-a-tz--  75.87g             0.00   0.04
       root          pve          -wi-ao----  33.75g
       swap          pve          -wi-ao----   8.00g
     root at pmx0:~# vi /etc/lvm/lvm.conf
     root at pmx0:~# cat /etc/lvm/lvm.conf | grep nvm
             filter = [ "r|/dev/nvme0n1p1|", "a|/dev/sdb|", "a|sd.*|", "a|drbd.*|", "r|.*|" ]

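lvm.conf documents two rules for the filter. First, patterns are tried in order and the first matching regular expression decides whether a path is accepted (`a`) or rejected (`r`). Second, a path that matches no pattern is accepted. The bash sketch below applies those two rules to single device paths, using the filter quoted above. It simplifies in two ways: it checks only one path name per device (real LVM also considers aliases such as /dev/disk/by-id links), and it assumes `|` as the pattern delimiter. `lvm_filter_verdict` is a hypothetical helper, not an LVM command.

```shell
#!/usr/bin/env bash
# Sketch: evaluate an lvm.conf-style filter against one device path.
# Rules modelled: the first matching pattern decides; no match means accept.
# Simplifications (assumptions): one path name per device, "|" as delimiter.
# lvm_filter_verdict is a hypothetical name, not part of LVM.

lvm_filter_verdict() {
    local path=$1; shift
    local p action re
    for p in "$@"; do
        action=${p:0:1}                 # 'a' (accept) or 'r' (reject)
        re=${p:2:${#p}-3}               # strip leading "a|"/"r|" and trailing "|"
        if [[ $path =~ $re ]]; then
            if [ "$action" = a ]; then echo accept; else echo reject; fi
            return 0
        fi
    done
    echo accept                         # no pattern matched
}

# The filter from /etc/lvm/lvm.conf on pmx0, as quoted above
filter=( "r|/dev/nvme0n1p1|" "a|/dev/sdb|" "a|sd.*|" "a|drbd.*|" "r|.*|" )
for dev in /dev/nvme0n1p1 /dev/nvme0n1 /dev/sdb2 /dev/drbd0; do
    printf '%-16s %s\n' "$dev" "$(lvm_filter_verdict "$dev" "${filter[@]}")"
done
```

With this filter, the sketch reports reject for /dev/nvme0n1p1 and /dev/nvme0n1, and accept for /dev/sdb2 and /dev/drbd0.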

**DRBD resource config:**
----------------------------
     root at pmx0:~# cat /etc/drbd.d/r0.res
     resource r0 {
             protocol C;
             startup {
                     wfc-timeout  0;     # non-zero wfc-timeout can be dangerous
                     degr-wfc-timeout 300;
                     become-primary-on both;
             }
             net {
                     cram-hmac-alg sha1;
                     shared-secret "*********";
                     allow-two-primaries;
                     after-sb-0pri discard-zero-changes;
                     after-sb-1pri discard-secondary;
                     after-sb-2pri disconnect;
             }
             on pmx0 {
                     device /dev/drbd0;
                     disk /dev/nvme0n1p1;
                     address 10.0.20.15:7788;
                     meta-disk internal;
             }
             on pmx1 {
                     device /dev/drbd0;
                     disk /dev/nvme0n1p1;
                     address 10.0.20.16:7788;
                     meta-disk internal;
             }
             disk {
                     no-disk-barrier;
                     no-disk-flushes;
             }
     }
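The resource uses `meta-disk internal`. This means the meta data that dump-md and apply-al operate on, including the activity log, lives in the last sectors of /dev/nvme0n1p1 itself. That is why the drbdmeta calls above have to open the backing partition. The sketch below estimates how large that area is. It uses the formula from the DRBD 8.4 User's Guide section on estimating meta-data size, taken here as an assumption about this setup, and the sector count that fdisk reports above. `drbd_md_sectors` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: estimate DRBD internal meta-data size (formula as given in the
# DRBD 8.4 User's Guide; treated here as an assumption):
#   Ms = ceil(Cs / 2^18) * 8 + 72      (all values in 512-byte sectors)
# drbd_md_sectors is a hypothetical helper name.

drbd_md_sectors() {
    local cs=$1
    local chunks=$(( (cs + (1 << 18) - 1) >> 18 ))   # ceil(Cs / 2^18)
    echo $(( chunks * 8 + 72 ))
}

cs=3825205248   # sectors of /dev/nvme0n1p1 according to fdisk above
ms=$(drbd_md_sectors "$cs")
echo "meta data (end of device): $ms sectors = $(( ms * 512 / 1024 )) KiB"
echo "usable for /dev/drbd0:     $(( cs - ms )) sectors"
```

For the 3825205248-sector partition, this gives 116808 sectors (58404 KiB, roughly 57 MiB) of meta data at the end of the device.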

**OTHER NODE:**
-----------------
     root at pmx1:~# drbd-overview
      0:r0/0  Connected Secondary/Secondary Diskless/Diskless

and so on: every command response and configuration shows the same as on
node pmx0 above.


**Debian and DRBD versions:**
-------------------------------
     root at pmx0:~# uname -a
     Linux pmx0 4.15.18-10-pve #1 SMP PVE 4.15.18-32 (Sat, 19 Jan 2019 10:09:37 +0100) x86_64 GNU/Linux
     root at pmx0:~# cat /etc/debian_version
     9.8
     root at pmx0:~# dpkg --list| grep drbd
     ii  drbd-utils                           8.9.10-2                        amd64        RAID 1 over TCP/IP for Linux (user utilities)
     root at pmx0:~# lsmod | grep drbd
     drbd                  364544  1
     lru_cache              16384  1 drbd
     libcrc32c              16384  2 dm_persistent_data,drbd
     root at pmx0:~# modinfo drbd
     filename:       /lib/modules/4.15.18-10-pve/kernel/drivers/block/drbd/drbd.ko
     alias:          block-major-147-*
     license:        GPL
     version:        8.4.10
     description:    drbd - Distributed Replicated Block Device v8.4.10
     author:         Philipp Reisner <phil at linbit.com>, Lars Ellenberg <lars at linbit.com>
     srcversion:     9A7FB947BDAB6A2C83BA0D4
     depends:        lru_cache,libcrc32c
     retpoline:      Y
     intree:         Y
     name:           drbd
     vermagic:       4.15.18-10-pve SMP mod_unload modversions
     parm:           allow_oos:DONT USE! (bool)
     parm:           disable_sendpage:bool
     parm:           proc_details:int
     parm:           minor_count:Approximate number of drbd devices (1-255) (uint)
     parm:           usermode_helper:string


Thanks,
Gabor Koltai
ulockme kft, Hungary

