[DRBD-user] Full DRBD device on LVM is now unusable

Lars Ellenberg lars.ellenberg at linbit.com
Tue Feb 4 23:52:17 CET 2014

On Tue, Jan 28, 2014 at 09:47:12PM +0000, Seb A wrote:
> My DRBD device on LVM filled up with data and now it is unusable after a power cycle.
> 
> I configured two DRBD nodes running Openfiler with corosync and pacemaker as per the instructions here [http://www.howtoforge.com/openfiler-2.99-active-passive-with-corosync-pacemaker-and-drbd] over two years ago.  At one point it swapped over to what was originally the secondary node "Openfiler2", and I left it like that; all was fine (AFAIK).  (I did have a few issues in the early days with it losing sync on reboot / power failure, but that's ancient history.)
> 
> Eventually the DRBD data partition filled up, as there are processes that ftp files onto it.  Lots of proftpd processes got stuck trying to do a CWD into the data partition, and therefore the CPU 'load' went really high.  I tried to start a process to delete old files and it got stuck too.  It wasn't doing anything, and I couldn't cancel or kill it; kill -9 <pid> did not work on that process or on any of the stuck proftpd processes.  So I could not unmount the drive, and when I tried fuser it just killed my ssh session and failed to kill the proftpd processes.
> 
> I restarted sshd via the console and, as there had been some kernel panics, I decided to reboot, hardly expecting it to succeed.  It didn't - it got stuck and I had to kill the virtual power.  When it came back up it could not mount the DRBD data partition (that uses LVM).  Both the DRBD partitions were synchronized before and after the reboot - they reconnected and the primary stayed on 'Openfiler2'.

Let me "top post"...

Summary: DRBD is there, but you were not able to activate the volume
group, or rather the logical volume.

Your "dmsetup table" output below indicates that you have a dm target
with the "correct" name, but with an empty table.

I can only guess how you got into that state, but guessing does not help much.

Activating an LV when an "unfinished" dm target with that name already
exists simply fails.

You can easily reproduce "similar symptoms":

on some working test setup (no DRBD involved),
 * create a scratch volume.
    lvcreate -n scratch -L 1g VG
 * deactivate it
    lvchange -an VG/scratch
 * create a device mapper target with an empty table
    dmsetup create VG-scratch < /dev/null
 * try to activate
    lvchange -ay VG/scratch
     device-mapper: create ioctl on VG-scratch failed: Device or resource busy
   (depending on versions, or the exact test, you may get an
   "Invalid argument" instead. Whatever.)

Fix: delete that device-mapper target, and try again:
   dmsetup remove VG-scratch
   lvchange -ay VG/scratch
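
If it is not obvious which dm target carries the empty table, something
like this (untested here, but trivial) lists them from "dmsetup table" --
on your box it should print just vg0drbd-filer:

   dmsetup table | awk 'NF==1 { sub(/:$/, "", $1); print $1 }'

Translated to the names from your output below (I have not tried this
against your setup), that would presumably be:

   dmsetup remove vg0drbd-filer
   lvchange -ay /dev/vg0drbd/filer

If lvchange then complains that the LV is already active (LVM currently
thinks it is, even though the table is empty), deactivating and
reactivating ("lvchange -an", then "-ay") should sort that out.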

Once that succeeds, you will probably need to fsck, and free up some space,
before you can use that FS again. Over the years there have been some
nasty bugs related to file systems running out of space.
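
Note that DataFS is xfs according to your config further down, so "fsck"
here really means xfs_repair on the unmounted device. A rough sketch,
using the paths from your fstab (again, untested on your setup):

   xfs_repair -n /dev/vg0drbd/filer   # dry run first, reports only
   xfs_repair /dev/vg0drbd/filer
   mount -t xfs /dev/vg0drbd/filer /mnt/vg0drbd/filer
   # delete enough of the old ftp'd files to get some headroom back
   umount /mnt/vg0drbd/filer

Then clear the failed lvmdata action (e.g. "crm resource cleanup lvmdata")
and let pacemaker mount it again.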

In any case, none of this has much to do with DRBD.

If this does not help, and the issue persists, and you are still
interested in resolving it, ask again.

Hth,
	Lars


> 
> The first errors after the reboot were in here:
> 
> daemon.info<30>: Jan 28 15:26:34 Openfiler2 LVM[3228]: INFO: Activating volume group vg0drbd
> daemon.info<30>: Jan 28 15:26:34 Openfiler2 LVM[3228]: INFO: Reading all physical volumes. This may take a while... Found volume group "localvg" using metadata type lvm2 Found volume group "vg0drbd" using metadata type lvm2
> kern.err<3>: Jan 28 15:26:34 Openfiler2 kernel: device-mapper: table: 253:1: linear: dm-linear: Device lookup failed
> kern.warn<4>: Jan 28 15:26:34 Openfiler2 kernel: device-mapper: ioctl: error adding target to table
> daemon.err<27>: Jan 28 15:26:34 Openfiler2 LVM[3228]: ERROR: device-mapper: reload ioctl failed: Invalid argument 1 logical volume(s) in volume group "vg0drbd" now active
> daemon.info<30>: Jan 28 15:26:34 Openfiler2 crmd: [1284]: info: process_lrm_event: LRM operation lvmdata_start_0 (call=26, rc=1, cib-update=31, confirmed=true) unknown error
> 
> I tried to mount it manually but the device is missing.  Any suggestions on how I can get this volume mounted?  Thanks!
> 
> For reference:
> kern.info<6>: Jan 28 17:28:11 Openfiler2 kernel: device-mapper: uevent: version 1.0.3
> kern.info<6>: Jan 28 17:28:11 Openfiler2 kernel: device-mapper: ioctl: 4.17.0-ioctl (2010-03-05) initialised: dm-devel at redhat.com
> 
> Linux Openfiler2 2.6.32-71.18.1.el6-0.20.smp.gcc4.1.x86_64 #1 SMP Fri Mar 25 23:12:47 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
> 
> My last line of /etc/fstab is commented out as it is controlled by pacemaker:
> #/dev/vg0drbd/filer /mnt/vg0drbd/filer xfs defaults,usrquota,grpquota 0 0
> 
> 
> Right now I have:
> 
> [root at Openfiler2 ~]# service drbd status
> drbd driver loaded OK; device status:
> version: 8.3.10 (api:88/proto:86-96)
> GIT-hash: 5c0b0469666682443d4785d90a2c603378f9017b build by phil at fat-tyre, 2011-01-28 12:17:35
> m:res               cs         ro                 ds                 p  mounted            fstype
> 0:cluster_metadata  Connected  Primary/Secondary  UpToDate/UpToDate  C  /cluster_metadata  ext3
> 1:vg0_drbd          Connected  Primary/Secondary  UpToDate/UpToDate  C
> 
> [root at Openfiler2 ~]# crm status
> ============
> Last updated: Tue Jan 28 20:11:19 2014
> Stack: openais
> Current DC: Openfiler1 - partition with quorum
> Version: 1.1.2-c6b59218ee949eebff30e837ff6f3824ed0ab86b
> 2 Nodes configured, 2 expected votes
> 2 Resources configured.
> ============
> 
> Online: [ Openfiler1 Openfiler2 ]
> 
>  Resource Group: g_services
>      MetaFS     (ocf::heartbeat:Filesystem):    Started Openfiler2
>      lvmdata    (ocf::heartbeat:LVM):   Stopped
>      DataFS     (ocf::heartbeat:Filesystem):    Stopped
>      openfiler  (lsb:openfiler):        Stopped
>      ClusterIP  (ocf::heartbeat:IPaddr2):       Stopped
>      iscsi      (lsb:iscsi-target):     Stopped
>      ldap       (lsb:ldap):     Stopped
>      samba      (lsb:smb):      Stopped
>      nfs        (lsb:nfs):      Stopped
>      nfslock    (lsb:nfslock):  Stopped
>      ftp        (lsb:proftpd):  Stopped
>  Master/Slave Set: ms_g_drbd
>      Masters: [ Openfiler2 ]
>      Slaves: [ Openfiler1 ]
> 
> Failed actions:
>     lvmdata_start_0 (node=Openfiler2, call=28, rc=1, status=complete): unknown error
> 
> ######
> 
> More reference:
> 
> [root at Openfiler2 ~]# pvdisplay
>   --- Physical volume ---
>   PV Name               /dev/sdc1
>   VG Name               localvg
>   PV Size               975.93 GiB / not usable 2.32 MiB
>   Allocatable           yes (but full)
>   PE Size               4.00 MiB
>   Total PE              249837
>   Free PE               0
>   Allocated PE          249837
>   PV UUID               OPFfsk-LXkz-3Voc-CQbj-Qf8d-YmHs-cR4Xjt
> 
>   --- Physical volume ---
>   PV Name               /dev/sdb2
>   VG Name               localvg
>   PV Size               975.44 GiB / not usable 3.32 MiB
>   Allocatable           yes
>   PE Size               4.00 MiB
>   Total PE              249712
>   Free PE               25600
>   Allocated PE          224112
>   PV UUID               yG1gfI-1HRb-AdCS-RqUV-Cm2j-pdqe-ZcB10j
> 
>   --- Physical volume ---
>   PV Name               /dev/dm-0
>   VG Name               vg0drbd
>   PV Size               1.81 TiB / not usable 1.11 MiB
>   Allocatable           yes (but full)
>   PE Size               4.00 MiB
>   Total PE              473934
>   Free PE               0
>   Allocated PE          473934
>   PV UUID               u8Au1m-U1pJ-RMik-bZGk-7NPA-3EOL-P21MHW
> 
> [root at Openfiler2 ~]# pvscan
>   PV /dev/sdc1         VG localvg   lvm2 [975.93 GiB / 0    free]
>   PV /dev/sdb2         VG localvg   lvm2 [975.44 GiB / 100.00 GiB free]
>   PV /dev/localvg/r1   VG vg0drbd   lvm2 [1.81 TiB / 0    free]
>   Total: 3 [3.71 TiB] / in use: 3 [3.71 TiB] / in no VG: 0 [0   ]
> 
> [root at Openfiler2 ~]# vgdisplay
>   --- Volume group ---
>   VG Name               localvg
>   System ID
>   Format                lvm2
>   Metadata Areas        2
>   Metadata Sequence No  23
>   VG Access             read/write
>   VG Status             resizable
>   MAX LV                0
>   Cur LV                1
>   Open LV               1
>   Max PV                0
>   Cur PV                2
>   Act PV                2
>   VG Size               1.91 TiB
>   PE Size               4.00 MiB
>   Total PE              499549
>   Alloc PE / Size       473949 / 1.81 TiB
>   Free  PE / Size       25600 / 100.00 GiB
>   VG UUID               5knbwX-LaJ5-1fEd-OD1R-59jZ-Otmy-8IKtVl
> 
>   --- Volume group ---
>   VG Name               vg0drbd
>   System ID
>   Format                lvm2
>   Metadata Areas        1
>   Metadata Sequence No  7
>   VG Access             read/write
>   VG Status             resizable
>   MAX LV                0
>   Cur LV                1
>   Open LV               0
>   Max PV                0
>   Cur PV                1
>   Act PV                1
>   VG Size               1.81 TiB
>   PE Size               4.00 MiB
>   Total PE              473934
>   Alloc PE / Size       473934 / 1.81 TiB
>   Free  PE / Size       0 / 0
>   VG UUID               4pgyVr-Eduj-2CVD-rUhf-Sr7L-Q814-45BE2N
> 
> [root at Openfiler2 ~]# vgscan
>   Reading all physical volumes.  This may take a while...
>   Found volume group "localvg" using metadata type lvm2
>   Found volume group "vg0drbd" using metadata type lvm2
> 
> [root at Openfiler2 ~]# lvdisplay
>   --- Logical volume ---
>   LV Name                /dev/localvg/r1
>   VG Name                localvg
>   LV UUID                eSuNJr-yFDC-WCET-sIgi-IgTf-JRYz-Ack7oe
>   LV Write Access        read/write
>   LV Status              available
>   # open                 2
>   LV Size                1.81 TiB
>   Current LE             473949
>   Segments               2
>   Allocation             inherit
>   Read ahead sectors     auto
>   - currently set to     256
>   Block device           253:0
> 
>   --- Logical volume ---
>   LV Name                /dev/vg0drbd/filer
>   VG Name                vg0drbd
>   LV UUID                eSuNJr-yFDC-WCET-sIgi-IgTf-JRYz-Ack7oe
>   LV Write Access        read/write
>   LV Status              NOT available
>   LV Size                1.81 TiB
>   Current LE             473934
>   Segments               1
>   Allocation             inherit
>   Read ahead sectors     auto
> 
> [root at Openfiler2 ~]# lvscan
>   ACTIVE            '/dev/localvg/r1' [1.81 TiB] inherit
>   inactive          '/dev/vg0drbd/filer' [1.81 TiB] inherit
> 
> [root at Openfiler2 ~]# lvchange -ay /dev/vg0drbd/filer
>   device-mapper: reload ioctl failed: Invalid argument
> 
> [root at Openfiler2 ~]# lvdisplay
>   --- Logical volume ---
>   LV Name                /dev/localvg/r1
>   VG Name                localvg
>   LV UUID                eSuNJr-yFDC-WCET-sIgi-IgTf-JRYz-Ack7oe
>   LV Write Access        read/write
>   LV Status              available
>   # open                 2
>   LV Size                1.81 TiB
>   Current LE             473949
>   Segments               2
>   Allocation             inherit
>   Read ahead sectors     auto
>   - currently set to     256
>   Block device           253:0
> 
>   --- Logical volume ---
>   LV Name                /dev/vg0drbd/filer
>   VG Name                vg0drbd
>   LV UUID                eSuNJr-yFDC-WCET-sIgi-IgTf-JRYz-Ack7oe
>   LV Write Access        read/write
>   LV Status              available
>   # open                 0
>   LV Size                1.81 TiB
>   Current LE             473934
>   Segments               1
>   Allocation             inherit
>   Read ahead sectors     auto
>   - currently set to     256
>   Block device           253:1
> 
> [root at Openfiler2 ~]# lvscan
>   ACTIVE            '/dev/localvg/r1' [1.81 TiB] inherit
>   ACTIVE            '/dev/vg0drbd/filer' [1.81 TiB] inherit
> 
> [root at Openfiler2 ~]# ls -l /dev/dm-*
> brw-rw---- 1 root disk 253, 0 Jan 28 17:39 /dev/dm-0
> brw-rw---- 1 root disk 253, 1 Jan 28 21:00 /dev/dm-1
> 
> [root at Openfiler2 ~]# dmsetup ls
> localvg-r1      (253, 0)
> vg0drbd-filer   (253, 1)
> 
> [root at Openfiler2 ~]# dmsetup info
> Name:              localvg-r1
> State:             ACTIVE
> Read Ahead:        256
> Tables present:    LIVE
> Open count:        2
> Event number:      0
> Major, minor:      253, 0
> Number of targets: 2
> UUID: LVM-5knbwXLaJ51fEdOD1R59jZOtmy8IKtVleSuNJryFDCWCETsIgiIgTfJRYzAck7oe
> 
> Name:              vg0drbd-filer
> State:             ACTIVE
> Read Ahead:        256
> Tables present:    None
> Open count:        0
> Event number:      0
> Major, minor:      253, 1
> Number of targets: 0
> UUID: LVM-4pgyVrEduj2CVDrUhfSr7LQ81445BE2NeSuNJryFDCWCETsIgiIgTfJRYzAck7oe
> 
> [root at Openfiler2 ~]# dmsetup deps
> localvg-r1: 2 dependencies      : (8, 18) (8, 33)
> vg0drbd-filer: 0 dependencies   :
> 
> [root at Openfiler2 ~]# dmsetup table
> localvg-r1: 0 2046664704 linear 8:33 2048
> localvg-r1: 2046664704 1835925504 linear 8:18 2048
> vg0drbd-filer:
> 
> [root at Openfiler2 ~]# drbdsetup /dev/drbd1 show
> disk {
>         size                    0s _is_default; # bytes
>         on-io-error             detach;
>         fencing                 dont-care _is_default;
>         max-bio-bvecs           0 _is_default;
> }
> net {
>         timeout                 60 _is_default; # 1/10 seconds
>         max-epoch-size          2048 _is_default;
>         max-buffers             2048 _is_default;
>         unplug-watermark        128 _is_default;
>         connect-int             10 _is_default; # seconds
>         ping-int                10 _is_default; # seconds
>         sndbuf-size             0 _is_default; # bytes
>         rcvbuf-size             0 _is_default; # bytes
>         ko-count                0 _is_default;
>         after-sb-0pri           disconnect _is_default;
>         after-sb-1pri           disconnect _is_default;
>         after-sb-2pri           disconnect _is_default;
>         rr-conflict             disconnect _is_default;
>         ping-timeout            5 _is_default; # 1/10 seconds
>         on-congestion           block _is_default;
>         congestion-fill         0s _is_default; # byte
>         congestion-extents      127 _is_default;
> }
> syncer {
>         rate                    112640k; # bytes/second
>         after                   0;
>         al-extents              127 _is_default;
>         on-no-data-accessible   io-error _is_default;
>         c-plan-ahead            0 _is_default; # 1/10 seconds
>         c-delay-target          10 _is_default; # 1/10 seconds
>         c-fill-target           0s _is_default; # bytes
>         c-max-rate              102400k _is_default; # bytes/second
>         c-min-rate              4096k _is_default; # bytes/second
> }
> protocol C;
> _this_host {
>         device                  minor 1;
>         disk                    "/dev/localvg/r1";
>         meta-disk               internal;
>         address                 ipv4 192.168.100.159:7789;
> }
> _remote_host {
>         address                 ipv4 192.168.100.158:7789;
> }
> 
> [root at Openfiler2 ~]# crm configure show
> node Openfiler1 \
>         attributes standby="off"
> node Openfiler2 \
>         attributes standby="off"
> primitive ClusterIP ocf:heartbeat:IPaddr2 \
>         params ip="192.168.4.157" cidr_netmask="32" \
>         op monitor interval="30s"
> primitive DataFS ocf:heartbeat:Filesystem \
>         params device="/dev/vg0drbd/filer" directory="/mnt/vg0drbd/filer" fstype="xfs" \
>         meta target-role="started"
> primitive MetaFS ocf:heartbeat:Filesystem \
>         params device="/dev/drbd0" directory="/cluster_metadata" fstype="ext3" \
>         meta target-role="started"
> primitive drbd_data ocf:linbit:drbd \
>         params drbd_resource="vg0_drbd" \
>         op monitor interval="15s"
> primitive drbd_meta ocf:linbit:drbd \
>         params drbd_resource="cluster_metadata" \
>         op monitor interval="15s"
> primitive ftp lsb:proftpd \
>         meta target-role="stopped"
> primitive iscsi lsb:iscsi-target
> primitive ldap lsb:ldap
> primitive lvmdata ocf:heartbeat:LVM \
>         params volgrpname="vg0drbd" \
>         meta target-role="started"
> primitive nfs lsb:nfs
> primitive nfslock lsb:nfslock
> primitive openfiler lsb:openfiler
> primitive samba lsb:smb
> group g_drbd drbd_meta drbd_data
> group g_services MetaFS lvmdata DataFS openfiler ClusterIP iscsi ldap samba nfs nfslock ftp
> ms ms_g_drbd g_drbd \
>         meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> location cli-prefer-ClusterIP ClusterIP \
>         rule $id="cli-prefer-rule-ClusterIP" inf: #uname eq Openfiler1
> location cli-standby-g_services g_services \
>         rule $id="cli-standby-rule-g_services" -inf: #uname eq Openfiler1
> location cli-standby-ms_g_drbd ms_g_drbd \
>         rule $id="cli-standby-ms_g_drbd-rule" $role="Master" -inf: #uname eq Openfiler1
> colocation c_g_services_on_g_drbd inf: g_services ms_g_drbd:Master
> order o_g_servicesafter_g_drbd inf: ms_g_drbd:promote g_services:start
> property $id="cib-bootstrap-options" \
>         dc-version="1.1.2-c6b59218ee949eebff30e837ff6f3824ed0ab86b" \
>         cluster-infrastructure="openais" \
>         expected-quorum-votes="2" \
>         stonith-enabled="false" \
>         no-quorum-policy="ignore" \
>         last-lrm-refresh="1390944138"
> rsc_defaults $id="rsc-options" \
>         resource-stickiness="100"
> 
> In case you are wondering why ftp says Stopped above: I intentionally stopped proftpd via the Linux Cluster Management Console 1.5.14 so that no more proftpd processes would start up.
> 
> Many thanks and regards, 
> 
> Seb A
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed


