My DRBD device on LVM filled up with data and is now unusable after a power cycle.

I configured two DRBD nodes running Openfiler with corosync and pacemaker over two years ago, following the instructions here: http://www.howtoforge.com/openfiler-2.99-active-passive-with-corosync-pacemaker-and-drbd

At some point it failed over to what was originally the secondary node, "Openfiler2". I left it that way and all was fine (AFAIK). (I did have a few issues in the early days with it losing sync on reboot / power failure, but that's ancient history.)

Eventually the DRBD data partition filled up, as there are processes that FTP files onto it. Lots of proftpd processes got stuck trying to CWD into the data partition, and so the CPU load went very high. I tried to start a process to delete old files and it got stuck too: it wasn't doing anything, and I couldn't cancel or kill it. kill -9 <pid> had no effect on that process or on any of the stuck proftpd processes. As a result I could not unmount the drive, and when I tried fuser it just killed my ssh session and failed to kill the proftpd processes. I restarted sshd via the console and, as there had been some kernel panics, I decided to reboot, hardly expecting it to succeed. It didn't - it got stuck and I had to kill the virtual power.

When the machine came back up it could not mount the DRBD data partition (which uses LVM). Both DRBD partitions were synchronized before and after the reboot - they reconnected and the primary stayed on "Openfiler2". The first errors after the reboot were these:

daemon.info<30>: Jan 28 15:26:34 Openfiler2 LVM[3228]: INFO: Activating volume group vg0drbd
daemon.info<30>: Jan 28 15:26:34 Openfiler2 LVM[3228]: INFO: Reading all physical volumes. This may take a while...
  Found volume group "localvg" using metadata type lvm2
  Found volume group "vg0drbd" using metadata type lvm2
kern.err<3>: Jan 28 15:26:34 Openfiler2 kernel: device-mapper: table: 253:1: linear: dm-linear: Device lookup failed
kern.warn<4>: Jan 28 15:26:34 Openfiler2 kernel: device-mapper: ioctl: error adding target to table
daemon.err<27>: Jan 28 15:26:34 Openfiler2 LVM[3228]: ERROR: device-mapper: reload ioctl failed: Invalid argument
  1 logical volume(s) in volume group "vg0drbd" now active
daemon.info<30>: Jan 28 15:26:34 Openfiler2 crmd: [1284]: info: process_lrm_event: LRM operation lvmdata_start_0 (call=26, rc=1, cib-update=31, confirmed=true) unknown error

I tried to mount it manually, but the device is missing. Any suggestions on how I can get this volume mounted? Thanks!

For reference:

kern.info<6>: Jan 28 17:28:11 Openfiler2 kernel: device-mapper: uevent: version 1.0.3
kern.info<6>: Jan 28 17:28:11 Openfiler2 kernel: device-mapper: ioctl: 4.17.0-ioctl (2010-03-05) initialised: dm-devel@redhat.com

Linux Openfiler2 2.6.32-71.18.1.el6-0.20.smp.gcc4.1.x86_64 #1 SMP Fri Mar 25 23:12:47 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux

The last line of /etc/fstab is commented out because the mount is controlled by pacemaker:

#/dev/vg0drbd/filer /mnt/vg0drbd/filer xfs defaults,usrquota,grpquota 0 0

Right now I have:

[root@Openfiler2 ~]# service drbd status
drbd driver loaded OK; device status:
version: 8.3.10 (api:88/proto:86-96)
GIT-hash: 5c0b0469666682443d4785d90a2c603378f9017b build by phil@fat-tyre, 2011-01-28 12:17:35
m:res               cs         ro                 ds                 p  mounted            fstype
0:cluster_metadata  Connected  Primary/Secondary  UpToDate/UpToDate  C  /cluster_metadata  ext3
1:vg0_drbd          Connected  Primary/Secondary  UpToDate/UpToDate  C

[root@Openfiler2 ~]# crm status
============
Last updated: Tue Jan 28 20:11:19 2014
Stack: openais
Current DC: Openfiler1 - partition with quorum
Version: 1.1.2-c6b59218ee949eebff30e837ff6f3824ed0ab86b
2 Nodes configured, 2 expected votes
2 Resources configured.
============
Online: [ Openfiler1 Openfiler2 ]

 Resource Group: g_services
     MetaFS     (ocf::heartbeat:Filesystem):   Started Openfiler2
     lvmdata    (ocf::heartbeat:LVM):          Stopped
     DataFS     (ocf::heartbeat:Filesystem):   Stopped
     openfiler  (lsb:openfiler):               Stopped
     ClusterIP  (ocf::heartbeat:IPaddr2):      Stopped
     iscsi      (lsb:iscsi-target):            Stopped
     ldap       (lsb:ldap):                    Stopped
     samba      (lsb:smb):                     Stopped
     nfs        (lsb:nfs):                     Stopped
     nfslock    (lsb:nfslock):                 Stopped
     ftp        (lsb:proftpd):                 Stopped
 Master/Slave Set: ms_g_drbd
     Masters: [ Openfiler2 ]
     Slaves: [ Openfiler1 ]

Failed actions:
    lvmdata_start_0 (node=Openfiler2, call=28, rc=1, status=complete): unknown error

######

More reference:

[root@Openfiler2 ~]# pvdisplay
  --- Physical volume ---
  PV Name               /dev/sdc1
  VG Name               localvg
  PV Size               975.93 GiB / not usable 2.32 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              249837
  Free PE               0
  Allocated PE          249837
  PV UUID               OPFfsk-LXkz-3Voc-CQbj-Qf8d-YmHs-cR4Xjt

  --- Physical volume ---
  PV Name               /dev/sdb2
  VG Name               localvg
  PV Size               975.44 GiB / not usable 3.32 MiB
  Allocatable           yes
  PE Size               4.00 MiB
  Total PE              249712
  Free PE               25600
  Allocated PE          224112
  PV UUID               yG1gfI-1HRb-AdCS-RqUV-Cm2j-pdqe-ZcB10j

  --- Physical volume ---
  PV Name               /dev/dm-0
  VG Name               vg0drbd
  PV Size               1.81 TiB / not usable 1.11 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              473934
  Free PE               0
  Allocated PE          473934
  PV UUID               u8Au1m-U1pJ-RMik-bZGk-7NPA-3EOL-P21MHW

[root@Openfiler2 ~]# pvscan
  PV /dev/sdc1        VG localvg  lvm2 [975.93 GiB / 0 free]
  PV /dev/sdb2        VG localvg  lvm2 [975.44 GiB / 100.00 GiB free]
  PV /dev/localvg/r1  VG vg0drbd  lvm2 [1.81 TiB / 0 free]
  Total: 3 [3.71 TiB] / in use: 3 [3.71 TiB] / in no VG: 0 [0 ]

[root@Openfiler2 ~]# vgdisplay
  --- Volume group ---
  VG Name               localvg
  System ID
  Format                lvm2
  Metadata Areas        2
  Metadata Sequence No  23
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               1
  Max PV                0
  Cur PV                2
  Act PV                2
  VG Size               1.91 TiB
  PE Size               4.00 MiB
  Total PE              499549
  Alloc PE / Size       473949 / 1.81 TiB
  Free  PE / Size       25600 / 100.00 GiB
  VG UUID               5knbwX-LaJ5-1fEd-OD1R-59jZ-Otmy-8IKtVl

  --- Volume group ---
  VG Name               vg0drbd
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  7
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               0
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               1.81 TiB
  PE Size               4.00 MiB
  Total PE              473934
  Alloc PE / Size       473934 / 1.81 TiB
  Free  PE / Size       0 / 0
  VG UUID               4pgyVr-Eduj-2CVD-rUhf-Sr7L-Q814-45BE2N

[root@Openfiler2 ~]# vgscan
  Reading all physical volumes.  This may take a while...
  Found volume group "localvg" using metadata type lvm2
  Found volume group "vg0drbd" using metadata type lvm2

[root@Openfiler2 ~]# lvdisplay
  --- Logical volume ---
  LV Name                /dev/localvg/r1
  VG Name                localvg
  LV UUID                eSuNJr-yFDC-WCET-sIgi-IgTf-JRYz-Ack7oe
  LV Write Access        read/write
  LV Status              available
  # open                 2
  LV Size                1.81 TiB
  Current LE             473949
  Segments               2
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:0

  --- Logical volume ---
  LV Name                /dev/vg0drbd/filer
  VG Name                vg0drbd
  LV UUID                eSuNJr-yFDC-WCET-sIgi-IgTf-JRYz-Ack7oe
  LV Write Access        read/write
  LV Status              NOT available
  LV Size                1.81 TiB
  Current LE             473934
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto

[root@Openfiler2 ~]# lvscan
  ACTIVE            '/dev/localvg/r1' [1.81 TiB] inherit
  inactive          '/dev/vg0drbd/filer' [1.81 TiB] inherit

[root@Openfiler2 ~]# lvchange -ay /dev/vg0drbd/filer
  device-mapper: reload ioctl failed: Invalid argument

[root@Openfiler2 ~]# lvdisplay
  --- Logical volume ---
  LV Name                /dev/localvg/r1
  VG Name                localvg
  LV UUID                eSuNJr-yFDC-WCET-sIgi-IgTf-JRYz-Ack7oe
  LV Write Access        read/write
  LV Status              available
  # open                 2
  LV Size                1.81 TiB
  Current LE             473949
  Segments               2
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:0

  --- Logical volume ---
  LV Name                /dev/vg0drbd/filer
  VG Name                vg0drbd
  LV UUID                eSuNJr-yFDC-WCET-sIgi-IgTf-JRYz-Ack7oe
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                1.81 TiB
  Current LE             473934
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:1

[root@Openfiler2 ~]# lvscan
  ACTIVE            '/dev/localvg/r1' [1.81 TiB] inherit
  ACTIVE            '/dev/vg0drbd/filer' [1.81 TiB] inherit

[root@Openfiler2 ~]# ls -l /dev/dm-*
brw-rw---- 1 root disk 253, 0 Jan 28 17:39 /dev/dm-0
brw-rw---- 1 root disk 253, 1 Jan 28 21:00 /dev/dm-1

[root@Openfiler2 ~]# dmsetup ls
localvg-r1      (253, 0)
vg0drbd-filer   (253, 1)

[root@Openfiler2 ~]# dmsetup info
Name:              localvg-r1
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        2
Event number:      0
Major, minor:      253, 0
Number of targets: 2
UUID: LVM-5knbwXLaJ51fEdOD1R59jZOtmy8IKtVleSuNJryFDCWCETsIgiIgTfJRYzAck7oe

Name:              vg0drbd-filer
State:             ACTIVE
Read Ahead:        256
Tables present:    None
Open count:        0
Event number:      0
Major, minor:      253, 1
Number of targets: 0
UUID: LVM-4pgyVrEduj2CVDrUhfSr7LQ81445BE2NeSuNJryFDCWCETsIgiIgTfJRYzAck7oe

[root@Openfiler2 ~]# dmsetup deps
localvg-r1: 2 dependencies    : (8, 18) (8, 33)
vg0drbd-filer: 0 dependencies :

[root@Openfiler2 ~]# dmsetup table
localvg-r1: 0 2046664704 linear 8:33 2048
localvg-r1: 2046664704 1835925504 linear 8:18 2048
vg0drbd-filer:

[root@Openfiler2 ~]# drbdsetup /dev/drbd1 show
disk {
    size                   0s _is_default; # bytes
    on-io-error            detach;
    fencing                dont-care _is_default;
    max-bio-bvecs          0 _is_default;
}
net {
    timeout                60 _is_default; # 1/10 seconds
    max-epoch-size         2048 _is_default;
    max-buffers            2048 _is_default;
    unplug-watermark       128 _is_default;
    connect-int            10 _is_default; # seconds
    ping-int               10 _is_default; # seconds
    sndbuf-size            0 _is_default; # bytes
    rcvbuf-size            0 _is_default; # bytes
    ko-count               0 _is_default;
    after-sb-0pri          disconnect _is_default;
    after-sb-1pri          disconnect _is_default;
    after-sb-2pri          disconnect _is_default;
    rr-conflict            disconnect _is_default;
    ping-timeout           5 _is_default; # 1/10 seconds
    on-congestion          block _is_default;
    congestion-fill        0s _is_default; # byte
    congestion-extents     127 _is_default;
}
syncer {
    rate                   112640k; # bytes/second
    after                  0;
    al-extents             127 _is_default;
    on-no-data-accessible  io-error _is_default;
    c-plan-ahead           0 _is_default; # 1/10 seconds
    c-delay-target         10 _is_default; # 1/10 seconds
    c-fill-target          0s _is_default; # bytes
    c-max-rate             102400k _is_default; # bytes/second
    c-min-rate             4096k _is_default; # bytes/second
}
protocol C;
_this_host {
    device                 minor 1;
    disk                   "/dev/localvg/r1";
    meta-disk              internal;
    address                ipv4 192.168.100.159:7789;
}
_remote_host {
    address                ipv4 192.168.100.158:7789;
}

[root@Openfiler2 ~]# crm configure show
node Openfiler1 \
    attributes standby="off"
node Openfiler2 \
    attributes standby="off"
primitive ClusterIP ocf:heartbeat:IPaddr2 \
    params ip="192.168.4.157" cidr_netmask="32" \
    op monitor interval="30s"
primitive DataFS ocf:heartbeat:Filesystem \
    params device="/dev/vg0drbd/filer" directory="/mnt/vg0drbd/filer" fstype="xfs" \
    meta target-role="started"
primitive MetaFS ocf:heartbeat:Filesystem \
    params device="/dev/drbd0" directory="/cluster_metadata" fstype="ext3" \
    meta target-role="started"
primitive drbd_data ocf:linbit:drbd \
    params drbd_resource="vg0_drbd" \
    op monitor interval="15s"
primitive drbd_meta ocf:linbit:drbd \
    params drbd_resource="cluster_metadata" \
    op monitor interval="15s"
primitive ftp lsb:proftpd \
    meta target-role="stopped"
primitive iscsi lsb:iscsi-target
primitive ldap lsb:ldap
primitive lvmdata ocf:heartbeat:LVM \
    params volgrpname="vg0drbd" \
    meta target-role="started"
primitive nfs lsb:nfs
primitive nfslock lsb:nfslock
primitive openfiler lsb:openfiler
primitive samba lsb:smb
group g_drbd drbd_meta drbd_data
group g_services MetaFS lvmdata DataFS openfiler ClusterIP iscsi ldap samba nfs nfslock ftp
ms ms_g_drbd g_drbd \
    meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" clone-max="2" clone-node-max="1" notify="true"
location cli-prefer-ClusterIP ClusterIP \
    rule $id="cli-prefer-rule-ClusterIP" inf: #uname eq Openfiler1
location cli-standby-g_services g_services \
    rule $id="cli-standby-rule-g_services" -inf: #uname eq Openfiler1
location cli-standby-ms_g_drbd ms_g_drbd \
    rule $id="cli-standby-ms_g_drbd-rule" $role="Master" -inf: #uname eq Openfiler1
colocation c_g_services_on_g_drbd inf: g_services ms_g_drbd:Master
order o_g_servicesafter_g_drbd inf: ms_g_drbd:promote g_services:start
property $id="cib-bootstrap-options" \
    dc-version="1.1.2-c6b59218ee949eebff30e837ff6f3824ed0ab86b" \
    cluster-infrastructure="openais" \
    expected-quorum-votes="2" \
    stonith-enabled="false" \
    no-quorum-policy="ignore" \
    last-lrm-refresh="1390944138"
rsc_defaults $id="rsc-options" \
    resource-stickiness="100"

In case you are wondering why ftp shows as Stopped above: I intentionally stopped proftpd via the Linux Cluster Management Console 1.5.14, so that no more proftpd processes would start up.

Many thanks and regards,
Seb A
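
P.S. One thing I checked while staring at the numbers above: the backing LV localvg/r1 has 473949 extents, while the vg0drbd PV only has 473934, a 15-extent gap. I assume that gap is just the DRBD internal metadata at the end of the backing device, not corruption. A quick back-of-the-envelope check (the "roughly size/32768 for the bitmap, plus a little for the activity log and superblock" estimate is my approximation of the sizing described in the DRBD documentation, so treat it as ballpark only):

```shell
# All numbers are from the pvdisplay/lvdisplay output above; PE size is 4 MiB.
backing_le=473949      # Current LE of /dev/localvg/r1 (the DRBD backing LV)
drbd_pe=473934         # Total PE of the PV inside vg0drbd
pe_mib=4

# Actual gap between the backing LV and what the vg0drbd PV can see
gap_mib=$(( (backing_le - drbd_pe) * pe_mib ))
echo "gap between backing LV and DRBD PV: ${gap_mib} MiB"

# Rough internal-metadata estimate: ~1 bit of bitmap per 4 KiB of data,
# i.e. about backing_size/32768, plus ~1 MiB for activity log + superblock
backing_mib=$(( backing_le * pe_mib ))
est_md_mib=$(( backing_mib / 32768 + 1 ))
echo "estimated DRBD internal metadata: ~${est_md_mib} MiB"
```

The actual gap (60 MiB) comes out within a couple of extents of the estimate, so the sizes themselves look sane to me; the problem seems to be the empty device-mapper table for vg0drbd-filer, not a size mismatch.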