[Drbd-dev] Bug#659762: lvm2: LVM commands freeze after snapshot delete fails
Urban Loesch
bind at enas.net
Tue Mar 4 15:19:15 CET 2014
Hi,
we had the same problem with Debian Wheezy, LVM2 and DRBD.
But this does not seem to be DRBD-related. It looks like a problem between LVM and udevd.
See:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=549691
Stopping udevd before taking the snapshot and starting it again after removing the
snapshot solved the problem for us. It's only a workaround, but it works.
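
A minimal sketch of that workaround, assuming udev is controlled through the SysV "service" wrapper as on a stock Debian Wheezy install. The helper names (udev_stopped, snapshot_cycle) and the injectable "run" parameter are made up for illustration; they are not part of LVM or udev.

```python
import subprocess
from contextlib import contextmanager

# Assumption: udev is managed via the SysV "service" wrapper (Debian Wheezy).
# Adjust both commands for other init systems.
STOP_UDEV = ["service", "udev", "stop"]
START_UDEV = ["service", "udev", "start"]


@contextmanager
def udev_stopped(run=subprocess.check_call):
    """Stop udevd for the duration of the with-block, then start it again."""
    run(STOP_UDEV)
    try:
        yield
    finally:
        # Start udevd again even if creating or removing the snapshot failed.
        run(START_UDEV)


def snapshot_cycle(origin, snap_name, size, work, run=subprocess.check_call):
    """Create a snapshot, call work(), then remove it, all with udevd stopped."""
    vg_path = origin.rsplit("/", 1)[0]  # "/dev/vg0/lv0" -> "/dev/vg0"
    with udev_stopped(run):
        run(["lvcreate", "--snapshot", "--size", size,
             "--name", snap_name, origin])
        try:
            work()  # e.g. a backup that reads from the snapshot
        finally:
            run(["lvremove", "--force", vg_path + "/" + snap_name])
```

Called as snapshot_cycle("/dev/vg0/lv0", "lv0-snap", "40G", do_backup), where do_backup is whatever reads from the snapshot. udevd stays stopped for the whole lifetime of the snapshot, so device events are not processed during that window.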
Regards
Urban
On 26.07.2013 17:14, Frank Steinborn wrote:
> Hi,
>
> we are a bit further in debugging this. We installed a Dell PowerEdge R620 (same hardware as used in our DRBD cluster where this problem happens). As
> no one in this thread brought DRBD into play, I didn't expect any interaction with it related to this bug. However, we were not able to reproduce it with
> just LVM2 (e.g. configure LV, do IO in LV, remove LV, hang).
>
> So we installed a second machine and put DRBD on top of the LVs. And voilà, as soon as we create a snapshot of the LV that has DRBD on top and remove
> this snapshot, the removal fails about a third of the time.
>
> Some facts:
>
> root@drbd-primary:~# lvremove --force /dev/vg0/lv0-snap
> Unable to deactivate open vg0-lv0--snap-cow (254:3)
> Failed to resume lv0-snap.
> libdevmapper exiting with 1 device(s) still suspended.
>
> After this, "dmsetup info" gives the following output:
>
> <<< snip >>>
>
> Name: vg0-lv0--snap
> State: ACTIVE
> Read Ahead: 256
> Tables present: LIVE
> Open count: 0
> Event number: 0
> Major, minor: 254, 1
> Number of targets: 1
> UUID: LVM-M0Z897O16CAiYbSivOzgSn0M9Ae9TdoYy4WFhwy43CZA1g7zKFGF915pLAOIPvFZ
>
> Name: vg0-lv0-real
> State: ACTIVE
> Read Ahead: 0
> Tables present: LIVE
> Open count: 1
> Event number: 0
> Major, minor: 254, 2
> Number of targets: 1
> UUID: LVM-M0Z897O16CAiYbSivOzgSn0M9Ae9TdoYC3ppjt1CZ3AcZR2hNz1VT5CHdM4RR32j-real
>
> Name: vg0-lv0
> State: SUSPENDED
> Read Ahead: 256
> Tables present: LIVE & INACTIVE
> Open count: 2
> Event number: 0
> Major, minor: 254, 0
> Number of targets: 1
> UUID: LVM-M0Z897O16CAiYbSivOzgSn0M9Ae9TdoYC3ppjt1CZ3AcZR2hNz1VT5CHdM4RR32j
>
> Name: vg0-lv0--snap-cow
> State: ACTIVE
> Read Ahead: 0
> Tables present: LIVE
> Open count: 0
> Event number: 0
> Major, minor: 254, 3
> Number of targets: 1
> UUID: LVM-M0Z897O16CAiYbSivOzgSn0M9Ae9TdoYy4WFhwy43CZA1g7zKFGF915pLAOIPvFZ-cow
>
> <<< snap >>>
>
> As you can see, the real LV with DRBD on top is now in state SUSPENDED - which causes the cluster to be non-functional as IO operations stall on both
> the primary and secondary node until one does "dmsetup resume /dev/vg0/lv0".
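
A small sketch of how the check described above could be automated: it parses "dmsetup info" output in the format quoted in this mail and lists the devices whose State is SUSPENDED. The function names are hypothetical, and the parser assumes the plain "Key: value" layout shown here.

```python
def parse_dmsetup_info(text):
    """Split `dmsetup info` output into one dict of fields per device."""
    devices, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip blank lines and anything that is not "Key: value"
        key, value = line.split(":", 1)
        key, value = key.strip(), value.strip()
        if key == "Name" and current:
            # A new "Name:" line starts the next device record.
            devices.append(current)
            current = {}
        current[key] = value
    if current:
        devices.append(current)
    return devices


def suspended_devices(text):
    """Return the names of all devices reported as SUSPENDED."""
    return [d["Name"] for d in parse_dmsetup_info(text)
            if d.get("State") == "SUSPENDED"]


def resume_command(name):
    """Build the argument list for resuming one device-mapper device."""
    return ["dmsetup", "resume", name]
```

The text would typically come from subprocess.check_output(["dmsetup", "info"]) decoded to a string. Whether it is safe to resume a device automatically depends on why it was suspended, so this only reports names and builds the command rather than running it.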
>
> Another interesting issue we've seen: after doing "dmsetup resume /dev/vg0/lv0", lv0-snap doesn't appear to be a snapshot anymore, given the output of
> lvs (lv0-snap has no origin anymore):
>
>   LV       VG   Attr     LSize   Pool Origin Data%  Move Log Copy%  Convert
>   lv0      vg0  -wi-ao-- 200.00g
>   lv0-snap vg0  -wi-a---  40.00g
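
The symptom in the lvs output above can be detected from the Attr column: the first character of lv_attr is "s" for a snapshot ("S" for a merging one), while here lv0-snap shows "-". A rough sketch, assuming the default whitespace-separated lvs layout with LV, VG and Attr as the first three columns; the function name is made up for illustration.

```python
def lost_snapshots(lvs_output, expected_snapshots):
    """Return names from expected_snapshots that lvs no longer shows as snapshots."""
    lost = []
    for line in lvs_output.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] == "LV":
            continue  # skip blank lines and the header row
        name, attr = fields[0], fields[2]
        # First lv_attr character: 's' = snapshot, 'S' = merging snapshot.
        if name in expected_snapshots and attr[0] not in "sS":
            lost.append(name)
    return lost
```

For the output quoted above, lost_snapshots(text, {"lv0-snap"}) reports lv0-snap, because its Attr string starts with "-" instead of "s".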
>
>
> Some miscellaneous notes:
> * It _feels_ like it only happens when the snapshot is at least around 50-60% full.
> * We can trigger something like this even without DRBD. In that case, however, the LV never ends up in SUSPENDED state and a second
> lvremove attempt always succeeds.
>
> That's all we have so far. I already had a private conversation with waldi at debian.org about this and we will (probably) give
> him remote access to this system as soon as the setup is reachable from the outside.
>
> Please let me know if I can provide any more information to get this fixed. I put drbd-dev in cc, maybe someone over there has an idea on this?
>
> @drbd-dev: the system is Debian Wheezy with DRBD 8.3.11 and lvm2 2.02.95.
>
> Thanks,
> Frank