[DRBD-user] strange device problems after failover

Tue Jan 11 10:24:54 CET 2011

On 01/10/2011 05:16 PM, netz-haut - stephan seitz wrote:
> Hi there,
> 
> Yesterday I did a routine manual failover (switchover) to the second node of a primary/secondary DRBD cluster.
> 
> This is the haresources:
> filer01 IPaddr::172.16.1.240/24/bond0 IPaddr::172.16.2.240/24/bond0 Delay::1 drbddisk::cluster_metadata drbddisk::vg0drbd Delay::1 LVM::/dev/vg0 Filesystem::/dev/drbd0::/cluster_metadata::ext3::noatime,nodiratime iscsitarget
> 
> The failover didn't succeed because pvscan/vgscan/lvscan (I don't know which part of LVM heartbeat actually invokes) didn't find any PV.
> I first checked the LVM cache, deleted it, and double-checked that the configurations on both nodes were consistent, but (at least) a manual pvscan
> reported that it couldn't find any PV signature on my drbd1 (/dev/mapper/vg0drbd in my case). drbd0 is just a flat ext3 FS, which worked as expected.
> 
> DRBD itself remained UpToDate/UpToDate the whole time.
> 
> This is a recent /proc/drbd (taken after switching back to the primary, because I needed the cluster up and running again):
> 
> version: 8.3.9 (api:88/proto:86-95)
> GIT-hash: 1c3b2f71137171c1236b497969734da43b5bec90 build by root@filer01, 2011-01-03 07:01:01
>  0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
>     ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
>  1: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
>     ns:0 nr:156691051 dw:156691051 dr:0 al:0 bm:1640 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
> 
> On the currently active node, pvscan has no problem starting LVM on top of drbd1...
> 
> The previous failover test succeeded. In between, we removed the LUN on the backing store of the
> (now) problematic cluster member and created it again with a different stripe size (on HW RAID),
> but with all other parameters identical. It has the same byte size and the same block size as before.
> After the LUN had been set up, we invalidated it and let DRBD resync it.
> 
> Could anyone shed some light on what could be wrong? Thanks!

I believe I heard something about LVM being smart about aligning itself
to array stripes. If your stripe size changed, it may be possible that
LVM expects to find signatures at other boundaries now.

You may want to sync to another array with the original stripe size and
see if that works.
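
Before resyncing, it may be worth checking whether the PV label is
readable at all on the failing node while drbd1 is Primary there. As far
as I know, LVM2 keeps its label (an 8-byte "LABELONE" identifier) at the
start of one of the first four 512-byte sectors of the PV. The following
is only a minimal sketch under that assumption; the function name is made
up for illustration and it is no substitute for pvscan or pvck:

```python
SECTOR_SIZE = 512          # LVM2 labels are placed on 512-byte sector boundaries
LABEL_SCAN_SECTORS = 4     # LVM2 looks for the label in the first four sectors
LABEL_ID = b"LABELONE"     # identifier at the start of an LVM2 label sector


def find_lvm_label(path):
    """Return the index (0-3) of the first sector that starts with the
    LVM2 label identifier, or None if none of the first four sectors does.

    `path` may be a block device or a regular file holding an image.
    """
    with open(path, "rb") as dev:
        head = dev.read(SECTOR_SIZE * LABEL_SCAN_SECTORS)

    for sector in range(LABEL_SCAN_SECTORS):
        start = sector * SECTOR_SIZE
        if head[start:start + len(LABEL_ID)] == LABEL_ID:
            return sector
    return None
```

Calling find_lvm_label("/dev/drbd1") as root on each node (the device
must be Primary to be readable) and comparing the results would show
whether the label is missing on the resynced side or present but not
being picked up by LVM. The device path is taken from your mail; adjust
it if LVM is supposed to scan a different device.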

Regards,
Felix