Hi there,

yesterday I did a regular manual fail-over (swap-over) to the second node of a primary/slave drbd cluster.

This is the haresources:

    filer01 IPaddr::172.16.1.240/24/bond0 IPaddr::172.16.2.240/24/bond0 Delay::1 drbddisk::cluster_metadata drbddisk::vg0drbd Delay::1 LVM::/dev/vg0 Filesystem::/dev/drbd0::/cluster_metadata::ext3::noatime,nodiratime iscsitarget

The failover didn't succeed because pv/vg/lvscan (I don't know which of the LVM tools heartbeat actually invokes) didn't find any PV.

I first checked the LVM cache, deleted it, and double-checked that the configurations on both nodes are consistent, but a manual pvscan (at least) reported that it couldn't find any PV signature on my drbd1 (/dev/mapper/vg0drbd in my case). drbd0 is just a flat ext3 FS, which worked as expected.

The drbd itself always remained UpToDate/UpToDate.

This is a recent /proc/drbd (taken back on the primary, because I needed the cluster up and running again):

    version: 8.3.9 (api:88/proto:86-95)
    GIT-hash: 1c3b2f71137171c1236b497969734da43b5bec90 build by root at filer01, 2011-01-03 07:01:01
     0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
        ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
     1: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
        ns:0 nr:156691051 dw:156691051 dr:0 al:0 bm:1640 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

On the currently active node, pvscan has no problem bringing up LVM on top of drbd1.

The previous failover test succeeded. In between, we removed the LUN on the backing store of the (now) problematic cluster member and created it again with a different stripe size (on HW RAID), but with all other parameters identical. It has the same byte size and the same block size as before. After the LUN had been set up, we invalidated it and let drbd sync it in again.

Could anyone shed some light on what could be wrong?

Thanks!
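For anyone who wants to check the DRBD side mechanically before a swap-over, here is a minimal Python sketch. It assumes the 8.3-style /proc/drbd line format quoted above; the names `parse_proc_drbd` and `drbd_ready_for_swap` are made up for illustration and are not part of DRBD or heartbeat. Note that it only checks DRBD's own view of the resources. As the failure described above shows, UpToDate/UpToDate says nothing about whether LVM will find a PV signature on the device.

```python
import re

# Sample copied from the /proc/drbd output quoted above.
SAMPLE = """\
version: 8.3.9 (api:88/proto:86-95)
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
 1: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:156691051 dw:156691051 dr:0 al:0 bm:1640 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
"""

# Matches the per-resource status line: "<minor>: cs:... ro:a/b ds:a/b"
STATUS_LINE = re.compile(
    r"^\s*(\d+):\s+cs:(\S+)\s+ro:([^/\s]+)/([^/\s]+)\s+ds:([^/\s]+)/([^/\s]+)"
)


def parse_proc_drbd(text):
    """Return {minor: state dict} for every resource status line in text."""
    resources = {}
    for line in text.splitlines():
        match = STATUS_LINE.match(line)
        if match:
            minor, cs, ro_local, ro_peer, ds_local, ds_peer = match.groups()
            resources[int(minor)] = {
                "cs": cs,
                "ro_local": ro_local,
                "ro_peer": ro_peer,
                "ds_local": ds_local,
                "ds_peer": ds_peer,
            }
    return resources


def drbd_ready_for_swap(resources):
    """True if at least one resource exists and all are connected and in sync."""
    return bool(resources) and all(
        r["cs"] == "Connected"
        and r["ds_local"] == "UpToDate"
        and r["ds_peer"] == "UpToDate"
        for r in resources.values()
    )


if __name__ == "__main__":
    state = parse_proc_drbd(SAMPLE)
    for minor, info in sorted(state.items()):
        print(minor, info)
    print("ready for swap-over (DRBD view only):", drbd_ready_for_swap(state))
```

On a live node the text would come from reading /proc/drbd instead of the embedded sample.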
Here's the drbd.conf:

global {
  # minor-count 64;
  # dialog-refresh 5; # 5 seconds
  # disable-ip-verification;
  usage-count ask;
}

common {
  syncer { rate 368M; }
}

resource cluster_metadata {
  protocol C;

  handlers {
    pri-on-incon-degr "echo O > /proc/sysrq-trigger ; halt -f";
    pri-lost-after-sb "echo O > /proc/sysrq-trigger ; halt -f";
    local-io-error "echo O > /proc/sysrq-trigger ; halt -f";
    # outdate-peer "/usr/sbin/drbd-peer-outdater";
  }

  startup {
    # wfc-timeout 0;
    degr-wfc-timeout 120; # 2 minutes.
    outdated-wfc-timeout 90;
  }

  disk {
    on-io-error detach;
  }

  net {
    after-sb-0pri disconnect;
    after-sb-1pri disconnect;
    after-sb-2pri disconnect;
    rr-conflict disconnect;
  }

  syncer {
    # rate 10M;
    # after "r2";
    al-extents 3389;
  }

  on filer01 {
    device /dev/drbd0;
    disk /dev/sda4;
    address 192.168.192.1:7788;
    meta-disk internal;
  }

  on filer02 {
    device /dev/drbd0;
    disk /dev/sda4;
    address 192.168.192.2:7788;
    meta-disk internal;
  }
}

resource vg0drbd {
  protocol C;

  startup {
    wfc-timeout 0; ## Infinite!
    degr-wfc-timeout 120; ## 2 minutes.
    outdated-wfc-timeout 90;
  }

  disk {
    no-disk-barrier; ## ONLY WITH BBU!
    no-disk-flushes; ## ONLY WITH BBU!
    no-disk-drain;
    on-io-error detach;
  }

  net {
    # timeout 60;
    # connect-int 10;
    # ping-int 10;
    max-buffers 8000;
    max-epoch-size 8000;
    sndbuf-size 512k;
  }

  syncer {
    after "cluster_metadata";
    al-extents 3389;
  }

  on filer01 {
    device /dev/drbd1;
    disk /dev/sdb;
    address 192.168.192.1:7789;
    meta-disk internal;
  }

  on filer02 {
    device /dev/drbd1;
    disk /dev/sdb;
    address 192.168.192.2:7789;
    meta-disk internal;
  }
}

Kind regards

-- 
Stephan Seitz
Senior System Administrator

netz-haut GmbH
multimediale kommunikation

Zweierweg 22
97074 Würzburg

Phone: 0931 2876247
Fax: 0931 2876248

Web: www.netz-haut.de

Amtsgericht Würzburg - HRB 10764
Managing directors: Michael Daut, Kai Neugebauer