[DRBD-user] Pulled the Power Plug

Sun Oct 29 18:57:11 CET 2006

Hello one and all,
this is my first post to this list, after reading and observing what's
happening here.
I've been testing DRBD for the past month, after reading a lot about it for
about 3 months.

Anyway, one of my brute-force tests was pulling the plug out of the wall.
This is the only failure I have come across since I started testing; it may
be that my config is bad, or that I'm doing something wrong in the recovery.
That's where I ask for the expertise of this list.

I'll describe the problem, then I'll list my configs at the end [drbd.conf,
ha.cf, haresources].
First, what I've tested so far.

The Crime Scene:
===========

Before the crime is committed:
-----------------------------------------
Server A: acts as primary for drbd0, drbd1 and drbd2 [resources lun0, lun1,
lun2] and secondary for drbd3
Server B: acts as primary for drbd3 [resource zim0] and secondary for drbd0,
drbd1 and drbd2

The crime is committed:
--------------------------------
* Shutdown heartbeat gracefully [server B]: DRBD3 resource survived the
trauma and reappeared on Server A
* Heartbeat takeover [server A]: DRBD3 resource survived the trauma and
reappeared on Server A
* Shutdown server [server B]: DRBD3 resource survived the trauma and
reappeared on Server A  [shutdown -h now or reboot]
* Powerdown server [server B]: DRBD3 resource survived the trauma and
reappeared on Server A  [push the power button - Linux will execute
shutdown]
** Pull the plug [server B]: DRBD3 resource was brutally murdered and did not
survive  [brute force - no more electricity]

The evidence:
--------------------
DRBD3 resource does not fail over to Server A [because of the next issue, the
entire heartbeat resource does not start].
DRBD3 resource goes into Secondary mode on Server A.

Server B is restarted, the DRBD service gives its warning at startup, and
after a few seconds it connects with Server A, but heartbeat still does not
take over the DRBD3 resource.

The DRBD3 resource now shows Secondary on both Server A and Server B.

CSI at work:
-----------------
Once Server B was up and running, the following was done on Server B:
"drbdadm -- --do-what-I-say primary zim0" <-- the DRBD3 resource in
question.
"drbdadm -- connect zim0" <-- the DRBD3 resource in question.

The Conclusion:
---------------------
Once that was done, "cat /proc/drbd" [server B] indicated that DRBD3 was
Primary/Secondary, which looked good,
so we tried this:
[root@san01 etc]# mount -t jfs /dev/drbd3 /opt/zimbra

and we got this:
mount: wrong fs type, bad option, bad superblock on /dev/drbd3,
       or too many mounted file systems 
(believe me the fs type was correct)
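
The "cat /proc/drbd" check mentioned above could be scripted roughly as
follows. This is only a sketch: it assumes the DRBD 0.7 layout in which each
device line carries cs: (connection state), st: (local/peer role) and ld:
(whether the local data is consistent). The SAMPLE text is made up for
illustration and is not output from san01 or san02; the function names are
hypothetical.

```python
import re

# Illustrative sample only (assumed DRBD 0.7 /proc/drbd layout),
# NOT real output from the servers discussed in this post.
SAMPLE = """\
version: 0.7.x (sample text, not real output)
 3: cs:Connected st:Secondary/Secondary ld:Consistent
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
"""

# Matches a device line such as " 3: cs:Connected st:Primary/Secondary ld:Consistent"
DEVICE_LINE = re.compile(r"^\s*(\d+):\s+cs:(\S+)\s+st:(\w+)/(\w+)\s+ld:(\S+)")


def parse_proc_drbd(text):
    """Return {minor: {"cs", "local", "peer", "ld"}} for each device line found."""
    status = {}
    for line in text.splitlines():
        match = DEVICE_LINE.match(line)
        if match:
            minor, cs, local, peer, ld = match.groups()
            status[int(minor)] = {"cs": cs, "local": local, "peer": peer, "ld": ld}
    return status


def both_secondary(status, minor):
    """True when neither node holds the Primary role for the given minor."""
    dev = status.get(minor)
    return dev is not None and dev["local"] == "Secondary" and dev["peer"] == "Secondary"


if __name__ == "__main__":
    state = parse_proc_drbd(SAMPLE)
    print(state[3], "both secondary:", both_secondary(state, 3))
```

On a live node the text would come from reading /proc/drbd instead of SAMPLE.
Looking at the ld: field as well as the roles seems worthwhile before forcing
a node to primary, since the roles alone say nothing about the data.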

Now, what's so nice about this test is that we did the same procedure 3
times, and all 3 times we ended up with the same problem.

The easy part was this:
"mkfs.jfs /dev/drbd3"  are you sure? YES
and guess what, we were up and running again - no ifs, no buts - (lucky for
us we are still beta testing :-)

Somehow I have the strong belief that this is not the way DRBD was
designed to function, because reformatting
the device after a power outage is not what any sysadmin wants to do.

So I hereby ask whether someone knows exactly what has to be done to solve
this little mishap.

Field Notes:
---------------
/etc/ha.d/haresources

san01.funnyguys.biz 192.168.10.121 drbddisk::lun0
Filesystem::/dev/drbd0::/san/lun0::jfs drbddisk::lun1
Filesystem::/dev/drbd1::/san/lun1::jfs drbddisk::lun2
Filesystem::/dev/drbd2::/san/lun2::jfs smb
san02.funnyguys.biz 192.168.10.122 drbddisk::zim0
Filesystem::/dev/drbd3::/opt/zimbra::jfs
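
To make the resource groups above easier to read, here is a small sketch
that splits haresources into one ordered resource list per node. It assumes
the on-disk file uses backslash line continuations (the lines above are
wrapped by the mail client, so the real layout may differ); the function name
is hypothetical. Heartbeat starts the resources of a group from left to
right, so drbddisk has to appear before the Filesystem entry that mounts the
same device.

```python
# Same entries as listed above, written with backslash continuations.
# The continuation style is an assumption about the file on disk.
HARESOURCES = r"""
san01.funnyguys.biz 192.168.10.121 drbddisk::lun0 \
    Filesystem::/dev/drbd0::/san/lun0::jfs drbddisk::lun1 \
    Filesystem::/dev/drbd1::/san/lun1::jfs drbddisk::lun2 \
    Filesystem::/dev/drbd2::/san/lun2::jfs smb
san02.funnyguys.biz 192.168.10.122 drbddisk::zim0 \
    Filesystem::/dev/drbd3::/opt/zimbra::jfs
"""


def parse_haresources(text):
    """Return {preferred_node: [[resource, arg, ...], ...]} in start order."""
    logical_lines = []
    pending = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.endswith("\\"):
            pending += line[:-1] + " "  # join continued lines
            continue
        logical_lines.append(pending + line)
        pending = ""

    groups = {}
    for line in logical_lines:
        node, *resources = line.split()
        # "Filesystem::/dev/drbd3::/opt/zimbra::jfs" -> ["Filesystem", "/dev/drbd3", ...]
        groups[node] = [res.split("::") for res in resources]
    return groups


if __name__ == "__main__":
    for node, resources in parse_haresources(HARESOURCES).items():
        print(node)
        for res in resources:
            print("   ", res)
```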

++++++++++++++++++++++++++

/etc/ha.d/ha.cf
debugfile /var/log/hadebug
logfile /var/log/halog
logfacility     local0

udpport     694
keepalive   2
deadtime   30
warntime   10
initdead     120

bcast   eth0
auto_failback   off

watchdog        /dev/watchdog

respawn hacluster /usr/lib/heartbeat/ipfail
ping 192.168.10.254 192.168.10.1 

node    san01.funnyguys.biz
node    san02.funnyguys.biz

serial  /dev/ttyS0
baud   115200
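
For reference, the timing directives above translate into missed heartbeats
as follows. This is a rough sketch that only does the arithmetic on the
values from this ha.cf (keepalive is the interval between heartbeats, the
other values are timeouts in seconds); the helper names are hypothetical.

```python
# Timing directives copied from the ha.cf above.
HA_CF_TIMINGS = """\
keepalive   2
deadtime   30
warntime   10
initdead     120
"""


def parse_timings(text):
    """Collect 'name value' lines with integer values into a dict."""
    timings = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            timings[parts[0]] = int(parts[1])
    return timings


def missed_heartbeats(timings, directive):
    """How many keepalive intervals fit into the given timeout."""
    return timings[directive] // timings["keepalive"]


if __name__ == "__main__":
    t = parse_timings(HA_CF_TIMINGS)
    for name in ("warntime", "deadtime", "initdead"):
        print(name, "=", t[name], "s, about", missed_heartbeats(t, name), "heartbeats")
```

With these values the peer is declared dead after about 15 missed
heartbeats, so a pulled plug on Server B should be noticed by Server A
roughly 30 seconds later.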

++++++++++++++++++++++++++

/etc/drbd.conf

#
#
global {
   minor-count 5;
   dialog-refresh 5; # 5 seconds
}

resource lun0 {
 protocol  C;

 incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";

 startup {
   # wfc-timeout  0;
   degr-wfc-timeout 120;    # 2 minutes.
 }

 disk {
   on-io-error   detach;
 }

 net {
 }

 syncer {
   rate 10M;
   group 1;
   al-extents 257;
 }

 on san01.funnyguys.biz {
   device   /dev/drbd0;
   disk     /dev/sanvg/lun0;
   address  192.168.15.160:7789;
   meta-disk  internal;
 }

 on san02.funnyguys.biz {
   device   /dev/drbd0;
   disk     /dev/sanvg/lun0;
   address  192.168.15.162:7789;
   meta-disk  internal;
 }
}

resource lun1 {
 protocol  C;

 incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";

 startup {
   # wfc-timeout  0;
   degr-wfc-timeout 120;    # 2 minutes.
 }

 disk {
   on-io-error   detach;
 }

 net {
 }

 syncer {
   rate 10M;
   group 1;
   al-extents 257;
 }

 on san01.funnyguys.biz {
   device   /dev/drbd1;
   disk     /dev/sanvg/lun2;
   address  192.168.15.160:7790;
   meta-disk  internal;
 }

 on san02.funnyguys.biz {
   device   /dev/drbd1;
   disk     /dev/sanvg/lun2;
   address  192.168.15.162:7790;
   meta-disk  internal;
 }
}

resource lun2 {
 protocol  C;

 incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";

 startup {
   # wfc-timeout  0;
   degr-wfc-timeout 120;    # 2 minutes.
 }

 disk {
   on-io-error   detach;
 }

 net {
 }

 syncer {
   rate 10M;
   group 1;
   al-extents 257;
 }

 on san01.funnyguys.biz {
   device   /dev/drbd2;
   disk     /dev/sanvg/lun1;
   address  192.168.15.160:7791;
   meta-disk  internal;
 }

 on san02.funnyguys.biz {
   device   /dev/drbd2;
   disk     /dev/sanvg/lun1;
   address  192.168.15.162:7791;
   meta-disk  internal;
 }
}

resource zim0 {
 protocol  C;

 incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";

 startup {
   # wfc-timeout  0;
   degr-wfc-timeout 120;    # 2 minutes.
 }

 disk {
   on-io-error   detach;
 }

 net {
 }

 syncer {
   rate 10M;
   group 1;
   al-extents 257;
 }

 on san01.funnyguys.biz {
   device   /dev/drbd3;
   disk     /dev/sysvg/zim0;
   address  192.168.15.160:7792;
   meta-disk  internal;
 }

 on san02.funnyguys.biz {
   device   /dev/drbd3;
   disk     /dev/sysvg/zim0;
   address  192.168.15.162:7792;
   meta-disk  internal;
 }
}
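
Since the four resource sections are nearly identical, a quick cross-check
of resource names against backing disks can help when reading them. The
sketch below is not a real drbd.conf parser: it only looks for
"resource NAME {" and "disk /path;" lines in a shortened excerpt of the
config above, and the function names are hypothetical. On this config it
reports that lun1 is backed by /dev/sanvg/lun2 and lun2 by /dev/sanvg/lun1,
which may or may not be intentional.

```python
import re

# Shortened excerpt of the drbd.conf above (san01 sections only).
DRBD_CONF_EXCERPT = """
resource lun1 {
 on san01.funnyguys.biz {
   device   /dev/drbd1;
   disk     /dev/sanvg/lun2;
 }
}
resource lun2 {
 on san01.funnyguys.biz {
   device   /dev/drbd2;
   disk     /dev/sanvg/lun1;
 }
}
resource zim0 {
 on san01.funnyguys.biz {
   device   /dev/drbd3;
   disk     /dev/sysvg/zim0;
 }
}
"""


def backing_disks(conf_text):
    """Return {resource_name: set of backing disk paths}."""
    result = {}
    current = None
    for line in conf_text.splitlines():
        match = re.match(r"\s*resource\s+(\S+)\s*\{", line)
        if match:
            current = match.group(1)
            result[current] = set()
            continue
        # "disk /path;" is a backing device; "disk {" (a section) does not match.
        match = re.match(r"\s*disk\s+(/\S+?);", line)
        if match and current is not None:
            result[current].add(match.group(1))
    return result


def name_mismatches(disks_by_resource):
    """Resources whose backing disk basename differs from the resource name."""
    return {
        name: sorted(paths)
        for name, paths in disks_by_resource.items()
        if any(path.rsplit("/", 1)[-1] != name for path in paths)
    }


if __name__ == "__main__":
    print(name_mismatches(backing_disks(DRBD_CONF_EXCERPT)))
```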
