Hello one and all,
This is my first post to this list, after reading and observing what's
happening here.
I've been testing DRBD for the past month, after having read a lot about it
for about three months.
Anyway, one of my brute-force tests was pulling the plug out of the wall,
and this is the only failure I have come across since I started testing. It
may be that my config is bad, or that I'm doing something wrong in the
recovery - that's where I ask for the expertise of this list.
I'll describe the problem, then list my configs at the end [drbd.conf,
ha.cf, haresources] and what I've tested so far.
The Crime Scene:
===========
Before the crime is committed:
-----------------------------------------
Server A: acts as primary for drbd0, drbd1 and drbd2 [resources lun0, lun1,
lun2], and as secondary for drbd3
Server B: acts as primary for drbd3 [resource zim0], and as secondary for
drbd0, drbd1 and drbd2
The crime is committed:
--------------------------------
* Shut down heartbeat gracefully [server B]: the DRBD3 resource survived
the trauma and re-appeared on Server A
* Heartbeat takeover [server A]: the DRBD3 resource survived the trauma and
re-appeared on Server A
* Shut down the server [server B]: the DRBD3 resource survived the trauma
and re-appeared on Server A [shutdown -h now or reboot]
* Power down the server [server B]: the DRBD3 resource survived the trauma
and re-appeared on Server A [push the power button - linux will execute a
shutdown]
** Pull the plug [server B]: the DRBD3 resource was brutally murdered and
never survived [brute force - no more electricity]
The evidence:
--------------------
The DRBD3 resource does not fail over to Server A [due to the issues below,
the entire heartbeat resource group doesn't start].
The DRBD3 resource goes into Secondary mode on Server A.
Server B is restarted; the DRBD service gives its warning at startup and,
after a few seconds, connects with Server A - yet heartbeat still doesn't
take over the DRBD3 resource.
The DRBD3 resource now shows Secondary on both Server A and Server B.
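For anyone who wants to script-check this state, here is a small sketch of
how one might pull the cs:/st: fields out of a /proc/drbd device line. The
sample line below is made up to match the state described above, it is not
real output from my boxes:

```shell
# Hypothetical sample of the /proc/drbd line for device 3 after the crash
# (made-up data matching the state described above).
line=" 3: cs:Connected st:Secondary/Secondary ld:Consistent"

# Pull out the connection state (cs:) and the node roles (st:):
cs=$(echo "$line" | awk '{for(i=1;i<=NF;i++) if($i~/^cs:/) print substr($i,4)}')
st=$(echo "$line" | awk '{for(i=1;i<=NF;i++) if($i~/^st:/) print substr($i,4)}')
echo "connection=$cs roles=$st"
```

On a live box you would feed it the real line from /proc/drbd instead of
the sample variable.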
CSI at work:
-----------------
Once Server B was up and running, the following was done on Server B:
"drbdadm -- --do-what-I-say primary zim0" <-- zim0 being the DRBD3 resource
in question.
"drbdadm -- connect zim0"
The Conclusion:
---------------------
Once that was done, "cat /proc/drbd" [server B] indicated that DRBD3 was
Primary/Secondary - looks good.
So we tried this:
[root@san01 etc]# mount -t jfs /dev/drbd3 /opt/zimbra
and we got this:
mount: wrong fs type, bad option, bad superblock on /dev/drbd3,
       or too many mounted file systems
(believe me, the fs type was correct)
Now, what's so nice about this test is that we did the same procedure 3
times, and all 3 times we ended up with the same problem.
The easy part was this:
"mkfs.jfs /dev/drbd3" are you sure? YES
and guess what, we were up and running again - no ifs, no buts (lucky for
us we are still beta testing :-)
Somehow I have the strong belief that this is not the way DRBD was
programmed to function, because formatting the device after every power
outage is not what any sysadmin wants to do.
So I hereby ask if someone knows exactly what has to be done to solve this
little mishap.
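For the record, here is the order I am thinking of trying next time instead
of reaching for mkfs - purely a sketch, not tested advice: reconnect first,
then force primary, then fsck the journal before mounting. It is written as
a dry run (every command is echoed, not executed), so drop the echos to run
it for real:

```shell
# Dry-run sketch of a less destructive recovery for the zim0 resource.
# Every command is echoed rather than executed; remove "echo" to run it.
recover_zim0() {
    echo drbdadm -- connect zim0                   # re-establish the link first
    echo drbdadm -- --do-what-I-say primary zim0   # then force primary
    echo fsck.jfs /dev/drbd3                       # repair the fs before mounting
    echo mount -t jfs /dev/drbd3 /opt/zimbra
}
recover_zim0
```

Whether fsck.jfs actually rescues the filesystem after a hard power cut is
exactly what I am hoping someone here can confirm.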
Field Notes:
---------------
/etc/ha.d/haresources
san01.funnyguys.biz 192.168.10.121 drbddisk::lun0
Filesystem::/dev/drbd0::/san/lun0::jfs drbddisk::lun1
Filesystem::/dev/drbd1::/san/lun1::jfs drbddisk::lun2
Filesystem::/dev/drbd2::/san/lun2::jfs smb
san02.funnyguys.biz 192.168.10.122 drbddisk::zim0
Filesystem::/dev/drbd3::/opt/zimbra::jfs
++++++++++++++++++++++++++
/etc/ha.d/ha.cf
debugfile /var/log/hadebug
logfile /var/log/halog
logfacility local0
udpport 694
keepalive 2
deadtime 30
warntime 10
initdead 120
bcast eth0
auto_failback off
watchdog /dev/watchdog
respawn hacluster /usr/lib/heartbeat/ipfail
ping 192.168.10.254 192.168.10.1
node san01.funnyguys.biz
node san02.funnyguys.biz
serial /dev/ttyS0
baud 115200
++++++++++++++++++++++++++
/etc/drbd.conf
#
#
global {
minor-count 5;
dialog-refresh 5; # 5 seconds
}
resource lun0 {
protocol C;
incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";
startup {
# wfc-timeout 0;
degr-wfc-timeout 120; # 2 minutes.
}
disk {
on-io-error detach;
}
net {
}
syncer {
rate 10M;
group 1;
al-extents 257;
}
on san01.funnyguys.biz {
device /dev/drbd0;
disk /dev/sanvg/lun0;
address 192.168.15.160:7789;
meta-disk internal;
}
on san02.funnyguys.biz {
device /dev/drbd0;
disk /dev/sanvg/lun0;
address 192.168.15.162:7789;
meta-disk internal;
}
}
resource lun1 {
protocol C;
incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";
startup {
# wfc-timeout 0;
degr-wfc-timeout 120; # 2 minutes.
}
disk {
on-io-error detach;
}
net {
}
syncer {
rate 10M;
group 1;
al-extents 257;
}
on san01.funnyguys.biz {
device /dev/drbd1;
disk /dev/sanvg/lun2;
address 192.168.15.160:7790;
meta-disk internal;
}
on san02.funnyguys.biz {
device /dev/drbd1;
disk /dev/sanvg/lun2;
address 192.168.15.162:7790;
meta-disk internal;
}
}
resource lun2 {
protocol C;
incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";
startup {
# wfc-timeout 0;
degr-wfc-timeout 120; # 2 minutes.
}
disk {
on-io-error detach;
}
net {
}
syncer {
rate 10M;
group 1;
al-extents 257;
}
on san01.funnyguys.biz {
device /dev/drbd2;
disk /dev/sanvg/lun1;
address 192.168.15.160:7791;
meta-disk internal;
}
on san02.funnyguys.biz {
device /dev/drbd2;
disk /dev/sanvg/lun1;
address 192.168.15.162:7791;
meta-disk internal;
}
}
resource zim0 {
protocol C;
incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";
startup {
# wfc-timeout 0;
degr-wfc-timeout 120; # 2 minutes.
}
disk {
on-io-error detach;
}
net {
}
syncer {
rate 10M;
group 1;
al-extents 257;
}
on san01.funnyguys.biz {
device /dev/drbd3;
disk /dev/sysvg/zim0;
address 192.168.15.160:7792;
meta-disk internal;
}
on san02.funnyguys.biz {
device /dev/drbd3;
disk /dev/sysvg/zim0;
address 192.168.15.162:7792;
meta-disk internal;
}
}
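One thing I noticed while pasting the config above: resource lun1 sits on
/dev/sanvg/lun2 and lun2 on /dev/sanvg/lun1 - probably harmless, but
confusing. A quick-and-dirty awk sketch (it only understands the flat
layout used here, not the full drbd.conf syntax) makes such pairings easy
to eyeball; the snippet below embeds one resource as sample input:

```shell
# Print resource -> device -> backing-disk triples from a drbd.conf-style
# snippet (hypothetical helper; handles only the flat layout shown above).
conf='resource lun1 {
on san01.funnyguys.biz {
device /dev/drbd1;
disk /dev/sanvg/lun2;
}
}'
mapping=$(echo "$conf" | awk '
$1=="resource" {r=$2}
$1=="device"   {d=$2; sub(/;$/,"",d)}
$1=="disk"     {k=$2; sub(/;$/,"",k); print r, d, k}')
echo "$mapping"
```

Run against the real file ("awk ... /etc/drbd.conf") it lists every
resource's pairing on one line each.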
--
View this message in context: http://www.nabble.com/Pulled-the-Power-Plug-tf2535391.html#a7063506
Sent from the DRBD - User mailing list archive at Nabble.com.