Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hello one and all, this is my first post to this list. After reading and observing what's happening here, I've been testing DRBD for the past month, after about three months of reading up on it. Anyway, one of my brute-force tests was pulling the plug out of the wall, and this is the only failure I have come across since I started testing. It could be that my config is bad, or that I'm doing something wrong in the recovery; that's where I ask for the expertise of this list. I'll describe the problem, then list my configs at the end [drbd.conf, ha.cf, haresources], along with what I've tested so far.

The Crime Scene:
================

Before the crime is committed:
------------------------------
Server A: acts as primary for drbd0, drbd1 and drbd2, and secondary for drbd3 [resources lun0, lun1, lun2]
Server B: acts as primary for drbd3, and secondary for drbd0, drbd1 and drbd2 [resource zim0]

The crime is committed:
-----------------------
* Shut down heartbeat gracefully [server B]: the drbd3 resource survived the trauma and reappeared on server A.
* Heartbeat takeover [server A]: the drbd3 resource survived the trauma and reappeared on server A.
* Shut down server B [shutdown -h now or reboot]: the drbd3 resource survived the trauma and reappeared on server A.
* Power down server B [push the power button; Linux executes a shutdown]: the drbd3 resource survived the trauma and reappeared on server A.
** Pull the plug [server B]: the drbd3 resource was brutally murdered and never survived [brute force: no more electricity].

The evidence:
-------------
The drbd3 resource does not fail over to server A [because of the issues below, the entire heartbeat resource group doesn't start]. The drbd3 resource goes into Secondary mode on server A. Server B is restarted, the DRBD service gives its warning at startup, and after a few seconds it connects with server A, but heartbeat still doesn't pull over the drbd3 resource.
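As an aside, the Secondary/Secondary state described above is easy to spot mechanically. Here is a minimal sketch, assuming 0.7-style `/proc/drbd` output; the helper name `drbd_roles` and the sample status line are mine, not from the post:

```shell
#!/bin/sh
# Hypothetical helper: pull the node roles (the st: field) out of a
# /proc/drbd status line, so a script can notice when a resource has
# dropped to Secondary on both nodes.
drbd_roles() {
    # $1 is one device status line from /proc/drbd
    echo "$1" | sed -n 's/.*st:\([^ ]*\).*/\1/p'
}

# Sample line, as drbd3 might appear after the power-cut test:
line="3: cs:Connected st:Secondary/Secondary ld:Consistent"
drbd_roles "$line"    # prints Secondary/Secondary
```

On a live node you would feed it `grep "^ *3:" /proc/drbd` instead of a canned string.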
The drbd3 resource now shows Secondary on both server A and server B.

CSI at work:
------------
Once server B was up and running, the following was done on server B:

"drbdadm -- --do-what-I-say primary zim0"  <-- the drbd3 resource in question
"drbdadm -- connect zim0"                  <-- the drbd3 resource in question

The Conclusion:
---------------
Once that was done, "cat /proc/drbd" [server B] indicated that drbd3 was Primary/Secondary. Looked good, so we tried this:

[root@san01 etc]# mount -t jfs /dev/drbd3 /opt/zimbra

and we got this:

mount: wrong fs type, bad option, bad superblock on /dev/drbd3,
       or too many mounted file systems

(believe me, the fs type was correct)

Now, what's so "nice" about this test is that we did the same procedure three times, and all three times we ended up with the same problem. The easy part was this:

"mkfs.jfs /dev/drbd3" -- are you sure? YES

and guess what, we were up and running again, no ifs, no buts (lucky for us, we are still beta testing :-). Somehow I have the strong belief that this is not the way DRBD was designed to function, because formatting the device after a power outage is not what any sysadmin wants to do. So I hereby ask if someone knows exactly what has to be done to solve this little mishap.
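For comparison, a recovery direction many admins would try first in this situation goes the other way around: promote server A, whose replica should be up to date (it was connected as Secondary under protocol C when the plug was pulled), and resync server B from it, rather than forcing primary on the freshly crashed node. This is a sketch under that assumption, not a verified fix for the behaviour above:

```
# On server A (the survivor, assumed to hold the good copy):
drbdadm primary zim0              # promote A instead of forcing B
mount -t jfs /dev/drbd3 /opt/zimbra

# On server B, once it is back up and connected:
drbdadm invalidate zim0           # discard B's possibly inconsistent copy
drbdadm connect zim0              # triggers a full resync from A
```

`drbdadm invalidate` forces a full sync from the peer, so it should only ever be run on the node whose data you are willing to throw away.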
Field Notes:
------------

/etc/ha.d/haresources:

san01.funnyguys.biz 192.168.10.121 drbddisk::lun0 Filesystem::/dev/drbd0::/san/lun0::jfs drbddisk::lun1 Filesystem::/dev/drbd1::/san/lun1::jfs drbddisk::lun2 Filesystem::/dev/drbd2::/san/lun2::jfs smb
san02.funnyguys.biz 192.168.10.122 drbddisk::zim0 Filesystem::/dev/drbd3::/opt/zimbra::jfs

++++++++++++++++++++++++++

/etc/ha.d/ha.cf:

debugfile /var/log/hadebug
logfile /var/log/halog
logfacility local0
udpport 694
keepalive 2
deadtime 30
warntime 10
initdead 120
bcast eth0
auto_failback off
watchdog /dev/watchdog
respawn hacluster /usr/lib/heartbeat/ipfail
ping 192.168.10.254 192.168.10.1
node san01.funnyguys.biz
node san02.funnyguys.biz
serial /dev/ttyS0
baud 115200

++++++++++++++++++++++++++

/etc/drbd.conf:

global {
    minor-count 5;
    dialog-refresh 5;    # 5 seconds
}

resource lun0 {
    protocol C;
    incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";
    startup {
        # wfc-timeout 0;
        degr-wfc-timeout 120;    # 2 minutes.
    }
    disk {
        on-io-error detach;
    }
    net {
    }
    syncer {
        rate 10M;
        group 1;
        al-extents 257;
    }
    on san01.funnyguys.biz {
        device    /dev/drbd0;
        disk      /dev/sanvg/lun0;
        address   192.168.15.160:7789;
        meta-disk internal;
    }
    on san02.funnyguys.biz {
        device    /dev/drbd0;
        disk      /dev/sanvg/lun0;
        address   192.168.15.162:7789;
        meta-disk internal;
    }
}

resource lun1 {
    protocol C;
    incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";
    startup {
        # wfc-timeout 0;
        degr-wfc-timeout 120;    # 2 minutes.
    }
    disk {
        on-io-error detach;
    }
    net {
    }
    syncer {
        rate 10M;
        group 1;
        al-extents 257;
    }
    on san01.funnyguys.biz {
        device    /dev/drbd1;
        disk      /dev/sanvg/lun2;
        address   192.168.15.160:7790;
        meta-disk internal;
    }
    on san02.funnyguys.biz {
        device    /dev/drbd1;
        disk      /dev/sanvg/lun2;
        address   192.168.15.162:7790;
        meta-disk internal;
    }
}

resource lun2 {
    protocol C;
    incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";
    startup {
        # wfc-timeout 0;
        degr-wfc-timeout 120;    # 2 minutes.
    }
    disk {
        on-io-error detach;
    }
    net {
    }
    syncer {
        rate 10M;
        group 1;
        al-extents 257;
    }
    on san01.funnyguys.biz {
        device    /dev/drbd2;
        disk      /dev/sanvg/lun1;
        address   192.168.15.160:7791;
        meta-disk internal;
    }
    on san02.funnyguys.biz {
        device    /dev/drbd2;
        disk      /dev/sanvg/lun1;
        address   192.168.15.162:7791;
        meta-disk internal;
    }
}

resource zim0 {
    protocol C;
    incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";
    startup {
        # wfc-timeout 0;
        degr-wfc-timeout 120;    # 2 minutes.
    }
    disk {
        on-io-error detach;
    }
    net {
    }
    syncer {
        rate 10M;
        group 1;
        al-extents 257;
    }
    on san01.funnyguys.biz {
        device    /dev/drbd3;
        disk      /dev/sysvg/zim0;
        address   192.168.15.160:7792;
        meta-disk internal;
    }
    on san02.funnyguys.biz {
        device    /dev/drbd3;
        disk      /dev/sysvg/zim0;
        address   192.168.15.162:7792;
        meta-disk internal;
    }
}

--
View this message in context: http://www.nabble.com/Pulled-the-Power-Plug-tf2535391.html#a7063506
Sent from the DRBD - User mailing list archive at Nabble.com.