[DRBD-user] Re: drbd error -5 and lvm thoughts and observations

Wed Apr 11 07:18:40 CEST 2007

>So if I use the panic option the kernel would therefore crash forcing a
>failover to occur?

Would you really want that? DRBD is functioning as RAID 1, and when 
one "disk" fails, the other "disk" should take over, which is what 
DRBD is doing. The users can continue working as if nothing had 
happened. Then, during an off hour, you can manually force a failover.

>Is there no other way for heartbeat to monitor the status of the drbd
>devices?

I'm not sure it's heartbeat's responsibility. You could roll your own 
logging. For my system, I actually check /proc/drbd for "Diskless" 
periodically.
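
A minimal sketch of that kind of periodic check, assuming Python is 
available on the node. The function names are made up for 
illustration, and the only thing it relies on is that the word 
"Diskless" shows up somewhere in /proc/drbd when a device has lost 
its backing disk. The exact layout of /proc/drbd varies between drbd 
versions, so the script does not try to parse individual fields.

```python
def diskless_lines(status_text):
    """Return the (stripped) lines of a /proc/drbd dump that mention Diskless.

    Hypothetical helper: it does a plain substring match and makes no
    assumption about the surrounding field layout.
    """
    return [line.strip()
            for line in status_text.splitlines()
            if "Diskless" in line]


def check_drbd(path="/proc/drbd"):
    """Read the status file and return any lines that report Diskless."""
    with open(path) as status_file:
        return diskless_lines(status_file.read())


def main():
    """Print a warning per affected line; return 1 if any were found, else 0."""
    problems = check_drbd()
    for line in problems:
        print("drbd reports a diskless device: " + line)
    return 1 if problems else 0
```

Run from cron every few minutes (for example by calling 
raise SystemExit(main()) at the end of the script), a non-zero exit 
status or the printed line can be mailed to the administrator or 
written to syslog.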

>Basically I ended up with a cluster hang.
>

I'm not 100% certain how that came about. You tried to kill drbd 
while the primary was still using it. What I suggest is to wait for 
an off hour and then manually stop heartbeat on the primary. That 
should cause the heartbeat on the other system to take over cleanly.

Regardless, I think that your real problem is the misbehaving 3ware 
card. RAID cards should *never* send SCSI errors up the I/O stack 
unless there is a multiple, simultaneous disk failure.
-- 

Maurice Volaski, mvolaski at aecom.yu.edu
Computing Support, Rose F. Kennedy Center
Albert Einstein College of Medicine of Yeshiva University