>So if I use the panic option the kernel would therefore crash forcing a
>failover to occur?

Would you really want that? Drbd is functioning as RAID 1 and when a "disk" fails, the other "disk" should take over and that's what drbd is doing. The users can continue working as if nothing had happened. Then when there is an off hour, you can manually force a failover.

>Is there no other way for heartbeat to monitor the status of the drbd
>devices?

I'm not sure it's heartbeat's responsibility. You could roll your own logging. For my system, I actually check /proc/drbd for "Diskless" periodically.

>Basically I ended up with a cluster hang.
> I'm not 100% certain how that came about.

You tried to kill drbd while the primary was still using it. What I suggest is to wait for an off hour and then manually stop heartbeat on the primary. That should cause the heartbeat on the other system to take over cleanly.

Regardless, I think that your real problem is the misbehaving 3ware card. RAID cards should *never* send SCSI errors up the I/O stack unless there is a multiple, simultaneous disk failure.

-- 
Maurice Volaski, mvolaski at aecom.yu.edu
Computing Support, Rose F. Kennedy Center
Albert Einstein College of Medicine of Yeshiva University
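
For anyone who wants to try the periodic /proc/drbd check mentioned above, here is a minimal sketch of one way it could be done. It is not the script used on the system described in this message. The function names `diskless_lines` and `check_once` are made up for illustration, and the exact layout of /proc/drbd differs between DRBD versions, so the sketch only looks for the literal string "Diskless" and assumes nothing else about the format.

```python
PROC_DRBD = "/proc/drbd"  # status file exposed by the DRBD kernel module


def diskless_lines(status_text, marker="Diskless"):
    """Return the stripped lines of a /proc/drbd dump that contain the marker.

    Hypothetical helper: a plain substring match, with no assumption about
    which field the marker appears in.
    """
    return [line.strip() for line in status_text.splitlines() if marker in line]


def check_once(path=PROC_DRBD):
    """Read the status file once and return any lines that mention Diskless."""
    with open(path) as status_file:
        return diskless_lines(status_file.read())
```

Calling `check_once()` from a small script run out of cron every few minutes, and mailing or logging any lines it returns, would give the "periodically" part without involving heartbeat at all.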