Note: "permalinks" may not be as permanent as we would like;
direct links to old sources may well be a few messages off.
>What I would have liked to have happened
>
>. Hard disk failed
>. drbd noticed and released the drbd device making the other node primary

In this case, that's not what should happen, because a hard drive failure
on a RAID that is not RAID 0 shouldn't impact your ability to use the
RAID. That's the point of RAID, at least with all RAID levels other than 0.
So in that sense, this error appears to be erroneous:

>Apr 6 12:11:11 data2 kernel: drbd5: Local IO failed. Detaching...

However, it seems to have occurred because the 3ware driver somehow
misbehaved. That is, if the 3ware driver merely notes in the system log
that a drive has failed, drbd has no reason to take action, so it really
sounds like the 3ware driver has a bug and *it* caused the I/O to fail,
not any drive in the array (once again, assuming a RAID level other than 0).

>After these errors were reported I was unable to deallocate the drbd5
>device or shut down the drbd processes other than by rebooting. The
>device was still seen as the primary on the other node in the cluster
>and would not fail over to the secondary member.

Was it really the case that it didn't detach? I don't think drbd changes
the primary/secondary status just because the underlying storage fails
(or, in this case, falsely claims to fail). It should, however, use the
secondary for all disk I/O.
-- 
Maurice Volaski, mvolaski at aecom.yu.edu
Computing Support, Rose F. Kennedy Center
Albert Einstein College of Medicine of Yeshiva University
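For context, the "Local IO failed. Detaching..." message corresponds to
drbd's on-io-error handler, which decides how drbd reacts when the backing
device reports an I/O error. A minimal drbd.conf sketch (the resource name
"r5" and device paths are hypothetical; check your drbd version's docs for
the exact set of handler values):

    # drbd.conf fragment -- hypothetical resource "r5"
    resource r5 {
      disk {
        # What drbd does when the lower-level device reports an I/O error:
        #   pass_on - pass the error up to the filesystem on this node
        #   detach  - drop the local backing store and go "diskless",
        #             serving all I/O via the peer's disk (the behavior
        #             reported in the log line above)
        on-io-error detach;
      }
      ...
    }

With "detach", a (real or spurious) local I/O error does not by itself
change Primary/Secondary roles; the node stays Primary but performs disk
I/O through the peer, which matches the behavior being questioned here.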