Note: "permalinks" may not be as permanent as we would like;
direct links to old sources may well be a few messages off.
>What I would have liked to have happened
>
>. Hard disk failed
>. drbd noticed and released the drbd device making the other node primary

In this case, that's not what should happen, because a hard drive failure
on a RAID that is not RAID 0 shouldn't impact your ability to use the
RAID. That's the point of RAID, at least with all RAID levels other than 0.
So in that sense, this error appears to be erroneous:

>Apr 6 12:11:11 data2 kernel: drbd5: Local IO failed. Detaching...

However, it seems to have occurred because the 3ware driver somehow
misbehaved. That is, if the 3ware driver merely notes in the system log
that a drive has failed, drbd has no reason to take action, so it really
sounds like the 3ware driver has a bug and *it* caused the I/O to fail,
not any drive in the array (once again, assuming a RAID level other than 0).

>After these errors were reported I was unable to deallocate the drbd5
>device or shut down the drbd processes other than by rebooting. The
>device was still seen as the primary on the other node in the cluster
>and would not fail over to the secondary member.

Was it really the case that it didn't detach? I don't think drbd changes
the primary/secondary status just because the underlying storage fails
(or, in this case, falsely claims to fail). It should, however, use the
secondary for all disk I/O.
-- 
Maurice Volaski, mvolaski at aecom.yu.edu
Computing Support, Rose F. Kennedy Center
Albert Einstein College of Medicine of Yeshiva University
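For context, the "Local IO failed. Detaching..." message corresponds to
drbd's on-io-error handler, which decides how drbd reacts when the backing
device reports an I/O error. A minimal drbd.conf sketch (the resource name
"r5" and device paths are hypothetical; check your drbd version's docs for
the exact set of handler values):

    # drbd.conf fragment -- hypothetical resource "r5"
    resource r5 {
      disk {
        # What drbd does when the lower-level device reports an I/O error:
        #   pass_on - pass the error up to the filesystem on this node
        #   detach  - drop the local backing store and go "diskless",
        #             serving all I/O via the peer's disk (the behavior
        #             reported in the log line above)
        on-io-error detach;
      }
      ...
    }

With "detach", a (real or spurious) local I/O error does not by itself
change Primary/Secondary roles; the node stays Primary but performs disk
I/O through the peer, which matches the behavior being questioned here.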