[Drbd-dev] DRBD8: Receive_state() won't dec_local after a disk
failure on peer.
Ernest.Montrose at stratus.com
Fri Jun 29 23:40:45 CEST 2007
We have been seeing a problem on a cluster of two systems, X and Y.
X is Primary and gets a disk fault. X goes Diskless.
Y now is forced to be Primary.
X recovers from the fault.
But now Y gets a disk fault and goes Diskless but stays Primary.
At this point I/O from r0 hangs on Y!
A check on /proc/<pid>/wchan for the worker thread reveals that we are
waiting forever for local_cnt to become 0 in after_state_ch(). So the
worker thread will process the Net_read. What happened is that after
the first failure on X, receive_state() on Y failed to call dec_local().
Because the pdisk state it received was Diskless, the dec_local() call is
skipped. The attached patch illustrates the problem and attempts to fix it.
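
For readers without the attachment, here is a rough, self-contained sketch of
the kind of reference-count imbalance described above. It is not the DRBD
source and not the attached patch. Every name with a toy_ prefix is
hypothetical, and the assumption that the reference is taken on entry and
released only in the branch where the peer has a disk is mine, made purely
for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not DRBD's real types. */
enum toy_disk_state { TOY_DISKLESS, TOY_UP_TO_DATE };

struct toy_dev {
    int local_cnt;             /* outstanding references on the local disk */
    enum toy_disk_state disk;  /* state of our own disk */
};

/* Take a reference on the local disk; fails if we have no disk. */
bool toy_inc_local(struct toy_dev *d)
{
    if (d->disk == TOY_DISKLESS)
        return false;
    d->local_cnt++;
    return true;
}

/* Drop a reference previously taken with toy_inc_local(). */
void toy_dec_local(struct toy_dev *d)
{
    assert(d->local_cnt > 0);
    d->local_cnt--;
}

/* Leaky variant (assumed shape): the reference is released only when
 * the peer reports a usable disk, so pdisk == Diskless leaks it. */
void toy_receive_state_leaky(struct toy_dev *d, enum toy_disk_state pdisk)
{
    if (!toy_inc_local(d))
        return;
    if (pdisk != TOY_DISKLESS) {
        /* ... work that needs the local disk would go here ... */
        toy_dec_local(d);
    }
    /* pdisk == Diskless: returns without dropping the reference */
}

/* Balanced variant: every successful inc is paired with a dec. */
void toy_receive_state_balanced(struct toy_dev *d, enum toy_disk_state pdisk)
{
    if (!toy_inc_local(d))
        return;
    if (pdisk != TOY_DISKLESS) {
        /* ... work that needs the local disk would go here ... */
    }
    toy_dec_local(d);
}

/* The worker can finish going Diskless only once local_cnt is 0. */
bool toy_can_detach(const struct toy_dev *d)
{
    return d->local_cnt == 0;
}
```

In this toy model, calling toy_receive_state_leaky() with a Diskless peer
leaves local_cnt at 1, so toy_can_detach() stays false no matter how long the
caller waits, which mirrors the hang seen on Y. The balanced variant returns
local_cnt to 0 on every path.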
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 710 bytes
Url : http://lists.linbit.com/pipermail/drbd-dev/attachments/20070629/58498fa6/drbd_recv.obj