[DRBD-user] Zombie, Zombie?

Mon Apr 26 14:49:21 CEST 2004

/ 2004-04-26 13:37:04 +0200
\ Andreas Semt:
> Another question: I have a directory on a partition on top of a drbd 
> device. The load average was very high for that machine (around 6).
> When I tried to do an "ls -l" in that directory, nothing happened; the
> "ls -l" command hung. However, there was no drbd traffic at all on the
> drbd device for the specific partition. Could it be that some drbd
> process was responsible for the "hang"? How can I detect which process
> accesses the drbd device at a particular time?

in normal operation, with drbd 0.6, you should have
drbd_receiver and drbd_asender on both nodes.
while a sync is in progress, you additionally have drbd_syncer.

in 0.7, regardless of sync, you have on both nodes:
drbd_receiver, drbd_asender, drbd_worker.
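
the thread lists above can be checked mechanically. what follows is only
a minimal sketch, not a drbd tool: the function name, the version keys
and the loose name matching (any process name containing "drbd" plus the
role word, in case the real names carry a device number) are all
assumptions made for illustration.

```python
# Expected DRBD kernel threads per version, as listed in this mail.
EXPECTED_THREADS = {
    "0.6": ["drbd_receiver", "drbd_asender"],
    "0.7": ["drbd_receiver", "drbd_asender", "drbd_worker"],
}


def missing_drbd_threads(process_names, version, syncing=False):
    """Return the expected DRBD threads not found in process_names.

    process_names: iterable of process/thread names (hypothetical input,
                   e.g. collected from a process listing).
    version:       "0.6" or "0.7".
    syncing:       only relevant for 0.6, where drbd_syncer exists
                   while a sync is in progress.
    """
    names = list(process_names)
    expected = list(EXPECTED_THREADS[version])
    if version == "0.6" and syncing:
        expected.append("drbd_syncer")

    def present(thread):
        # Assumption: match loosely on "drbd" plus the role word, so that
        # a name such as "drbd0_receiver" still counts as the receiver.
        role = thread.split("_", 1)[1]
        return any("drbd" in name and role in name for name in names)

    return [thread for thread in expected if not present(thread)]
```

the names could come from a process listing such as `ps -e -o comm=`,
one name per line. an empty result only means the threads exist; it does
not tell you whether one of them is stuck.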

if one of them seems "dead", or is missing completely, that's not good.
if they are all there, it might be a network failure that drbd has not
yet noticed, or drbd still has some hope for the peer to come back.
it should eventually drop the connection: typically after one ping
timeout, or, if pings still get answered in time but not a single data
packet comes through, after ko_count * ping timeout.
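
as a worked example of that rule, here is a small sketch. it only
restates the arithmetic from the paragraph above; the function name and
the numbers in the usage note are hypothetical and are not drbd defaults.

```python
def seconds_until_disconnect(ping_timeout, ko_count, ping_answered):
    """Rough time until DRBD should drop the connection, per the rule above.

    ping_timeout:  ping timeout in seconds (hypothetical value).
    ko_count:      configured ko_count (hypothetical value).
    ping_answered: True if pings are still answered in time but no data
                   packet gets through; False if pings go unanswered too.
    """
    if not ping_answered:
        # Peer does not answer pings: dropped after one ping timeout.
        return ping_timeout
    # Pings are answered, data is not: dropped after ko_count * ping timeout.
    return ko_count * ping_timeout
```

with an assumed ping timeout of 10 seconds and a ko_count of 7, this
gives 10 seconds if pings go unanswered, and 70 seconds if pings are
still answered but no data gets through.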

while the peer is already unresponsive, but DRBD still tries to contact
it and has not yet concluded that it won't answer anymore, IO on the
drbd device appears to "hang" until either DRBD drops the connection or
the peer recovers in time.

	Lars Ellenberg