[DRBD-user] Zombie, Zombie?

Andreas Semt as at computer-leipzig.de
Mon Apr 26 13:37:04 CEST 2004


Lars Ellenberg wrote:

> please reply to the list.
> 
> / 2004-04-26 12:17:52 +0200
> \ Andreas Semt:
> 
>>>>I use DRBD 0.6.12 (with heartbeat 1.21). Sometimes I get some real high 
>>>>load on my machine (load average around 6) and a zombie process like 
>>>>drbd_syncer_2 or drbd_syncer_3. I believe the zombie process causes the 
>>>>high load. Also I lost a drbd connection without reason (one connection 
>>>>of four).
>>>>Additional information: the zombie processes disappear after some time 
>>>>and the load is normal again. Zombies exist only on the drbd "PRIMARY" 
>>>>machine (all drbd devices in primary mode).
>>>>
>>>>My questions:
>>>>
>>>>1. Could it be a zombie process that killed the drbd connection?
>>>
>>>
>>>No. It is only killed *after* the connection was lost.
>>>Unfortunately it is not always reaped immediately.
>>>Zombie processes cannot cause load, because they are dead.
>>>They only waste some memory and process slots.
>>>
>>
>>Okay, the high load is explained in the FAQ (sorry).
>>Can you say what exactly drbd_syncer and drbd_asender are doing?
> 
> 
> 
> syncer reads in the blocks that need to be synced from local disk,
> and sends them over to the peer.
> 
> asender sends acknowledgement (ACK) packets for written data and for
> drbd-pings, and itself sends drbd-ping-requests if necessary.
> 
> 
>>>>2. Why do these zombies live on my machine?
>>>
>>>If you care, you can use 0.6.12 CVS HEAD, which differs basically only
>>>by a "reparent to init" call right after our thread startup, as it
>>>should have been from the very beginning. Now, the "zombies" are reaped
>>>by init almost immediately after their death.
>>>
>>
>>Is that a common behavior of drbd (to create "zombies") or a bug or 
>>something else?
> 
> 
> Not a bug, as long as they get reaped eventually.
> Unusual behaviour, if it takes "long" to reap them.
> A bug if they stay around forever.
> 
> Unusual in general, because typically kernel threads should be
> reparented to init anyway.
> "Historically" drbd wanted to reap its child threads itself.
> 

Another question: I have a directory on a partition on top of a drbd 
device. The load average was very high for that machine (around 6).
When I tried to do an "ls -l" in that directory, nothing happened; the 
"ls -l" command hung. However, there was no drbd traffic at all on the 
drbd device for that specific partition. Could it be that some drbd 
process was responsible for the "hang"? How can I detect which processes 
access the drbd device at a particular time?
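
As a starting point for narrowing this down, here is a minimal sketch 
(not a drbd tool, just an illustration) that scans /proc on Linux for 
processes in state Z (zombie) or D (uninterruptible sleep, typically 
waiting on I/O). Processes in D state count toward the Linux load 
average, and a hanging "ls" would normally show up there too. The 
function names parse_stat and find_processes are made up for this 
example, and it does not tell you which device a process is waiting on.

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    lpar = stat_text.index("(")
    rpar = stat_text.rindex(")")
    pid = int(stat_text[:lpar].strip())
    comm = stat_text[lpar + 1:rpar]
    state = stat_text[rpar + 1:].split()[0]
    return pid, comm, state


def find_processes(states=("Z", "D"), proc_root="/proc"):
    """List (pid, comm, state) for all processes whose state is in `states`."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # process exited, or stat was unreadable, mid-scan
        if state in states:
            found.append((pid, comm, state))
    return found


if __name__ == "__main__":
    for pid, comm, state in find_processes():
        print(pid, state, comm)
```

Running it a few times while the load is high should show whether the 
same drbd threads (or the hanging "ls") stay in D state.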

-- 
Best regards,
Andreas Semt
