[DRBD-user] gfs hang over drbd 0.8pre4

Fri Aug 18 10:54:27 CEST 2006

Hi there,

I randomly run into system hangs when copying relatively large amounts
of data into a GFS area on top of DRBD 0.8pre4. The system is a two-node
cluster with two primaries, running kernel 2.6.15.7 with the Ubuntu
cluster patchset (including gfs/dlm etc.) on Debian Sarge. The data
typically consists of many small files, somewhat similar to the kernel
source tree. In my experience the hang usually happens after copying
about 2-2.5 GB of data. Then, though not every time, one of the nodes
hangs and only comes back to life after a hardware reset. On the other
node, the following messages appear in the kernel log:

Aug 18 09:48:26 imind-front2 kernel: drbd0: [drbd0_worker/3920] sock_sendmsg time expired, ko = 4294967295
Aug 18 09:48:29 imind-front2 kernel: drbd0: [drbd0_worker/3920] sock_sendmsg time expired, ko = 4294967294
Aug 18 09:48:29 imind-front2 kernel: drbd0: PingAck did not arrive in time.
Aug 18 09:48:29 imind-front2 kernel: drbd0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Aug 18 09:48:29 imind-front2 kernel: drbd0: Creating new current UUID
Aug 18 09:48:29 imind-front2 kernel: drbd0: asender terminated
Aug 18 09:48:29 imind-front2 kernel: drbd0: conn( NetworkFailure -> BrokenPipe )
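
A side note on the log above: the two "ko" values, 4294967295 and
4294967294, equal 2^32 - 1 and 2^32 - 2, which is how -1 and -2 look
when printed as unsigned 32-bit integers. Assuming "ko" is such a
counter (an assumption, not confirmed against the DRBD source), it may
have been decremented past zero. The small check below only illustrates
that arithmetic; as_signed32 is a hypothetical helper name, not part of
any DRBD tool.

```python
# Reinterpret the "ko" values from the log as signed 32-bit integers.
# Assumption: "ko" is printed as an unsigned 32-bit counter.

def as_signed32(value: int) -> int:
    """Return an unsigned 32-bit value reinterpreted as signed 32-bit."""
    value &= 0xFFFFFFFF              # keep only the low 32 bits
    if value >= 1 << 31:             # top bit set means negative
        return value - (1 << 32)
    return value

ko_values = [4294967295, 4294967294]   # copied from the kernel log
signed = [as_signed32(v) for v in ko_values]
print(signed)                          # prints [-1, -2]
```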

Please help me work around this system hang issue, if possible.

Thanks,
Balint