[DRBD-user] DRBD Freezing

Brian E. Dunbar bdunbar at dunbarconsulting.org
Mon Feb 5 19:05:46 CET 2007



I am having a problem with DRBD occasionally freezing on me. Before I give
too much information, let me raise one topic that may be the root cause.

The most relevant piece of setup information may be that the disk space that
drbd is using for storage is located on a loopback file system created with
losetup. I mention this because I have since found some posts on this list
and other drbd web pages suggesting that this setup may in fact be the
problem (but there did not seem to be a conclusive answer), even though some
of the patches in the 0.7.X tree seem to address this very situation. Can
anyone tell me whether using a loopback file system is indeed a problem?

If you don't think that is the problem, here is some more information. We had
this freezing problem very frequently when we first installed drbd with a
0.6.X version (the one that came with Ubuntu). I found references to fixes in
the 0.7.X tree for deadlocking and upgraded, and that seemed to solve the
problem: it ran for about 3 months without my seeing it again, until this
weekend, when DRBD froze again.

Before DRBD froze, the following showed up in the messages file:

 1 Time(s): [49453867.890000] drbd0: PingAck did not arrive in time.
 1 Time(s): [49453867.890000] drbd0: asender terminated
 1 Time(s): [49453867.890000] drbd0: drbd0_asender [7394]: cstate Connected --> NetworkFailure
 1 Time(s): [49453867.890000] drbd0: drbd0_receiver [5859]: cstate BrokenPipe --> Unconnected
 1 Time(s): [49453867.890000] drbd0: drbd0_receiver [5859]: cstate NetworkFailure --> BrokenPipe
 1 Time(s): [49453867.890000] drbd0: short read expecting header on sock: r=-512
 1 Time(s): [49453867.890000] drbd0: worker terminated
 1 Time(s): [49453867.900000] drbd0: Connection lost.
 1 Time(s): [49453867.900000] drbd0: drbd0_receiver [5859]: cstate Unconnected --> WFConnection
 1 Time(s): [49453903.490000] drbd0: Connection established.
 1 Time(s): [49453903.490000] drbd0: Handshake successful: DRBD Network Protocol version 74
 1 Time(s): [49453903.490000] drbd0: drbd0_receiver [5859]: cstate WFConnection --> WFReportParams
 1 Time(s): [49453903.500000] drbd0: I am(S): 1:00000002:00000003:00000793:00000003:01
 1 Time(s): [49453903.500000] drbd0: Peer(P): 1:00000002:00000003:00000794:00000003:10
 1 Time(s): [49453903.500000] drbd0: Secondary/Unknown --> Secondary/Primary
 1 Time(s): [49453903.500000] drbd0: drbd0_receiver [5859]: cstate WFReportParams --> WFBitMapT
 1 Time(s): [49453903.580000] drbd0: Resync started as SyncTarget (need to sync 1924 KB [481 bits set]).
 1 Time(s): [49453903.580000] drbd0: drbd0_receiver [5859]: cstate WFBitMapT --> SyncTarget
 1 Time(s): [49453903.630000] drbd0: Resync done (total 1 sec; paused 0 sec; 1924 K/sec)
 1 Time(s): [49453903.630000] drbd0: drbd0_worker [7965]: cstate SyncTarget --> Connected
 1 Time(s): [49458988.150000] drbd0: Connection lost.
 1 Time(s): [49458988.150000] drbd0: PingAck did not arrive in time.
 1 Time(s): [49458988.150000] drbd0: asender terminated
 1 Time(s): [49458988.150000] drbd0: drbd0_asender [7966]: cstate Connected --> NetworkFailure
 1 Time(s): [49458988.150000] drbd0: drbd0_receiver [5859]: cstate BrokenPipe --> Unconnected
 1 Time(s): [49458988.150000] drbd0: drbd0_receiver [5859]: cstate NetworkFailure --> BrokenPipe
 1 Time(s): [49458988.150000] drbd0: drbd0_receiver [5859]: cstate Unconnected --> WFConnection
 1 Time(s): [49458988.150000] drbd0: short read expecting header on sock: r=-512
 1 Time(s): [49458988.150000] drbd0: worker terminated
 1 Time(s): [49458999.050000] drbd0: Connection established.
 1 Time(s): [49458999.050000] drbd0: Handshake successful: DRBD Network Protocol version 74
 1 Time(s): [49458999.050000] drbd0: I am(S): 1:00000002:00000003:00000794:00000003:01
 1 Time(s): [49458999.050000] drbd0: Peer(P): 1:00000002:00000003:00000795:00000003:10
 1 Time(s): [49458999.050000] drbd0: Secondary/Unknown --> Secondary/Primary
 1 Time(s): [49458999.050000] drbd0: drbd0_receiver [5859]: cstate WFConnection --> WFReportParams
 1 Time(s): [49458999.050000] drbd0: drbd0_receiver [5859]: cstate WFReportParams --> WFBitMapT
 1 Time(s): [49458999.110000] drbd0: Resync started as SyncTarget (need to sync 1080 KB [270 bits set]).
 1 Time(s): [49458999.110000] drbd0: drbd0_receiver [5859]: cstate WFBitMapT --> SyncTarget
 1 Time(s): [49458999.190000] drbd0: Resync done (total 1 sec; paused 0 sec; 1080 K/sec)
 1 Time(s): [49458999.190000] drbd0: drbd0_worker [8670]: cstate SyncTarget --> Connected
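
To put numbers on the two disconnects in the log above, here is a rough
Python sketch that pairs each "Connection lost" line with the next
"Connection established" line and reports the gap between them. It assumes
the bracketed values are timestamps in seconds. The function name outages
and the regular expression are only an illustration and are not part of
DRBD or of any log tool.

```python
import re

# Matches "[<seconds>] drbdN: <message>"; any prefix such as "1 Time(s):"
# in front of the bracket is ignored because search() is used, not match().
LINE_RE = re.compile(r"\[(\d+\.\d+)\]\s+(drbd\d+):\s+(.*)")


def outages(lines):
    """Return (lost_at, reconnected_at, gap_seconds) for each disconnect."""
    result = []
    lost_at = None
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # skip wrapped continuation lines and unrelated text
        stamp, message = float(m.group(1)), m.group(3)
        if message.startswith("Connection lost"):
            lost_at = stamp
        elif message.startswith("Connection established") and lost_at is not None:
            result.append((lost_at, stamp, round(stamp - lost_at, 2)))
            lost_at = None
    return result


# The four relevant lines, copied from the log excerpt in this message.
sample = [
    " 1 Time(s): [49453867.900000] drbd0: Connection lost.",
    " 1 Time(s): [49453903.490000] drbd0: Connection established.",
    " 1 Time(s): [49458988.150000] drbd0: Connection lost.",
    " 1 Time(s): [49458999.050000] drbd0: Connection established.",
]

for lost, back, gap in outages(sample):
    print(f"lost at {lost:.2f}, reconnected at {back:.2f}, down {gap:.2f} s")
```

On those lines the two gaps come out at roughly 35.6 and 10.9 seconds, with
a little under an hour and a half between the first and second disconnect.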

Any ideas on where I should start in figuring this out?

Thanks,

Brian




