I am having a problem with DRBD occasionally freezing on me. Before I give too much information, let me raise one topic that may be the root cause.

The most relevant piece of setup information may be that the disk space DRBD is using for storage is located on a loopback file system created with losetup. I mention this because I have since found some posts on this list and on other DRBD web pages suggesting that this setup may in fact be the problem (though there did not seem to be a conclusive answer), even though some of the patches in the 0.7.X tree seem to address this very situation. Can anyone tell me whether using a loopback file system is indeed a problem?

If you don't think that is the problem, here is some more information. We had this freezing problem very frequently when we first installed DRBD with a 0.6.X version (the one that came with Ubuntu). I found references to fixes in the 0.7.X tree for deadlocking and upgraded, and that seemed to solve the problem (it ran for about 3 months without my seeing it again). Until this weekend, when DRBD froze again.

Before DRBD froze, the following showed up in the messages file:

1 Time(s): [49453867.890000] drbd0: PingAck did not arrive in time.
1 Time(s): [49453867.890000] drbd0: asender terminated
1 Time(s): [49453867.890000] drbd0: drbd0_asender [7394]: cstate Connected --> NetworkFailure
1 Time(s): [49453867.890000] drbd0: drbd0_receiver [5859]: cstate BrokenPipe --> Unconnected
1 Time(s): [49453867.890000] drbd0: drbd0_receiver [5859]: cstate NetworkFailure --> BrokenPipe
1 Time(s): [49453867.890000] drbd0: short read expecting header on sock: r=-512
1 Time(s): [49453867.890000] drbd0: worker terminated
1 Time(s): [49453867.900000] drbd0: Connection lost.
1 Time(s): [49453867.900000] drbd0: drbd0_receiver [5859]: cstate Unconnected --> WFConnection
1 Time(s): [49453903.490000] drbd0: Connection established.
1 Time(s): [49453903.490000] drbd0: Handshake successful: DRBD Network Protocol version 74
1 Time(s): [49453903.490000] drbd0: drbd0_receiver [5859]: cstate WFConnection --> WFReportParams
1 Time(s): [49453903.500000] drbd0: I am(S): 1:00000002:00000003:00000793:00000003:01
1 Time(s): [49453903.500000] drbd0: Peer(P): 1:00000002:00000003:00000794:00000003:10
1 Time(s): [49453903.500000] drbd0: Secondary/Unknown --> Secondary/Primary
1 Time(s): [49453903.500000] drbd0: drbd0_receiver [5859]: cstate WFReportParams --> WFBitMapT
1 Time(s): [49453903.580000] drbd0: Resync started as SyncTarget (need to sync 1924 KB [481 bits set]).
1 Time(s): [49453903.580000] drbd0: drbd0_receiver [5859]: cstate WFBitMapT --> SyncTarget
1 Time(s): [49453903.630000] drbd0: Resync done (total 1 sec; paused 0 sec; 1924 K/sec)
1 Time(s): [49453903.630000] drbd0: drbd0_worker [7965]: cstate SyncTarget --> Connected
1 Time(s): [49458988.150000] drbd0: Connection lost.
1 Time(s): [49458988.150000] drbd0: PingAck did not arrive in time.
1 Time(s): [49458988.150000] drbd0: asender terminated
1 Time(s): [49458988.150000] drbd0: drbd0_asender [7966]: cstate Connected --> NetworkFailure
1 Time(s): [49458988.150000] drbd0: drbd0_receiver [5859]: cstate BrokenPipe --> Unconnected
1 Time(s): [49458988.150000] drbd0: drbd0_receiver [5859]: cstate NetworkFailure --> BrokenPipe
1 Time(s): [49458988.150000] drbd0: drbd0_receiver [5859]: cstate Unconnected --> WFConnection
1 Time(s): [49458988.150000] drbd0: short read expecting header on sock: r=-512
1 Time(s): [49458988.150000] drbd0: worker terminated
1 Time(s): [49458999.050000] drbd0: Connection established.
1 Time(s): [49458999.050000] drbd0: Handshake successful: DRBD Network Protocol version 74
1 Time(s): [49458999.050000] drbd0: I am(S): 1:00000002:00000003:00000794:00000003:01
1 Time(s): [49458999.050000] drbd0: Peer(P): 1:00000002:00000003:00000795:00000003:10
1 Time(s): [49458999.050000] drbd0: Secondary/Unknown --> Secondary/Primary
1 Time(s): [49458999.050000] drbd0: drbd0_receiver [5859]: cstate WFConnection --> WFReportParams
1 Time(s): [49458999.050000] drbd0: drbd0_receiver [5859]: cstate WFReportParams --> WFBitMapT
1 Time(s): [49458999.110000] drbd0: Resync started as SyncTarget (need to sync 1080 KB [270 bits set]).
1 Time(s): [49458999.110000] drbd0: drbd0_receiver [5859]: cstate WFBitMapT --> SyncTarget
1 Time(s): [49458999.190000] drbd0: Resync done (total 1 sec; paused 0 sec; 1080 K/sec)
1 Time(s): [49458999.190000] drbd0: drbd0_worker [8670]: cstate SyncTarget --> Connected

Any ideas where I should start figuring this out?

Thanks,
Brian
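
A note for anyone reading the log excerpt above: entries that share a timestamp appear to be sorted alphabetically rather than in the order they happened (for example, "BrokenPipe --> Unconnected" is listed before "NetworkFailure --> BrokenPipe"). The following is a minimal sketch, not part of DRBD, that pulls the cstate transitions out of such a summary and chains them back into sequence. It assumes one entry per line in the "N Time(s): [timestamp] drbd0: ..." layout shown above; the helper names `cstate_transitions` and `chain` are made up for illustration.

```python
import re

# Matches entries such as:
#   1 Time(s): [49453867.890000] drbd0: drbd0_asender [7394]: cstate Connected --> NetworkFailure
# Assumption: one entry per line, in the layout of the excerpt above.
CSTATE_RE = re.compile(
    r"\[(?P<ts>\d+\.\d+)\]\s+(?P<dev>drbd\d+):.*?"
    r"cstate\s+(?P<old>\w+)\s+-->\s+(?P<new>\w+)"
)


def cstate_transitions(text):
    """Return (timestamp, device, old_state, new_state) for each cstate entry."""
    return [
        (float(m["ts"]), m["dev"], m["old"], m["new"])
        for m in CSTATE_RE.finditer(text)
    ]


def chain(transitions, start="Connected"):
    """Rebuild the state sequence by following old -> new from `start`.

    Stops when no further transition is known or a state would repeat.
    """
    by_old = {old: new for _, _, old, new in transitions}
    path = [start]
    while path[-1] in by_old and by_old[path[-1]] not in path:
        path.append(by_old[path[-1]])
    return path


# Four lines copied from the first disconnect in the excerpt.
sample = """\
1 Time(s): [49453867.890000] drbd0: drbd0_asender [7394]: cstate Connected --> NetworkFailure
1 Time(s): [49453867.890000] drbd0: drbd0_receiver [5859]: cstate BrokenPipe --> Unconnected
1 Time(s): [49453867.890000] drbd0: drbd0_receiver [5859]: cstate NetworkFailure --> BrokenPipe
1 Time(s): [49453867.900000] drbd0: drbd0_receiver [5859]: cstate Unconnected --> WFConnection
"""

print(" --> ".join(chain(cstate_transitions(sample))))
```

Run against one disconnect at a time, this gives the state path the connection went through; it does not say anything about why the PingAck was late.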