[DRBD-user] DRBD stuck after a strong network failure

Cyril Bouthors cyril at bouthors.org
Tue Apr 18 22:55:41 CEST 2006


On 18 Apr 2006, Lars Ellenberg wrote:

>> Apr 17 22:26:44 nfsa4 kernel: drbd0: Connection lost.
>> Apr 17 22:26:44 nfsa4 kernel: drbd0: drbd0_receiver [831]: cstate Unconnected --> WFConnection
>> [NETWORK WENT BACK HERE]
>> Apr 17 22:27:20 nfsa4 kernel: drbd0: drbd0_receiver [831]: cstate WFConnection --> WFReportParams
>
> no further log messages?

No, nothing concerning DRBD between those lines. Here's the full log:

Apr 17 22:27:20 nfsa4 kernel: drbd0: drbd0_receiver [831]: cstate WFConnection --> WFReportParams
Apr 17 22:45:09 nfsa4 kernel: tts/0: 1 input overrun(s)
Apr 17 22:56:13 nfsa4 kernel: nfsd: last server has exited
Apr 17 22:56:13 nfsa4 kernel: nfsd: unexporting all filesystems
Apr 17 22:56:15 nfsa4 kernel: drbd0: Primary/Unknown --> Secondary/Unknown
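
A minimal sketch, assuming syslog lines in exactly the format shown above, of how
one might spot a device whose last logged cstate is WFReportParams and has not
changed for a while. The function names, the 60-second threshold and the
hard-coded year are illustrative assumptions (syslog timestamps carry no year);
none of this is part of DRBD itself.

```python
import re
from datetime import datetime

# Matches kernel lines such as:
# "Apr 17 22:27:20 nfsa4 kernel: drbd0: drbd0_receiver [831]: cstate WFConnection --> WFReportParams"
CSTATE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"(?P<dev>drbd\d+):\s+\S+\s+\[\d+\]:\s+cstate\s+(?P<old>\w+)\s+-->\s+(?P<new>\w+)"
)


def parse_ts(text, year=2006):
    """Parse a syslog timestamp; the year is supplied by the caller (assumption)."""
    return datetime.strptime(f"{year} {text}", "%Y %b %d %H:%M:%S")


def last_cstate(lines, year=2006):
    """Return {device: (state, since)} from the last cstate transition per device."""
    latest = {}
    for line in lines:
        m = CSTATE_RE.match(line)
        if m:
            latest[m["dev"]] = (m["new"], parse_ts(m["ts"], year))
    return latest


def stuck_devices(lines, now, state="WFReportParams", limit_s=60, year=2006):
    """Return {device: seconds} for devices sitting in `state` longer than limit_s."""
    result = {}
    for dev, (current, since) in last_cstate(lines, year).items():
        elapsed = (now - since).total_seconds()
        if current == state and elapsed > limit_s:
            result[dev] = elapsed
    return result
```

Fed the log lines above and a `now` value ten minutes after the last transition,
`stuck_devices` would report drbd0. It only sees what was logged, so it cannot
tell why the state did not time out.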

> so it seems to be stuck in WFReportParams ...
> doh.
> that _should_ timeout pretty quick... couple of seconds, iirc.

It was stuck for ~10 minutes; then I got SSH access, quickly tried a few
things, and decided to reboot the servers.

It should not be stuck in WFReportParams.

> any hanging drbdsetup processes?

I had no time to take notes, but our heartbeat resource runs drbdadm,
not drbdsetup.

> output of /proc/drbd at that time, in case you have a session/console log?

Nothing left, sorry.
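
A hedged sketch of how the contents of /proc/drbd could be saved to a
timestamped file before rebooting, so that something is left for later
diagnosis. The function name, the file naming scheme and the destination
directory are hypothetical; only the /proc/drbd path comes from this thread.

```python
import time
from pathlib import Path


def snapshot_proc_drbd(dest_dir, src="/proc/drbd"):
    """Copy the current contents of `src` into a timestamped file under dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    out = dest / f"proc-drbd-{stamp}.txt"
    # Read once and write once; /proc files are small text files.
    out.write_text(Path(src).read_text())
    return out
```

Calling `snapshot_proc_drbd("/root/drbd-snapshots")` (a made-up directory)
before any recovery attempt would keep one copy per call.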

> did you try to "drbdadm disconnect"?

No.

> ok, here at least the first WFReportParams indeed timed out.
> does not help on the primary, unfortunately...

:(
-- 
Cyril Bouthors

