Hi,

According to the documentation, drbd should never go into StandAlone mode except in case of serious problems. According to the drbd ChangeLog, the latest issues concerning going into StandAlone mode were fixed in 0.7.12, and we are using version 0.7.14.

However, when we rebooted the primary node of an HA cluster, drbd went into the StandAlone state, for the reason explained below.

This is what is happening on the primary:

03:58:10 kernel drbd0: Primary/Secondary --> Secondary/Secondary
03:58:10 kernel drbd0: drbdsetup [4377]: cstate Connected --> Unconnected
03:58:10 kernel drbd0: drbd0_receiver [1529]: cstate Unconnected --> BrokenPipe
03:58:10 kernel drbd0: short read expecting header on sock: r=-512
03:58:10 kernel drbd0: asender terminated
03:58:10 kernel drbd0: worker terminated
03:58:10 kernel drbd0: drbd0_receiver [1529]: cstate BrokenPipe --> StandAlone
03:58:10 kernel drbd0: Connection lost.
03:58:10 kernel drbd0: receiver terminated
03:58:10 kernel drbd0: drbdsetup [4377]: cstate StandAlone --> StandAlone
03:58:10 kernel drbd0: drbdsetup [4377]: cstate StandAlone --> Unconfigured
03:58:10 kernel drbd0: worker terminated
03:58:10 kernel drbd: module cleanup done.

And this is what is happening on the secondary:

03:58:27 kernel drbd0: Secondary/Primary --> Secondary/Secondary
03:58:27 kernel drbd0: meta connection shut down by peer.
03:58:27 kernel drbd0: drbd0_asender [1481]: cstate Connected --> NetworkFailure
03:58:27 kernel drbd0: asender terminated
03:58:27 kernel drbd0: drbd0_receiver [1466]: cstate NetworkFailure --> BrokenPipe
03:58:27 kernel drbd0: short read expecting header on sock: r=-512
03:58:27 kernel drbd0: worker terminated
03:58:27 kernel drbd0: drbd0_receiver [1466]: cstate BrokenPipe --> Unconnected
03:58:27 kernel drbd0: Connection lost.
03:58:27 kernel drbd0: drbd0_receiver [1466]: cstate Unconnected --> WFConnection
03:58:28 kernel drbd0: Secondary/Unknown --> Primary/Unknown
03:58:37 kernel EXT3 FS 2.4-0.9.19, 19 August 2002 on drbd(147,0), internal journal
03:58:37 kernel drbd0: Unable to bind source sock (-99)
03:58:37 kernel drbd0: Unable to bind sock2 (-99)
03:58:37 kernel drbd0: drbd0_receiver [1466]: cstate WFConnection --> Unconnected
03:58:37 kernel drbd0: worker terminated
03:58:37 kernel drbd0: drbd0_receiver [1466]: cstate Unconnected --> Unconnected
03:58:37 kernel drbd0: Connection lost.
03:58:37 kernel drbd0: Discarding network configuration.
03:58:37 kernel drbd0: drbd0_receiver [1466]: cstate Unconnected --> StandAlone
03:58:37 kernel drbd0: receiver terminated

After further investigation I have found why drbd goes StandAlone. Somewhere on the server, the IP addresses on the interface are temporarily flushed and re-added by an 'ip route flush dev eth0', followed by several 'ip addr add' commands. If drbd tries to set up its connection at that moment, it treats the failure as a serious network error and goes into StandAlone mode.

So, my question: is it possible to prevent drbd from going into StandAlone mode when it encounters serious network trouble?

Regards,
Wim Ceulemans
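
The connection-state transitions in the logs above can be pulled out mechanically, which makes it easier to see what was logged just before the jump to StandAlone. The following is only a minimal sketch in Python and is not part of the original setup: the function names and the regular expression are illustrative, and it assumes every log line has the "HH:MM:SS kernel drbdN: ..." shape shown above. As a side note, error 99 on Linux is EADDRNOTAVAIL, which would be consistent with the addresses having been removed from the interface at that moment.

```python
import re

# Assumed line shape, taken from the excerpts above, e.g.
# "03:58:37 kernel drbd0: drbd0_receiver [1466]: cstate Unconnected --> StandAlone"
CSTATE_RE = re.compile(
    r"^(?P<time>\d\d:\d\d:\d\d) kernel (?P<dev>drbd\d+): "
    r"(?P<thread>\S+) \[(?P<pid>\d+)\]: cstate (?P<old>\w+) --> (?P<new>\w+)$"
)


def cstate_transitions(lines):
    """Return (time, device, old_state, new_state) for every cstate line."""
    found = []
    for line in lines:
        m = CSTATE_RE.match(line.strip())
        if m:
            found.append((m["time"], m["dev"], m["old"], m["new"]))
    return found


def bind_errors_before_standalone(lines):
    """Return the 'Unable to bind' lines logged before the first real
    transition into StandAlone, or [] if StandAlone is never reached."""
    errors = []
    for line in lines:
        line = line.strip()
        m = CSTATE_RE.match(line)
        # Ignore the no-op "StandAlone --> StandAlone" transition.
        if m and m["new"] == "StandAlone" and m["old"] != "StandAlone":
            return errors
        if "Unable to bind" in line:
            errors.append(line)
    return []


# Excerpt of the secondary's log, copied from the message above.
SAMPLE = """\
03:58:27 kernel drbd0: drbd0_receiver [1466]: cstate Unconnected --> WFConnection
03:58:37 kernel drbd0: Unable to bind source sock (-99)
03:58:37 kernel drbd0: Unable to bind sock2 (-99)
03:58:37 kernel drbd0: drbd0_receiver [1466]: cstate WFConnection --> Unconnected
03:58:37 kernel drbd0: drbd0_receiver [1466]: cstate Unconnected --> Unconnected
03:58:37 kernel drbd0: drbd0_receiver [1466]: cstate Unconnected --> StandAlone
"""

if __name__ == "__main__":
    for time, dev, old, new in cstate_transitions(SAMPLE.splitlines()):
        print(f"{time} {dev}: {old} -> {new}")
    for err in bind_errors_before_standalone(SAMPLE.splitlines()):
        print("before StandAlone:", err)
```

Run against the secondary's excerpt, this lists the two bind failures as the lines that precede the transition into StandAlone.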