[DRBD-user] Connection problem after hardware change

Mon Jul 30 23:58:22 CEST 2007

On Friday 27 July 2007, Lars Ellenberg wrote:
> On Thu, Jul 26, 2007 at 08:52:18PM +0200, Yves Glodt wrote:
> > Hello,
> >
> > I have a system set up with 2 drbd nodes. Today I changed the
> > hardware of the (at that moment) secondary node.
> >
> > After rebooting it, at some point the primary went into
> > "StandAlone", and the new secondary stays at "WFConnection"
> >
> > When I launch "drbdadm connect all" on the primary, it goes
> > to "WFConnection" for one second, and displays this in the log:
>
> is it possible that you direct kernel logs of different severity
> into different files?

I guess not. It's a quite fresh Debian etch, and I changed nothing in
its logging behaviour.

> [please try to persuade your mail client to not wrap lines!]
>
> and don't name your nodes DRBD#, that will lead to confusion.
>
> > Jul 26 20:45:35 drbd1 kernel: drbd0: drbdsetup [7972]: cstate StandAlone --> Unconnected
> > Jul 26 20:45:35 drbd1 kernel: drbd0: drbd0_receiver [7973]: cstate Unconnected --> WFConnection
> > Jul 26 20:45:37 drbd1 kernel: drbd0: drbd0_receiver [7973]: cstate WFConnection --> WFReportParams
> > Jul 26 20:45:37 drbd1 kernel: drbd0: Handshake successful: DRBD Network Protocol version 74
> > Jul 26 20:45:37 drbd1 kernel: drbd0: Connection established.
> > Jul 26 20:45:37 drbd1 kernel: drbd0: I am(P): 1:00000010:00000001:00000074:00000010:10
> > Jul 26 20:45:37 drbd1 kernel: drbd0: Peer(S): 1:00000011:00000001:0000005f:0000000f:00
>
> here is a line missing from this log,
> which should read something like "current primary shall become sync
> target, aborting to prevent data corruption".
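
To see why the handshake would end that way, here is a minimal Python
sketch that compares the two counter lines from the log. It rests on
assumptions that are not stated in this thread: that the colon-separated
fields are hexadecimal counters compared left to right, that the larger
value marks the more recent data, and that the last field holds flags
rather than a counter. The names parse_gc and sync_source are made up
for this illustration and are not part of drbd.

```python
import re

# Matches the counter part of lines such as
# "drbd0: I am(P): 1:00000010:00000001:00000074:00000010:10".
GC_RE = re.compile(r"(I am|Peer)\((\w)\):\s*([0-9A-Fa-f:]+)")

def parse_gc(line):
    """Return (who, role, counters) for a generation counter log line."""
    m = GC_RE.search(line)
    if m is None:
        raise ValueError("no generation counters in: " + line)
    fields = m.group(3).split(":")
    # Assumption: the last field holds flags, not a counter, so it is dropped.
    counters = [int(f, 16) for f in fields[:-1]]
    return m.group(1), m.group(2), counters

def sync_source(mine, peer):
    """Compare counters left to right; the larger value wins (assumed rule)."""
    for a, b in zip(mine, peer):
        if a != b:
            return "me" if a > b else "peer"
    return "equal"

me = parse_gc("drbd0: I am(P): 1:00000010:00000001:00000074:00000010:10")
peer = parse_gc("drbd0: Peer(S): 1:00000011:00000001:0000005f:0000000f:00")
winner = sync_source(me[2], peer[2])
print(winner)
```

Under those assumptions the second field decides: the peer has 0x11
against 0x10 on the primary, so the peer would be the sync source and
the current primary the sync target, which matches the refusal
described above.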

for sure I have never seen such a line... :-)

> > Jul 26 20:45:37 drbd1 kernel: drbd0: drbd0_receiver [7973]: cstate WFReportParams --> StandAlone
> > Jul 26 20:45:37 drbd1 kernel: drbd0: asender terminated
> > Jul 26 20:45:37 drbd1 kernel: drbd0: worker terminated
> > Jul 26 20:45:37 drbd1 kernel: drbd0: drbd0_receiver [7973]: cstate StandAlone --> StandAlone
> > Jul 26 20:45:37 drbd1 kernel: drbd0: Connection lost.
> > Jul 26 20:45:37 drbd1 kernel: drbd0: receiver terminated
>
> to get out of this, there is the easy method:
> go to the node which has the bad data,
>  bad node# drbdadm disconnect ; drbdadm invalidate;
> then, on both nodes:
>  both nodes# drbdadm connect
>
> which will give you a connection
> and full sync from "good node" to "bad node".
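
The order of those steps matters, so here is a small Python sketch of
the procedure as a per-node plan. It only prints the commands unless
dry_run is switched off. The resource argument "all" is borrowed from
the "drbdadm connect all" call earlier in this thread and is an
assumption for the other two commands; recovery_plan and run_local are
hypothetical helper names. Keep in mind that invalidate gives up the
local data on the node where it is run, so it must only be run on the
node with the bad data.

```python
import subprocess

def recovery_plan(resource="all"):
    """Return (node, argv) steps for the full-sync recovery described above."""
    return [
        ("bad node", ["drbdadm", "disconnect", resource]),
        ("bad node", ["drbdadm", "invalidate", resource]),
        ("both nodes", ["drbdadm", "connect", resource]),
    ]

def run_local(steps, node, dry_run=True):
    """Run (or, by default, only print) the steps meant for this node."""
    done = []
    for where, argv in steps:
        if where not in (node, "both nodes"):
            continue
        if dry_run:
            print(" ".join(argv))
        else:
            # check=True stops at the first failing command.
            subprocess.run(argv, check=True)
        done.append(argv)
    return done

plan = recovery_plan()
good = run_local(plan, "good node")
bad = run_local(plan, "bad node")
```

In dry-run mode the good node gets a single "drbdadm connect all",
while the bad node gets disconnect, invalidate and connect, in that
order.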

This worked and fixed it for me.
Thanks for that hint. I am still quite new to drbd, but we will use it
in production soon.

regards,
yves

> if you really want to avoid the full sync, and you are very sure that
> an incremental sync is sufficient, search the list archives for
> advice to manipulate drbd meta data generation counters using
> "write_gc.pl", or using the drbdmeta command from the drbd8 tarball.
>
> unless you have terabytes of storage, go for the first method.
>
> how you got into this situation is a different question.
> I guess you made a habit of forcing drbd to become primary,
> "just to get it working".  if you force it, you take the blame.

-- 
Linux 2.6.20-16-generic #2 SMP Thu Jun 7 20:19:32 UTC 2007 i686
 22:38:41 up  3:12,  1 user,  load average: 0.26, 0.21, 0.18