[DRBD-user] Connection problem after hardware change

Lars Ellenberg lars.ellenberg at linbit.com
Fri Jul 27 10:43:34 CEST 2007


On Thu, Jul 26, 2007 at 08:52:18PM +0200, Yves Glodt wrote:
> Hello,
> 
> I have a system set up with 2 drbd nodes. Today I changed the hardware 
> of the (at that moment) secondary node.
> 
> After rebooting it, at some point the primary went into "StandAlone", 
> and the new secondary stays at "WFConnection"
> 
> When I launch "drbdadm connect all" on the primary, it goes 
> to "WfConnection" for one second, and displays this in the log:

is it possible that you direct kernel logs of different severity
into different files? the excerpt you posted looks incomplete.

[please try to persuade your mail client not to wrap lines!]

and don't name your nodes drbd#; next to a device called drbd0
that will lead to confusion.

> Jul 26 20:45:35 drbd1 kernel: drbd0: drbdsetup [7972]: cstate StandAlone --> Unconnected
> Jul 26 20:45:35 drbd1 kernel: drbd0: drbd0_receiver [7973]: cstate Unconnected --> WFConnection
> Jul 26 20:45:37 drbd1 kernel: drbd0: drbd0_receiver [7973]: cstate WFConnection --> WFReportParams
> Jul 26 20:45:37 drbd1 kernel: drbd0: Handshake successful: DRBD Network Protocol version 74
> Jul 26 20:45:37 drbd1 kernel: drbd0: Connection established.
> Jul 26 20:45:37 drbd1 kernel: drbd0: I am(P): 1:00000010:00000001:00000074:00000010:10
> Jul 26 20:45:37 drbd1 kernel: drbd0: Peer(S): 1:00000011:00000001:0000005f:0000000f:00

there is a line missing from this log here,
which should read something like "current primary shall become sync
target, aborting to prevent data corruption".

> Jul 26 20:45:37 drbd1 kernel: drbd0: drbd0_receiver [7973]: cstate WFReportParams --> StandAlone
> Jul 26 20:45:37 drbd1 kernel: drbd0: asender terminated
> Jul 26 20:45:37 drbd1 kernel: drbd0: worker terminated
> Jul 26 20:45:37 drbd1 kernel: drbd0: drbd0_receiver [7973]: cstate StandAlone --> StandAlone
> Jul 26 20:45:37 drbd1 kernel: drbd0: Connection lost.
> Jul 26 20:45:37 drbd1 kernel: drbd0: receiver terminated
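
the two counter lines in the log above can be compared mechanically. a minimal Python sketch, assuming the colon-separated fields are a consistency flag followed by four hex counters that are compared left to right, with the trailing two digits being status flags. the field names and that ordering are my assumptions, not something the log itself states:

```python
import re

# Labels for the first five fields. These names are assumptions made for
# illustration; the log only shows the raw values.
FIELDS = ("consistent", "human", "timeout", "connected", "arbitrary")

# Matches e.g. "I am(P): 1:00000010:00000001:00000074:00000010:10"
GC_LINE = re.compile(
    r"(I am|Peer)\((P|S)\): "
    r"(\d):([0-9a-f]{8}):([0-9a-f]{8}):([0-9a-f]{8}):([0-9a-f]{8}):(\d\d)"
)


def parse_gc(line):
    """Return (who, role, counters) for one counter line of the log."""
    m = GC_LINE.search(line)
    if m is None:
        raise ValueError("no generation counter tuple in line")
    counters = (int(m.group(3)),) + tuple(
        int(m.group(i), 16) for i in range(4, 8)
    )
    return m.group(1), m.group(2), counters


def first_difference(mine, peer):
    """Return (field, my_value, peer_value) for the first differing field."""
    for name, a, b in zip(FIELDS, mine, peer):
        if a != b:
            return name, a, b
    return None


# The two lines exactly as they appear in the quoted log.
ME = "Jul 26 20:45:37 drbd1 kernel: drbd0: I am(P): 1:00000010:00000001:00000074:00000010:10"
PEER = "Jul 26 20:45:37 drbd1 kernel: drbd0: Peer(S): 1:00000011:00000001:0000005f:0000000f:00"

_, my_role, my_gc = parse_gc(ME)
_, peer_role, peer_gc = parse_gc(PEER)

diff = first_difference(my_gc, peer_gc)
# Tuple comparison is left to right, matching the assumed ordering.
peer_looks_newer = peer_gc > my_gc
print(my_role, peer_role, diff, peer_looks_newer)
```

on this log the first differing counter is the second field (0x10 here, 0x11 on the peer), so under the assumption above the Secondary's data looks newer than the Primary's, which is the situation drbd refuses to resolve on its own.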


to get out of this, there is the easy method:
go to the node which has the bad data,
 bad node# drbdadm disconnect all; drbdadm invalidate all
then, on both nodes:
 both nodes# drbdadm connect all
(name the resource instead of "all" if a node carries more than one.)

which will give you a connection
and full sync from "good node" to "bad node".

if you really want to avoid the full sync, and you are very sure that an
incremental sync is sufficient, search the list archives for advice
on manipulating the drbd meta data generation counters using "write_gc.pl",
or using the drbdmeta command from the drbd8 tarball.

unless you have terabytes of storage, go for the first method.

how you got into this situation is a different question.
I guess you made a habit of forcing drbd to become primary,
"just to get it working".  if you force it, you take the blame.

-- 
: Lars Ellenberg                            Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH      Fax +43-1-8178292-82 :
: Vivenotgasse 48, A-1120 Vienna/Europe    http://www.linbit.com :