[DRBD-user] [REQ] heartbeat auto_failback support

Mon Sep 20 14:43:04 CEST 2004

/ 2004-09-20 13:06:25 +0100
\ Steve Purkis:
> Hi list,
> 
> This is a feature request.
> 
> I've been trying to set up drbd + heartbeat with auto_failback ON.  I've 
> found that it doesn't work; presumably that's a known issue as on 
> fail-back syslog reports:
> 
> 	Sep 20 12:44:47 p4test2 kernel: drbd0: drbd0_receiver [9099]: cstate WFConnection --> WFReportParams
> 	Sep 20 12:44:47 p4test2 kernel: drbd0: Handshake successful: DRBD Network Protocol version 74
> 	Sep 20 12:44:47 p4test2 kernel: drbd0: Connection established.
> 	Sep 20 12:44:47 p4test2 kernel: drbd0: I am(P): 1:00000002:00000001:00000013:00000009:10
> 	Sep 20 12:44:47 p4test2 kernel: drbd0: Peer(S): 1:00000002:00000001:00000014:00000008:00
> 	Sep 20 12:44:47 p4test2 kernel: drbd0: Current Primary shall become sync TARGET! Aborting to prevent data corruption.
> 	Sep 20 12:44:47 p4test2 kernel: drbd0: drbd0_receiver [9099]: cstate WFReportParams --> StandAlone
> 	Sep 20 12:44:47 p4test2 kernel: drbd0: error receiving ReportParams, l: 72!
> 	Sep 20 12:44:47 p4test2 kernel: drbd0: asender terminated
> 	Sep 20 12:44:47 p4test2 kernel: drbd0: worker terminated
> 	Sep 20 12:44:47 p4test2 kernel: drbd0: drbd0_receiver [9099]: cstate StandAlone --> StandAlone
> 
> I can get everything working with auto_failback OFF, but I'd prefer to 
> have it on (for architectural reasons - we are considering failing over 
> to a remote server, which means a slower connection).
> 
> So I'm wondering if there are any plans to include support for this?  
> AFAICS, that would involve:
> 
>     *	allow a resync from a more recent secondary to a primary.
> 	(in this case, some reads would *have* to be remote,
> 	 or could prioritize synched blocks as they are requested?)
> 
> Maybe it's out there...  but there are cases where it would be useful.

what you can do is:
have a secondary that currently is sync target, and promote that to primary.
then you have a Primary being sync target.  no problem.

what you can NOT do is: have a Primary happily running along and being
used, and then suddenly decide that what it thought to be valid data is
no longer valid, and you want to change the data underneath it.
this is similar to having a database running on some file system on top of
/dev/sda, and then doing dd if=/dev/zero of=/dev/sda. we cannot allow
this. it just does not make sense.
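
as a rough sketch of the rule above (illustration only: the Node class
and its fields are made up here and are not drbd's actual code), the two
cases look like this:

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical model of one node; not a drbd data structure."""
    primary: bool = False
    sync_target: bool = False
    standalone: bool = False

    def connect(self, peer_has_newer_data: bool) -> None:
        if not peer_has_newer_data:
            # our data is at least as new as the peer's: nothing of ours
            # gets overwritten, so there is nothing to refuse
            return
        if self.primary:
            # an in-use Primary must not have its data changed underneath
            # it, so we refuse and drop the connection
            self.standalone = True
            return
        # an unused Secondary can safely be overwritten by the peer
        self.sync_target = True

    def promote(self) -> None:
        # promoting a Secondary is fine even while it is still sync target
        self.primary = True


# allowed: Secondary becomes sync target first, then gets promoted
ok = Node()
ok.connect(peer_has_newer_data=True)
ok.promote()

# refused: a running Primary is asked to become sync target
refused = Node(primary=True)
refused.connect(peer_has_newer_data=True)
```

here ok ends up as a Primary that is also sync target, while refused
ends up standalone, which is the case shown in the log.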

so what I would like to know is exactly which failure scenario leads to
the log above. what is the sequence of events? please start
with "both nodes happily humming along"; do not skip right to
"... and then it does not work any longer" as you did above.
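
for reference, the two counter lines in the log above can be compared
with a small script. this is only a sketch under assumptions the log
itself does not state: that the first five colon-separated fields are
hex counters compared left to right with the higher value meaning newer
data, and that the last field holds flags and can be ignored. the
function names are made up.

```python
def parse_counters(text: str) -> tuple:
    """Split a counter line from the log into numbers.

    Assumption: the first five fields are hex counters; the trailing
    field is treated as flags and dropped.
    """
    fields = text.strip().split(":")
    return tuple(int(field, 16) for field in fields[:5])


def newer_side(mine: tuple, peer: tuple) -> str:
    # tuples compare field by field, left to right
    if mine > peer:
        return "local"
    if mine < peer:
        return "peer"
    return "equal"


# values copied from the "I am(P)" and "Peer(S)" lines in the log
mine = parse_counters("1:00000002:00000001:00000013:00000009:10")
peer = parse_counters("1:00000002:00000001:00000014:00000008:00")

result = newer_side(mine, peer)
print(result)
```

under these assumptions the fourth field decides (0x13 locally against
0x14 on the peer), so the peer counts as newer, which fits the
"Current Primary shall become sync TARGET" message.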

I suspect it is some configuration mistake on your side.

	Lars Ellenberg

-- 
please use the "List-Reply" function of your email client.