On Tue, Mar 11, 2008 at 12:41:00PM -0400, Charles Riley wrote:
> Hi,
> I have a drbd cluster which has experienced the following chain of 
> events (DRBD 0.7.24):
> All network interfaces on the primary node were restarted, to include 
> the DRBD interface.
> Heartbeat at this point lost connectivity to its ping nodes, and failed 
> over to the secondary node.
> During the failover to the secondary node, the secondary node's DRBD was 
> set from Secondary/Unknown --> Primary/Unknown and all services were 
> failed over.  So far so good.
> However, the secondary node's drbd was stuck in WFConnection.  This went 
> unnoticed.
> The system ran for a period of time, and then was failed back to the 
> primary node, causing us to jump back in time to the date of the initial 
> failover as far as the filesystem was concerned.
> When it was noticed that the filesystem was out of date, the data on it 
> was restored from external sources.
> Now I am brought into the middle of this, and find that the primary 
> node (which now has good data) is 0: cs:StandAlone st:Primary/Unknown 
> ld:Consistent
> The secondary is 0: cs:WFConnection st:Secondary/Unknown ld:Consistent
> My question is, what is going to happen when I reestablish communication 
> between primary and secondary?
> Is drbd going to do the right thing and bring the secondary node's disk 
> up to date?
> I'd just do a "drbdadm -- --do-what-I-say primary all" on the primary, 
> but I'm pretty sure that once the failover node's drbd starts 
> communicating again, a sync is going to start.
> I just want to be prepared (and coordinate downtime) if I am going to 
> have to recover again from external sources.

Well, --do-what-I-say does not do what you mean. Again. :)
The good node _is_ already Primary, so that command would be a no-op.

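I suggest you do the following. On the node with the stale data ("bad"),
disconnect, invalidate its local data so that it is marked Inconsistent,
and start listening again. Then reconnect the node with the good data
("good"), which becomes the sync source for a full resync: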
bad# cat /proc/drbd
 0: cs:WFConnection st:Secondary/Unknown ld:Consistent
bad# drbdadm disconnect all ; cat /proc/drbd
 0: cs:StandAlone st:Secondary/Unknown ld:Consistent
bad# drbdadm invalidate all ; cat /proc/drbd
 0: cs:StandAlone st:Secondary/Unknown ld:Inconsistent
bad# drbdadm adjust all ; cat /proc/drbd
 0: cs:WFConnection st:Secondary/Unknown ld:Inconsistent

good# cat /proc/drbd
 0: cs:StandAlone st:Primary/Unknown  ld:Consistent
good# drbdadm adjust all ; sleep 15 ; cat /proc/drbd
 0: cs:SyncSource st:Primary/Secondary  ld:Consistent
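
Since the stuck WFConnection state went unnoticed in the first place, a
periodic check on each node may help catch that earlier. What follows is
only a rough sketch, assuming the DRBD 0.7 /proc/drbd line format shown
above; the function name drbd_disconnected is made up for illustration
and is not part of DRBD. It reads /proc/drbd-style text on stdin and
prints the minor number and connection state of every device that is in
WFConnection or StandAlone.

```shell
#!/bin/sh
# Hypothetical helper, not part of DRBD: list devices that have no peer.
# Assumes DRBD 0.7 style lines such as
#   " 0: cs:WFConnection st:Secondary/Unknown ld:Consistent"
# Prints "<minor> <state>" for each device in WFConnection or StandAlone.
drbd_disconnected() {
    awk '$1 ~ /^[0-9]+:$/ && ($2 == "cs:WFConnection" || $2 == "cs:StandAlone") {
        sub(/:$/, "", $1)    # "0:" -> "0"
        sub(/^cs:/, "", $2)  # "cs:WFConnection" -> "WFConnection"
        print $1, $2
    }'
}

# Possible use from cron (the mail recipient is an assumption):
#   out=$(drbd_disconnected < /proc/drbd)
#   [ -n "$out" ] && echo "$out" | mail -s "drbd not connected" root
```

The function only reports. What to do about a disconnected device (page
someone, block failback in heartbeat) is left to the caller.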

