[DRBD-user] drbd with heartbeat doesnt sync both ways

Fri Sep 15 15:26:45 CEST 2006

Christophe Zwecker wrote:

> node1 is primary with mounted fs
> node2 is secondary
> 
> node1 goes down (only network failure),

"only" network failure? Which network? In many cases, a network failure 
alone is worse than one box completely failing, because it can cause 
"split brain" if you're not careful.

What connections do you have for Heartbeat to use? (A serial heartbeat 
is always a good idea if you can have it.) The more redundant paths, the 
better. A typical setup might have three: a replication (crossover) 
network between the DRBD machines, the "normal" network and a serial 
heartbeat.
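
As a rough sketch, a three-path setup might look something like this in 
ha.cf. The device name, interface names and timings are assumptions for 
illustration only (eth0 as the "normal" network, eth1 as the crossover 
link), and the node names must match "uname -n" on each machine:

```
# /etc/ha.d/ha.cf (sketch only; adjust devices, interfaces and timings)
keepalive 2          # seconds between heartbeats
deadtime 30          # seconds of silence before a node is declared dead
serial /dev/ttyS0    # serial heartbeat (null-modem cable), assumed port
baud 19200
bcast eth0 eth1      # heartbeat over both the normal and crossover nets
node node1
node node2
```

With all three paths in place, losing one network should not make the 
nodes lose sight of each other.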

> heartbeat unmounts the drbd fs on 
> node1. node 2 takes over and mounts the drbd volume. 

And what happens to node1 here? Are you sure that Heartbeat stops the 
DRBD services? My guess is that you have a single network connection for 
both DRBD and Heartbeat, in which case DRBD will still be primary on node1.
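
One way to check is to look at /proc/drbd on node1 while the network is 
down. The sketch below assumes the DRBD 0.7 status layout with "cs:" 
(connection state) and "st:" (local/peer role) fields; the sample line 
and the function names are made up for illustration, and other DRBD 
versions may format this differently:

```python
import re

# Sample device line in the assumed DRBD 0.7 /proc/drbd layout.
# On a real node, read the lines from /proc/drbd instead.
SAMPLE = " 0: cs:WFConnection st:Primary/Unknown ld:Consistent"


def parse_drbd_line(line):
    """Return (connection_state, local_role, peer_role), or None if no match."""
    m = re.search(r"cs:(\S+)\s+st:(\w+)/(\w+)", line)
    if m is None:
        return None
    return m.group(1), m.group(2), m.group(3)


def still_primary_while_cut_off(line):
    """True if this node is Primary and cannot see its peer's role."""
    parsed = parse_drbd_line(line)
    if parsed is None:
        return False
    _cstate, local_role, peer_role = parsed
    return local_role == "Primary" and peer_role == "Unknown"


if __name__ == "__main__":
    print(parse_drbd_line(SAMPLE))
    print(still_primary_while_cut_off(SAMPLE))
```

If this reports True on node1 after node2 has taken over, both nodes 
consider themselves primary, which is exactly the split brain case.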

> node1 comes back up, mounts drbd volume and the change ain't there because:
> Sep 15 13:47:03 mw-test-n2 kernel: drbd0: Current Primary shall become 
> sync TARGET! Aborting to prevent data corruption.

DRBD is doing the right thing here. Either your nodes weren't really 
synchronised before the failure, or you had a split brain where DRBD was 
primary on both machines.

This situation can only be resolved manually, i.e. by a human telling 
DRBD which machine has the latest data. (something like "drbdadm XXX 
invalidate_remote --do-what-I-say" on the "good" machine)

Tim