[DRBD-user] Possible DRBD Desync After Outage - Why?

Wed Jul 8 09:29:22 CEST 2009

On Tue, Jul 07, 2009 at 09:08:43AM -0700, Mike Sweetser - Adhost wrote:
> Hello:
> 
> We have two DRBD machines running RHEL 5.3 with DRBD 8.3.0.  Recently,
> we had an outage that took the primary server in the cluster down,
> leaving it to fail over using DRBD and Heartbeat.  This was done with
> no issues.

> Assuming all this was done right, we ran into other issues - some
> people have complained that their files have "reverted" to a previous
> state.

How long ago is "previous"?

Several days?
A few hours?

If it is only a few seconds, the applications never called fsync, and
the block device has never seen that particular write.
Or there are volatile write caches involved, and you configured DRBD
as if they were non-volatile.
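
To illustrate the fsync point: a write only reaches the block device (and
therefore DRBD) once it has been flushed out of the application and the page
cache. Below is a minimal sketch of a durable write in Python. The helper
name durable_write is made up for this example, and it assumes a POSIX
system where a directory can be opened and fsynced.

```python
import os


def durable_write(path, data):
    """Write bytes to path and return only after the data was fsynced.

    Hypothetical helper for illustration, not part of DRBD or any library.
    """
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # move data from Python's buffer to the kernel
        os.fsync(f.fileno())  # ask the kernel to push it to the block device

    # Also fsync the containing directory, so the directory entry for a
    # newly created file is on stable storage as well (POSIX assumption).
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Without the flush and fsync calls, the data may still sit in memory on the
failed node when it goes down, and the surviving node will never have
received it.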

> We don't show any errors occurring in the synchronization of
> the files, and never saw any "oos" in the DRBD status.  

There is only one way to "jump back in time" with DRBD:

You switch over to a node that had been disconnected for some time,
and you did not notice. Then you go online with that stale data.

So I assume that IFF you really jumped back in time,
it happened during your failover,
because for some reason the nodes had not been replicating.

You need to fix your setup, monitor the system, and probably add
resource level fencing (outdate a disconnected secondary if a primary is
still running).
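
As one possible way to do the monitoring part, here is a rough sketch that
checks the connection state (cs:) and disk state (ds:) fields of
/proc/drbd-style status text. The function name check_drbd and the regular
expression are assumptions of this example, and the exact layout of
/proc/drbd varies between DRBD versions, so verify it against the output on
your own nodes before relying on it.

```python
import re

# Matches device lines such as
#   " 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r---"
# The ds: field is optional, since unconfigured devices do not report it.
DEVICE_LINE = re.compile(r"^\s*(\d+):\s+cs:(\S+)(?:.*?\bds:(\S+))?")


def check_drbd(status_text):
    """Return a list of problem descriptions; an empty list means healthy.

    Hypothetical helper for illustration, not shipped with DRBD.
    """
    problems = []
    for line in status_text.splitlines():
        match = DEVICE_LINE.match(line)
        if not match:
            continue  # version banner, counters, or other non-device line
        minor, cs, ds = match.groups()
        if cs != "Connected":
            problems.append("drbd%s: connection state is %s" % (minor, cs))
        if ds is not None and ds != "UpToDate/UpToDate":
            problems.append("drbd%s: disk state is %s" % (minor, ds))
    return problems
```

On a node, you would feed it the contents of /proc/drbd from cron or from
your monitoring system, and raise an alert whenever the returned list is
not empty. A secondary that sits in WFConnection or StandAlone for hours is
exactly the case you must not fail over to unnoticed.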

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed