Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
On Wed, Apr 07, 2010 at 03:24:04AM +0200, Guillaume Chanaud wrote:
> Hi everybody,
>
> I have two nodes with DRBD in dual-primary mode. On top of this there
> is an OCFS2 filesystem, and I export the whole thing with NFS.
> Everything was working fine: I could write on one node or the other
> and access either of them over NFS without any problems.
>
> A few days ago (in fact more than a few, it was 10 days ago), one
> node failed. That node was in an inconsistent state, but the second
> node continued to work read/write. I'm not sure, but maybe some NFS
> clients even continued to write on the failed node.
>
> Today I brought the failed node back, DRBD resynced, and both nodes
> were promoted primary automatically. In fact I had nothing to do.
>
> BUT I lost all the data written between the failure and the resync.
> It's as if the good node resynced from the failed node and so
> reverted to 10 days ago (I thought the failed node would resync from
> the good node and get all the data written over the last 10 days)!
> That means I lost 10 days of data that were there just before the
> resync.
> Is this normal behavior???

No.

> Is it possible to get that data back (I think not...)?

Yes. From your backups.

You do have backups, don't you?
RAID is not replication.
Neither of them is a substitute for backups.

> So any clues on this would be welcome,

Versions of the stuff involved, DRBD configuration, logs, cluster
manager, "custom helper scripts" if any...

"It went catastrophically wrong. Tell me why." is not that much
to go on ;)

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting    http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed
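
For context on the automatic promotion and data loss described above:
whether DRBD resolves a split brain automatically, and in whose favor,
is governed by the after-sb-* policies in the resource's net section;
with the defaults (disconnect everywhere) nothing is discarded without
manual intervention. A dual-primary setup is commonly configured along
these lines -- a sketch only, the resource name r0 and the policy
choices are illustrative, not taken from this thread:

    resource r0 {
      net {
        allow-two-primaries;
        # Split-brain recovery, keyed on how many nodes are
        # primary at the moment the peers reconnect:
        after-sb-0pri discard-zero-changes;  # neither primary: if only one
                                             # side wrote, sync from it;
                                             # otherwise disconnect
        after-sb-1pri discard-secondary;     # one primary: drop the
                                             # secondary's modifications
        after-sb-2pri disconnect;            # both primary: refuse to
                                             # auto-resolve
      }
    }

With after-sb-2pri disconnect, a split brain between two primaries has
to be resolved by hand: on the node whose changes are to be sacrificed,
drbdadm secondary r0 followed by drbdadm -- --discard-my-data connect r0
(8.3-era syntax), then drbdadm connect r0 on the survivor. If data
vanished with no manual step at all, a discard-* policy or a
pri-lost-after-sb handler in the configuration is the first suspect.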
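
As for the details requested at the end of the reply, the usual
starting points on a stock DRBD 8.3-era install would be something
like this (the log path varies by distribution):

    cat /proc/drbd                   # DRBD version, connection states,
                                     # roles, sync progress
    drbdadm dump all                 # the configuration as drbdadm
                                     # actually parsed it
    grep -i drbd /var/log/messages   # kernel messages around the
                                     # failure and the resync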