[DRBD-user] Effect of writing directly to DRBD backing volume while offline?

Sun Jun 1 09:46:27 CEST 2008

On Sat, May 31, 2008 at 06:07:36PM +0000, Simon Carey-Smith wrote:
> I have an LVM partition /dev/vg/lvdata being replicated by DRBD (drbd
> on top of LVM2).  I use external metadata on a separate LVM partition
> of 256 MB.
> 
> I would like to know the effect of mounting and writing to the LVM partition
> directly while DRBD is stopped.  I can't find anything mentioned in the
> documentation about doing this.

why did you do this, and what exactly did you do?

> I ask because when I tried this, after bringing DRBD back up everything seemed
> fine, it replicated the changes to the secondary.  However, 2 days later I

maybe it replicated _some_ changes.
but it cannot replicate "the" changes,
because it was not there to track
them when you bypassed it.

if you do something like this, you need to trigger a full sync, e.g. by
saying "drbdadm invalidate all" on the node that now has stale data.
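
a rough sketch of that step, for illustration only. the command and the
"all" selector are the ones named above; the DRY_RUN variable and the run()
wrapper are made-up helpers, not part of DRBD, so that nothing is executed
by accident. it assumes you run this on the node whose data is to be
discarded.

```shell
#!/bin/sh
# Sketch only: DRY_RUN and run() are illustrative helpers, not part of DRBD.
# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# On the node holding the stale data: mark the local data as out of sync,
# so that DRBD fetches everything from the peer again (full sync).
run drbdadm invalidate all

# Sync progress can be followed in /proc/drbd.
run cat /proc/drbd
```

set DRY_RUN=0 only once you are sure you are on the right node;
invalidating the node with the good data would be the wrong direction.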

> started receiving messages in the system log like the following:

did you failover in between?
you'd then work with an inconsistent dataset.

> kernel: attempt to access beyond end of device
> kernel: drbd0: rw=0, want 14,702,736,608 limit 579,141,632

file system tried to read some bogus block.
generic block layer detected the bogusness, and refused.
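
to put numbers on it, a small back-of-the-envelope sketch. it assumes
"want" and "limit" in that kernel message are counted in 512-byte sectors
(the usual unit there); the thousands separators from the quoted log are
stripped, and the variable names are just for illustration.

```shell
#!/bin/sh
# Values from the quoted kernel message, commas removed.
want=14702736608
limit=579141632

# Assumption: both values are 512-byte sectors; 2097152 sectors = 1 GiB.
sectors_per_gib=2097152

ratio=$((want / limit))                  # integer division
limit_gib=$((limit / sectors_per_gib))   # device size, rounded down
want_gib=$((want / sectors_per_gib))     # requested offset, rounded down

echo "requested sector is about ${ratio}x past the end of the device"
echo "device size: about ${limit_gib} GiB, requested offset: about ${want_gib} GiB"
```

under that assumption the device is about 276 GiB, and the request pointed
at roughly 7010 GiB: not an off-by-a-few, but a garbage block number.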

> After another few days DRBD crashed and fscking the LVM volume found
> loads of problems (not surprising, as it was trying to write to a
> block value which is about 25 times bigger than the maximum which
> exists!)
>
> 
> So, it is not possible to directly write to the LVM partition without
> screwing up DRBD?  If I needed to do this, would this require running
> "drbdadm create-md r$" afterwards on the primary and secondary?

when you bypass DRBD, it has no way of tracking the changes you make.
so, as mentioned earlier, whenever you do that, you need to trigger a
full sync once you decide to use DRBD again.
otherwise, yes, you screwed up.

and why would you need to bypass drbd, in the first place?

> Any ideas on how DRBD could be trying to write to a block which is so
> much out of range?

it did not.

your filesystem requested that block,
and the generic block layer refused,
even before that request reached DRBD.

apparently you corrupted your filesystem on the way,
possibly by bypassing DRBD, then not syncing,
and then switching to an inconsistent data set.
or maybe there is something else corrupting your file system,
like broken RAM or such.

-- 
: Lars Ellenberg                           http://www.linbit.com :
: DRBD/HA support and consulting             sales at linbit.com :
: LINBIT Information Technologies GmbH      Tel +43-1-8178292-0  :
: Vivenotgasse 48, A-1120 Vienna/Europe     Fax +43-1-8178292-82 :
__
please don't Cc me, but send to list -- I'm subscribed