[DRBD-user] Re: Effect of writing directly to DRBD backing volume while offline?

Sun Jun 1 11:58:55 CEST 2008

> On Sat, May 31, 2008 at 06:07:36PM +0000, Simon Carey-Smith wrote:
> > I have an LVM partition /dev/vg/lvdata being replicated by DRBD (drbd
> > on top of LVM2).  I use external metadata on a separate LVM partition
> > of 256 MB.
> > 
> > I would like to know the effect of mounting and writing to the LVM partition
> > directly while DRBD is stopped.  I can't find anything mentioned in the
> > documentation about doing this.
> 
> why did you do this, and what did you do.

Thanks for your comments Lars.  I was trying to ask two questions in one post,
so the details weren't clear.  Here's the actual scenario that occurred:

The DRBD state on the primary node became diskless after server maintenance and
a reboot (as its backing LVM partition was mounted automatically by the
filesystem)...
(A)  diskless/?       (B)  secondary/uptodate

Several hours later, after unmounting the LVM partition on node-A we ran
"drbdadm attach $r" followed by "drbdadm -- -o primary $r"
The state became...
(A)  primary/uptodate       (B)  secondary/uptodate

Here it appears that the secondary replicated to the primary, as we lost the
changes on node-A over those few hours. 
From what I understand, we should have run "drbdadm invalidate $r" on node-B
before reconnecting the resources?
How does this differ from running
"drbdadm -- --overwrite-data-of-peer primary $r" on node-A?  Or running
"drbdadm -- --discard-my-data connect $r" on node-B?

> 
> > kernel: attempt to access beyond end of device
> > kernel: drbd0: rw=0, want 14,702,736,608 limit 579,141,632
> 
> file system tried to read some bogus block.
> generic block layer detected the bogusness, and refused.

Where does the filesystem get its info on where to read/write blocks?  Is it in
a journal stored on the volume itself?  Does this mean that the volume got
corrupted somehow after the scenario above?
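
For scale, here is a quick check of the two numbers in that message.  It is
only a sketch, and it assumes both values are counts of 512-byte sectors, which
the message itself does not say:

```shell
#!/bin/bash
# Values from the kernel message quoted above, with the commas removed.
want=14702736608
limit=579141632

# Assumption: both values are counts of 512-byte sectors.
sector=512

# Integer ratio of the requested position to the device limit.
ratio=$(( want / limit ))

# Device limit and requested position in GiB (integer division).
limit_gib=$(( limit * sector / 1024 / 1024 / 1024 ))
want_gib=$(( want * sector / 1024 / 1024 / 1024 ))

echo "ratio=${ratio} limit_gib=${limit_gib} want_gib=${want_gib}"
```

If that assumption holds, the limit is simply the size of the device, and the
requested position lies far beyond it, which fits the "about 25 times bigger"
estimate from my original post.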

> 
> > After another few days DRBD crashed and fscking the LVM volume found
> > loads of problems (not surprising, as it was trying to write to a
> > block value which is about 25 times bigger than the maximum which
> > exists!)
> >
> > 
> > So, it is not possible to directly write to the LVM partition without
> > screwing up DRBD?  If I needed to do this, would this require running
> > "drbdadm create-md $r" afterwards on the primary and secondary?
> 
> when you bypass DRBD, it has no way tracking the changes you do.
> so, as mentioned earlier, whenever you do that, you need to trigger a
> full sync once you decide to use DRBD again.
> otherwise, yes, you screwed up.

Which is done by simply running "drbdadm invalidate $r" on the stale node, right?
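
In other words, something like the following on the stale node.  This is a
dry-run sketch that only prints the commands; the resource name "r0" and the
helper name are made up for illustration, and watching /proc/drbd afterwards is
my own assumption about how to follow the resync:

```shell
#!/bin/sh
# Dry-run sketch: prints the commands, executes nothing.
# "r0" is a placeholder resource name; print_resync_plan is a made-up helper.
print_resync_plan() {
    r="$1"
    # Mark the local copy as out of sync so that it is fully resynced
    # from the peer.
    echo "drbdadm invalidate $r"
    # Follow the progress of the resync.
    echo "cat /proc/drbd"
}

print_resync_plan r0
```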

> 
> and why would you need to bypass drbd, in the first place?

In this case it wasn't intentional (the backing LVM device was mounted by
mistake).  But we have also wanted to do this from time to time, because we
have DRBD tightly integrated with Heartbeat and have occasionally wanted to
switch off Heartbeat while doing new installations.

> 
> > Any ideas on how DRBD could be trying to write to a block which is so
> > much out of range?
> 
> it did not.
> 
> your filesystem requested that block,
> and the generic block layer refused,
> even before that request reached DRBD.
> 
> apparently you corrupted your filesystem on the way,
> possibly by bypassing DRBD, then not syncing
> and switching to an inconsistent data set.
> or maybe there is something else corrupting your file system,
> like broken RAM or such.
> 

Makes sense. I'll check the RAM just in case.

It would be great if there were a few more tips in the User Guide ('Chapter 7.
Troubleshooting and error recovery') for recovering from various scenarios,
for example the appropriate situations in which to use commands like:
"drbdadm invalidate $r"
"drbdadm -- --overwrite-data-of-peer primary $r"
"drbdadm -- --discard-my-data connect $r"

Cheers!