[DRBD-user] Kernel hung on DRBD / MD RAID

Andreas Bauer ab at voltage.de
Mon Feb 20 22:16:50 CET 2012



From:	Lars Ellenberg <lars.ellenberg at linbit.com>
Sent:	Fri 17-02-2012 11:52

> On Tue, Feb 14, 2012 at 12:21:07PM +0100, Andreas Bauer wrote:
> > Yes, makes sense for the md_resync, but what for the KVM process? It does
> > access its device via DRBD, so is the stacktrace incomplete? (missing DRBD
> > layer?).
> 
> Do you have another kvm placed directly on the MD?
> Are you sure your kvm's are not bypassing DRBD?

The LVM volumes are definitely all placed on DRBD (for failover purposes). I rechecked that.

The root fs of the server is directly on the MD though. So a kvm process would probably only access a logfile there.

Still cannot really make sense of what happened...

> > I might raise the issue with the MD developers but at the moment I am still
> > confused why DRBD behaved the way it did. How would DRBD behave when the
> > backing device blocks hard?
> 
> With recent enough DRBD, you can configure DRBD to force-detach from a "stuck"
> backing device. This may be dangerous, though. You don't want too aggressive
> settings there.
> 
> Dangerous: say you submitted a READ, and then the backing device becomes
> "unresponsive", and DRBD decides (based on your configuration) to force-detach,
> re-try the read from the peer, and finally complete to upper layers.
> Then some time later the backing device "suddenly" springs back to life,
> processes the still queued request, and now DMAs data from disk to ...
> ... where, exactly?
> Right.
> Into some pages which may meanwhile be reused for something completely
> different, or may be unmapped. So in that scenario, best case it triggers
> a pagefault and panic, worst case it silently corrupts unrelated data.
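
The scenario quoted above can be played through in a toy model. This is only a sketch, not DRBD code: the names Page, StuckDisk and simulate are hypothetical, and real DMA and page reuse are reduced to plain Python assignments. It only illustrates why a request that is still queued in the backing device stays dangerous after the upper layers have already been answered.

```python
# Toy model of the stale-completion hazard; not DRBD code.
# Page, StuckDisk and simulate are hypothetical names used for illustration.

class Page:
    """A memory page that may be handed to different owners over time."""
    def __init__(self):
        self.owner = None
        self.data = b""


class StuckDisk:
    """Backing device that queues reads and completes them only when told to."""
    def __init__(self, contents):
        self.contents = contents
        self.queue = []  # target pages of requests that are still pending

    def submit_read(self, page):
        # The request is accepted but not answered (the device is "stuck").
        self.queue.append(page)

    def spring_back_to_life(self):
        # The device finally works through its queue and writes into the
        # target pages, unaware that they may belong to someone else by now.
        for page in self.queue:
            page.data = self.contents
        self.queue.clear()


def simulate():
    page = Page()
    disk = StuckDisk(b"block from local disk")

    # 1. An upper layer submits a READ into `page`; the disk hangs.
    page.owner = "read request"
    disk.submit_read(page)

    # 2. Force-detach: the read is retried from the peer and completed.
    page.data = b"block from peer"
    result = page.data

    # 3. The page is released and reused for something unrelated.
    page.owner = "unrelated data"
    page.data = b"unrelated data"

    # 4. The disk wakes up and completes the stale request.
    disk.spring_back_to_life()
    return result, page.owner, page.data
```

Calling simulate() returns the data the reader received in step 2, followed by the final owner and content of the page. The reader got its block from the peer, but the page that now belongs to "unrelated data" ends up holding the late block from the local disk, which is the silent corruption case.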

By the way, I love DRBD and also the level of discussion on this list. It's great to learn about the architecture of the software I work with.

The underlying device should never get stuck in the first place, so it would be sufficient to handle it manually when it happens. But when I "force-detach", the DRBD device would become read-only, correct? I guess a running VM on top of it wouldn't like that.

Can DRBD 8.3.11 force-detach manually?

I run Primary/Secondary only, so if anything goes wrong it will be thrown into the hands of a human (me) ;-)

regards,

Andreas


