[DRBD-user] Kernel hung on DRBD / MD RAID

Lars Ellenberg lars.ellenberg at linbit.com
Fri Feb 17 11:50:52 CET 2012



On Tue, Feb 14, 2012 at 12:21:07PM +0100, Andreas Bauer wrote:
> From:	Lars Ellenberg <lars.ellenberg at linbit.com>
> 
> > Ok, corrected to 8.3.11:
> > > > Kernel 3.1.0 / DRBD 8.3.11
> > 
> > Nothing directly DRBD related in the stack traces.
> 
> Yes, makes sense for the md_resync, but what for the KVM process? It does
> access its device via DRBD, so is the stacktrace incomplete? (missing DRBD
> layer?).

Do you have another kvm placed directly on the MD?
Are you sure your kvm's are not bypassing DRBD?

> > But you have one kvm in:
> > kernel: [2009644.546925]  [<ffffffffa00939d1>] ?  wait_barrier+0x87/0xc0 [raid1]
> > and md1_resync in 
> > kernel: [2009644.547433]  [<ffffffffa0093917>] ?  raise_barrier+0x11a/0x14d 
> > [raid1]
> > 
> > Looks like MD is stepping on its own toes there.
> 
> Stepping on one's own toes is something one isn't supposed to do, right? ;-)
> 
> I might raise the issue with the MD developers, but at the moment I am still
> confused about why DRBD behaved the way it did. How would DRBD behave when the
> backing device blocks hard?

With recent enough DRBD, you can configure DRBD to force-detach from a "stuck"
backing device. This may be dangerous, though. You don't want overly aggressive
settings there.
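
To make "too aggressive" concrete, here is a minimal Python sketch of a timeout-based detach decision. It is a toy model, not DRBD code: the `Request` type, the `would_force_detach` function, the latency values and the "0 means disabled" convention are all hypothetical, chosen only for illustration. It shows that a short timeout fires on a transient stall (for example while MD briefly holds a resync barrier), while a generous one only fires on a device that is really hung.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Request:
    """Hypothetical record of one I/O request to the backing device."""
    req_id: int
    latency_s: float  # how long the backing device took to complete it


def would_force_detach(requests: Iterable[Request], timeout_s: float) -> Optional[int]:
    """Return the id of the first request that exceeds timeout_s, or None.

    Toy model of a detach-on-timeout policy (an assumption, not DRBD's
    actual implementation). A timeout <= 0 is treated as "disabled".
    """
    if timeout_s <= 0:
        return None
    for r in requests:
        if r.latency_s > timeout_s:
            return r.req_id
    return None


# A device that stalls for 2 seconds once, then recovers.
transient_stall = [Request(1, 0.01), Request(2, 2.0), Request(3, 0.02)]
```

With `timeout_s=1.0` the transient stall alone would cause a detach (`would_force_detach(transient_stall, 1.0)` returns `2`), whereas with `timeout_s=60.0` it returns `None`. The shorter the timeout, the more often you end up in the risky situation described next.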

Dangerous: say you submitted a READ, and then the backing device becomes
"unresponsive", and DRBD decides (based on your configuration) to force-detach,
re-try the read from the peer, and finally complete to upper layers.
Then some time later the backing device "suddenly" springs back to life,
processes the still queued request, and now DMAs data from disk to ...
... where, exactly?
Right.
Into some pages which may meanwhile be reused for something completely
different, or may be unmapped. So in that scenario, best case it triggers
a pagefault and panic, worst case it silently corrupts unrelated data.
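
The following Python sketch replays that sequence. It is a hypothetical toy model: `PagePool`, `simulate_late_dma` and the 4-byte "pages" are illustrative names only and do not correspond to kernel or DRBD interfaces. Python memory is safe, so only the "silent corruption" outcome is modeled, not the pagefault/panic one.

```python
from typing import List, Tuple


class PagePool:
    """Toy stand-in for kernel pages: a fixed set of buffers that get recycled."""

    def __init__(self, n: int, page_size: int = 4) -> None:
        self.pages: List[bytearray] = [bytearray(page_size) for _ in range(n)]
        self.free: List[int] = list(range(n))

    def alloc(self) -> int:
        return self.free.pop()

    def release(self, idx: int) -> None:
        self.free.append(idx)


def simulate_late_dma() -> Tuple[int, bytes]:
    """Replay: READ stalls, force-detach, page reused, late DMA lands."""
    pool = PagePool(1)  # a single page, so reuse is guaranteed

    # 1. READ submitted; the backing device queues it with this DMA target.
    target = pool.alloc()
    queued_dma_target = target

    # 2. Device looks stuck: force-detach, re-read from the peer,
    #    complete to upper layers, which then free the page.
    pool.pages[target][:] = b"PEER"
    pool.release(target)

    # 3. The page is reused by something completely unrelated.
    other = pool.alloc()
    pool.pages[other][:] = b"APP2"

    # 4. The backing device springs back to life and finishes the old
    #    request, writing into the stale DMA target.
    pool.pages[queued_dma_target][:] = b"DISK"

    # The unrelated user now sees data it never asked for.
    return other, bytes(pool.pages[other])
```

Calling `simulate_late_dma()` returns `(0, b"DISK")`: the unrelated owner of page 0 wrote `b"APP2"` but finds disk data there instead, with no error reported anywhere.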

So if you intend to use that feature, maybe you rather want 

> regards,
> 
> Andreas
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed


