Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
On Mon, Feb 20, 2012 at 10:16:50PM +0100, Andreas Bauer wrote:
> From: Lars Ellenberg <lars.ellenberg at linbit.com>
> Sent: Fri 17-02-2012 11:52
>
> > On Tue, Feb 14, 2012 at 12:21:07PM +0100, Andreas Bauer wrote:
> > > Yes, makes sense for the md_resync, but what about the KVM process? It does
> > > access its device via DRBD, so is the stacktrace incomplete? (missing DRBD
> > > layer?)
> >
> > Do you have another kvm placed directly on the MD?
> > Are you sure your kvms are not bypassing DRBD?
>
> The lvm volumes are definitely all placed on DRBD (for failover purposes).
> I did recheck that.
>
> The root fs of the server is directly on the MD though, so a kvm process
> would probably only access a logfile there.
>
> Still cannot really make sense of what happened...
>
> > > I might raise the issue with the MD developers, but at the moment I am still
> > > confused why DRBD behaved like it did. How would DRBD behave when the
> > > backing device blocks hard?
> >
> > With recent enough DRBD, you can configure DRBD to force-detach from a "stuck"
> > backing device. This may be dangerous, though. You don't want too aggressive
> > settings there.
> >
> > Dangerous: say you submitted a READ, and then the backing device becomes
> > "unresponsive", and DRBD decides (based on your configuration) to force-detach,
> > re-try the read from the peer, and finally complete to upper layers.
> > Then some time later the backing device "suddenly" springs back to life,
> > processes the still queued request, and now DMAs data from disk to ...
> > ... where, exactly?
>
> Right.
>
> > Into some pages which may meanwhile have been reused for something completely
> > different, or may be unmapped. So in that scenario, best case it triggers
> > a page fault and panic, worst case it silently corrupts unrelated data.
>
> By the way, I love DRBD and also the level of discussion on this list.
> It's great to learn about the architecture of the software I work with.
>
> The underlying device should never get stuck in the first place, so it
> would be sufficient to handle it manually when it happens. But when I
> "force-detach", the DRBD device would change to read-only, correct?

Not as long as the peer is still reachable and up-to-date.

> I guess a running VM on top of it wouldn't like that.
>
> Can DRBD 8.3.11 force-detach manually?

I think 8.3.12 got that feature. It may still not cover all corner cases.

For a manually forced detach while IO is already stuck on the lower-level
device, you need to "drbdadm detach --force" the first time you try, or the
"polite" detach may get stuck itself and prevent the later --force.

And you may or may not need an additional "resume-io" to actually get all
"hung" IO to be -EIO-ed at that level. It may still be completed OK to the
upper layers (file system), if the peer was reachable and up-to-date.

> I run Primary/Secondary only, so if anything goes wrong it will be
> thrown into the hands of a human (me) ;-)

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com
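
For reference, the "configure DRBD to force-detach from a stuck backing
device" part above maps onto the disk section of drbd.conf. This is a
minimal sketch only, assuming an 8.3.12/8.4-era DRBD; the resource name
"r0" is made up, and the exact option names and units (disk-timeout is
believed to take tenths of a second) should be checked against
drbd.conf(5) for the installed version before relying on it:

  resource r0 {
    disk {
      on-io-error  detach;  # detach on real I/O errors from the backing device
      disk-timeout 600;     # treat the local disk as failed if a request takes
                            # longer than 60 seconds; 0 (the default) disables
                            # this. Dangerous for the reasons described above.
    }
    # on/device/address/meta-disk sections omitted in this sketch
  }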
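
And the manual procedure described above, spelled out with the same made-up
resource name "r0" (only do this while IO really is stuck on the lower-level
device):

  drbdadm detach --force r0   # forced detach; a plain "detach" may itself get stuck
  drbdadm resume-io r0        # may be needed so hung IO is -EIO-ed locally;
                              # upper layers can still be completed via the peer
  cat /proc/drbd              # the local disk state (ds:) should now show Diskless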