[DRBD-user] Fwd: Kernel Oops on peer when removing LVM snapshot

Paul Gideon Dann pdgiddie at gmail.com
Mon Jun 22 12:31:18 CEST 2015

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


(Forwarding to list --- sorry!)

Hi Robert, and thanks for answering! Yes, I considered that possibility
myself. However, as this is a single-primary resource, the DRBD block device
isn't available to the LVM layer on the secondary side. The VG is not visible
until the host becomes primary for that resource (at which point the VG and
LVs appear automatically), and the LVM layer holds the resource open until
the VG is deactivated. So I'm pretty confident that the metadata changes are
completely invisible to the LVM layer on the secondary, and I'm certainly not
seeing anything LVM-related in the kernel logs during these oopses. I agree
that the fact that an LVM operation triggers this does suggest that LVM
metadata is involved somehow, but I think it must be some kind of bug rather
than a misconfiguration.
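
To make that concrete, the promotion/demotion sequence during a failover
amounts to roughly the following (resource and VG names are placeholders,
and in my setup the activation step happens automatically):

    drbdadm primary resource0     # DRBD device becomes accessible on this node
    vgchange -ay resource_vg      # inner VG and its LVs become available
    # ... resource VG in use while this node is primary ...
    vgchange -an resource_vg      # deactivate so LVM releases the DRBD device
    drbdadm secondary resource0   # demote; the device is no longer accessible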

I know that stacking LVM and DRBD like this is not an unusual use-case, so if
the changing LVM metadata were the problem, I'd expect it to be
well-documented by now. But it seems not to be, which makes me think this is
something unexpected. If I were running in dual-primary mode, of course, I'd
need to set up clustered LVM to lock the metadata correctly. Maybe I need to
do that anyway, but I don't see why I should...
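
For what it's worth, my understanding (a sketch only, assuming the usual
LVM2 + clvmd arrangement -- I haven't tested this here) is that clustered
LVM would mean something like:

    # /etc/lvm/lvm.conf on both nodes
    global {
        locking_type = 3    # hand VG metadata locking over to clvmd
    }

plus running clvmd, and the cluster stack it depends on, on both nodes.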

Paul


On 22 June 2015 at 10:41, Robert Altnoeder <robert.altnoeder at linbit.com>
wrote:

>  If I have not misunderstood what this is about, then the problem seems to
> be this:
>
> You are using a DRBD device as the physical volume for a volume group. As
> soon as something changes in that volume group, e.g. you add or remove
> volumes (such as snapshots), the metadata for that volume group on the
> physical volume changes.
> That change is what you replicate to the peer (the secondary), so all that
> LVM on the peer can see is data magically changing on its physical volume.
> That is where the kernel Oops comes from: data is not supposed to change
> without the local node knowing about it. This is an unsafe scenario unless
> there is some kind of synchronization in place at the LVM level (e.g.
> "Clustered LVM", aka CLVM, instead of normal LVM, which is not designed to
> operate on shared or replicated storage).
>
> br,
> Robert
>
> On 06/22/2015 11:06 AM, Paul Gideon Dann wrote:
>
>  So no ideas about this, then? I've now seen the same thing happen on
> another resource. Actually, it doesn't need to be a snapshot: removing any
> logical volume causes the oops. It doesn't happen for every resource,
> though.
>
> [...snip...]
>
>  Paul
>
> On 16 June 2015 at 11:51, Paul Gideon Dann <pdgiddie at gmail.com> wrote:
>
>>  This is an interesting (though frustrating) issue that I've run into
>> with DRBD+LVM, and having finally exhausted everything I can think of or
>> find myself, I'm hoping the mailing list might be able to offer some help!
>>
>>  My setup involves DRBD resources that are backed by LVM LVs and are then
>> used as PVs themselves, each forming its own VG:
>>
>>  System VG -> Backing LV -> DRBD -> Resource VG -> Resource LVs
>>
>>  The problem I'm having happens only for one DRBD resource, and not for
>> any of the others. This is what I do:
>>
>>  I create a snapshot of the Resource LV (meaning that the snapshot will
>> also be replicated via DRBD), and everything is fine. However, when I
>> *remove* the snapshot, the *secondary* peer oopses immediately:
>>
>> [...snip...]
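>>
>>  In terms of commands, that's essentially all it takes (the volume and
>> snapshot names here are just placeholders):
>>
>>     lvcreate -s -L 10G -n data-snap /dev/resource_vg/data   # fine on both peers
>>     lvremove -f /dev/resource_vg/data-snap                  # secondary oopses now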
>>
>>  Cheers,
>>  Paul
>>
>
>
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
>
>