[DRBD-user] DRBD and LVM Snapshot with 2 nodes configuration

Tue Apr 6 23:25:10 CEST 2004

/ 2004-04-06 22:42:26 +0200
\ Andreas Semt:
> >This is perfectly the right place for this kind of questions.
> >
> 
> So I will ask a lot more questions ;-)

Then you should reply to the list!
Though I am typically pretty responsive on DRBD matters,
I may sometimes be just too busy. And there are some experienced
users on the list who should be able to answer most of the
questions...

> >LVM2 because of its filtering capabilities.
> >you need to exclude the lower level devices from being scanned.
> 
> Is it a big deal to set up these filters?

Just edit the config file.
Sorry, off the top of my head I don't know the exact syntax
and location right now, but it should be pretty obvious.
There should be examples of this in the archive, maybe even on Google.
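
A minimal sketch of such a filter, assuming the DRBD device is backed by
a partition called /dev/sda5 (a made-up name, substitute your own). In
LVM2 the filter lives in the devices section of lvm.conf, usually
/etc/lvm/lvm.conf. Each pattern is a regular expression prefixed with
"a" (accept) or "r" (reject), and the first pattern that matches a
device decides:

```
devices {
    # reject the hypothetical DRBD backing partition, accept everything else
    filter = [ "r|^/dev/sda5$|", "a|.*|" ]
}
```

Running vgscan afterwards makes LVM rescan the devices with the new
filter in place.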

> My idea: I would do the snapshot on the secondary node (nodeB) in the 
> following order (all on nodeB, who is in standby):
>  1) stop heartbeat
>  2) stop all services on /dev/nb* and umount /dev/nb*

ON A STANDBY (DRBD Secondary) node there ain't no services accessing the
drbd devices. NOT AT ALL.

>  3) stop drbd
>  4) do a snapshot of the LVs associated with the drbd block devices
>  5) mount the snapshot, tar it and save it to a normal (not drbded)
>     partition/LV (Question: Can I bring drbd up again even when I've
>     mounted the snapshot (on a non-drbded LV to save the snapshot
>     data)?? I guess: "No", cause LVM could not speak with the underlying
>     fs anymore, right?)
>  6) start drbd (synch nodeA --> nodeB) and mount /dev/nb*
>  7) start all services on /dev/nb*

if this was / is your secondary, there ain't no services to start.

>  8) start heartbeat

and, if you use heartbeat (or any cluster manager), it is NOT your
business to start them anyway; it is heartbeat's business.

so, you have some bad thinkos in this procedure...

[A]
  after 3), you cannot be sure to have a *clean* device, as defined
  in my previous mail, since nobody told the active file system
  (which is accessing the DRBD Primary device on the other node) to
  "flush and suspend" prior to that.

for 5):
  you can "bring up" (tell the fs to resume) any user immediately
  after you have taken the snapshot. LVM does NOT access any
  "underlying fs"; the FS is accessing the LV.
  creating a snapshot basically sets up a COW (copy-on-write) mapping
  for the LV in question. so each write to the LV after you took the
  snapshot will first copy the existing block (if not already copied)
  to the snapshot, and then continue (I think they do it this way, and
  not the other way 'round, but I'd need to look it up to be sure).
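
To make the COW mapping concrete, here is a toy in-memory model in
Python. It is only a sketch of the copy-before-write idea described
above, not how LVM is implemented; the class names Volume and Snapshot
are made up for illustration.

```python
class Snapshot:
    """Toy snapshot: stores only the blocks that changed on the origin."""

    def __init__(self, origin):
        self.origin = origin
        self.saved = {}  # block index -> block content at snapshot time

    def read(self, index):
        # Blocks never overwritten since the snapshot are still read
        # from the origin; overwritten ones come from the saved copy.
        if index in self.saved:
            return self.saved[index]
        return self.origin.blocks[index]


class Volume:
    """Toy origin LV: a list of blocks held in memory."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = []

    def snapshot(self):
        snap = Snapshot(self)
        self.snapshots.append(snap)
        return snap

    def write(self, index, data):
        # First copy the existing block to every snapshot that has not
        # saved it yet, then let the write continue on the origin.
        for snap in self.snapshots:
            snap.saved.setdefault(index, self.blocks[index])
        self.blocks[index] = data

    def read(self, index):
        return self.blocks[index]


lv = Volume(["a", "b", "c"])
snap = lv.snapshot()
lv.write(1, "B")  # the old block "b" is copied to the snapshot first
```

After the write, lv.read(1) returns the new data while snap.read(1)
still returns the block as it was when the snapshot was taken. Taking
the snapshot itself copies nothing, which is why users of the LV can
resume right away.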

> With this approach I could do a nice snapshot without stopping the 
> services on nodeA.

see [A]
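
A rough Python sketch of the ordering [A] asks for: tell the active
file system on the Primary to flush and suspend, take the snapshot on
the Secondary, then resume. Assumptions, none of which come from this
thread: fsfreeze from util-linux is available on the Primary, both
nodes are reachable by ssh, replication is connected and synchronous
(protocol C) so that everything written before the freeze is already
on the Secondary's disk, and all host, mount point and LV names are
placeholders.

```python
import subprocess


def snapshot_plan(primary, secondary, mountpoint, origin_lv,
                  snap_name, snap_size="1G"):
    """Return the (freeze, snapshot, thaw) commands as argv lists."""
    freeze = ["ssh", primary, "fsfreeze", "--freeze", mountpoint]
    snapshot = ["ssh", secondary, "lvcreate", "--snapshot",
                "--size", snap_size, "--name", snap_name, origin_lv]
    thaw = ["ssh", primary, "fsfreeze", "--unfreeze", mountpoint]
    return freeze, snapshot, thaw


def take_clean_snapshot(plan,
                        run=lambda argv: subprocess.run(argv, check=True)):
    """Freeze, snapshot, thaw; always thaw once the freeze succeeded."""
    freeze, snapshot, thaw = plan
    run(freeze)
    try:
        run(snapshot)
    finally:
        # Resume the file system even if taking the snapshot failed.
        run(thaw)


# Placeholder names, for illustration only.
plan = snapshot_plan("nodeA", "nodeB", "/srv/data",
                     "/dev/vg0/drbd_backing", "backup_snap")
```

Calling take_clean_snapshot(plan, run=print) gives a dry run that only
prints the three commands in order. The services on nodeA keep running;
their writes just block for the short time the file system is frozen.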

	Lars Ellenberg