[DRBD-user] DRBD9 disconnect, double primary -> connect again -> Sometimes "initial Split Brain", sometimes not
mailinglists at websitemanagers.com.au
Wed Jan 2 13:16:54 CET 2019
On 31/12/18 01:12, Jochen Kamuf wrote:
> I have two identical bare metal machines "server1" and "server2"
> running DRBD_KERNEL_VERSION=9.0.13 on Opensuse Leap15.
> I have one resource "res1"
> Double Primary is NOT allowed.
> If I run the following scenario:
> 0. "server1" stops the VM that uses the res1 for the virtual hard disk
> (to ensure a consistent vm state of the backup and have no read/writes
> on the drbd resource )
> 1. Set "res1" on both nodes "server1" and "server2" to secondary and
> 2. Set "res1" on both nodes to primary in standalone mode
> 3. "server1" starts the VM that uses the res1 for the virtual hard disk
> 4. "server2" run the command "ddrescue /dev/drbd/by-res/bb
> 5. When the backup is finished I set "res1" on node"server2" to
> secondary ( while on server1 the res1 is already on primary and gets
> reads/writes )
> 6. Connect both nodes again
My suggestion (which doesn't take any of your actual needs into account,
but takes what you have and adds, IMHO, a small improvement):
1. Set "res1" on both nodes "server1" and "server2" to secondary and
2. Set "res1" on "server1" ONLY to primary in standalone mode
3. "server1" starts the VM that uses the res1 for the virtual hard disk
4. "server2" runs the command "ddrescue /dev/underlying/device
5. When the backup is finished, set "res1" on node "server2" to connect
( while on "server1" the res1 is already primary and gets reads/writes )
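The steps above could be scripted roughly as follows. This is only a
sketch, not a tested procedure: the resource name res1 comes from the
thread, but BACKING_DEV and BACKUP_IMG are placeholders (the ddrescue
command in the post is cut off, so the real paths are unknown), and the
run helper and the function names are made up for illustration. With
DRY_RUN=1 (the default here) the commands are only printed. It also
assumes that "standalone" was reached with "drbdadm disconnect", in which
case server1 has to issue a connect as well.

```shell
#!/bin/sh
# Sketch only: paths below are placeholders, adjust before real use.
RES=res1
BACKING_DEV=/dev/underlying/device   # backing device below DRBD (assumed)
BACKUP_IMG=/backup/backup.img        # backup target (assumed)
DRY_RUN="${DRY_RUN:-1}"              # 1 = print commands, do not run them

# Hypothetical helper: print the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# On server1, after the VM is stopped (steps 1-2): go standalone, stay primary.
server1_isolate() {
    run drbdadm secondary "$RES"
    run drbdadm disconnect "$RES"
    run drbdadm primary "$RES"
}

# On server2 (steps 1, 4, 5): stay secondary, copy the backing device, reconnect.
server2_backup() {
    run drbdadm secondary "$RES"
    run drbdadm disconnect "$RES"
    run ddrescue "$BACKING_DEV" "$BACKUP_IMG" "$BACKUP_IMG.map"
    run drbdadm connect "$RES"
}

# On server1, once server2 is done: leave standalone so the peers can resync.
server1_reconnect() {
    run drbdadm connect "$RES"
}
```

Because server2 never becomes primary, only server1's data changes while
the nodes are apart, and the reconnect is a plain resync from server1.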
The reason this will work is that DRBD doesn't store any of its own data
at the beginning of the underlying block device (with internal metadata
it sits at the end), so you can simply use the backed-up data as a direct
disk image. Also, if you ever needed to, you could restore the image back
to the DRBD device, which would "run out of space" at the very end, but
all of the disk content would be restored successfully.
i.e., dd if=/backup/backup.img of=/dev/drbd/by-res/bb
You will get an "out of space" error when dd tries to write the part of
the image that holds the DRBD internal data, but that can be ignored,
since DRBD will manage/update its internal data anyway (or, in reality,
you have probably just re-created the DRBD data as well).
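To check that the "out of space" error only concerns the tail of the
image, you could compare the image size with the size of the DRBD device
before restoring. A minimal sketch, assuming internal metadata;
trailing_bytes is a made-up helper name, and the leftover it prints
should roughly correspond to the metadata area:

```shell
#!/bin/sh
# trailing_bytes IMAGE_BYTES TARGET_BYTES
# Prints how many bytes at the end of the image will not fit on the target
# (0 if the image fits completely).
trailing_bytes() {
    image_bytes=$1
    target_bytes=$2
    if [ "$image_bytes" -gt "$target_bytes" ]; then
        echo $((image_bytes - target_bytes))
    else
        echo 0
    fi
}

# Example call (as root, GNU stat; paths as written in the post):
#   trailing_bytes "$(stat -c %s /backup/backup.img)" \
#                  "$(blockdev --getsize64 /dev/drbd/by-res/bb)"
```

If the number printed is much larger than you would expect for metadata,
the image probably does not belong to this device and the restore should
not be trusted.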
This avoids any potential for split brain.
An additional consideration is to use LVM below DRBD, so that, while the
VM is still stopped (before step 3), you take a snapshot of the logical
volume underlying DRBD. Then you can simply reconnect DRBD in
primary/secondary (step 5), start the VM (step 3), and finally do the
backup from the snapshot. Obviously, you would remove the snapshot
afterwards.