[DRBD-user] Response to Mr. Ellenberg's answer to: "Warning: If using drbd; data-loss / corruption is possible; [...]"

Wed Aug 16 19:05:41 CEST 2017

Dear Mr. Ellenberg,

Thank you for your thorough explanation of the problems with my settings.

OK. At the moment I am using this configuration only experimentally, as a
replacement for transferring data with "rsync" (with "rsync" I had to wait
about 6 h until my data was synchronized, with DRBD about 20 min to 1 h; so
DRBD is much better anyway).

REPLACEMENT FOR "rsync":

What I do on the local DRBD device is:
* using "rsync", I replace some memory-mapped sectors of some virtual-disk
  files (stored on the DRBD device) with the current data from the original
  virtual-disk files (the writing ends after a finite time);
* then, on the peer DRBD device, I basically wait with
  "drbdsetup wait-sync-resource [...]" until it is synchronized with the
  local DRBD device;
* then I take a copy, using the dm thin-provisioning target, of the 'base'
  device ("/media/byUuid/EF1C0E32-3CB0-11DB-B6E3-0000C00A45A9.N1.V0.BASE")
  of the peer DRBD device (I do the waiting very thoroughly, but I do not
  want to bother you with the details);
* then I start this copy as a standalone DRBD device (with a different
  minor number), and so I have a valid copy of my virtual disks on the peer
  computer.
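The steps above can be sketched as a dry-run list of commands. This is a
minimal sketch under assumptions that are not part of the description above:
the paths and the resource name "r0" are hypothetical; rsync is assumed to run
with "--inplace" (update the existing file instead of writing a new copy) and
"--no-whole-file" (rsync otherwise skips its delta algorithm when both paths
are local), so that unchanged blocks are, as far as possible, not rewritten on
the DRBD device; the dm-thin snapshot and standalone-start steps are left out
because they depend on the local setup.

```python
import shlex
from typing import List


def plan_refresh(src_image: str, dst_image: str, resource: str) -> List[List[str]]:
    """Return the workflow's commands in order, as argv lists (dry run only)."""
    return [
        # local node: update the image file on the DRBD device in place;
        # --no-whole-file forces rsync's delta algorithm even for local paths
        ["rsync", "--inplace", "--no-whole-file", src_image, dst_image],
        # peer node: block until the resource has finished resynchronizing
        ["drbdsetup", "wait-sync-resource", resource],
        # the dm-thin snapshot of the 'base' device and starting it as a
        # standalone DRBD device are site-specific and intentionally omitted
    ]


if __name__ == "__main__":
    # hypothetical paths and resource name, for illustration only
    for argv in plan_refresh("/srv/images/vm1.img", "/mnt/drbd/vm1.img", "r0"):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Each argv list can be reviewed first and then run on the right node, e.g.
with subprocess.run(argv, check=True); actually executing the commands is
deliberately left out of the sketch.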

So I think it does what I expect.

Yes, I noticed these quick state changes, which you described with:
'
Still, "constantly" cycling between
Connected
While not idle for long enough
	Ahead/Behind,
	SyncSource/SyncTarget
is a bad idea.
'
But I thought software and CPUs do not wear out (your nerves, perhaps?
[sorry]), so why bother?

EXPERIMENTAL

I was aware that I might never reach a synchronized "UpToDate" state on my
DRBD devices. But anyway:

* As soon as I no longer expect data loss, I am going to put the
  virtual-disk file of a running virtual machine on the local DRBD device.
* We do not write much to this virtual-disk file (about 200 MB/day),
  so I expect that the data on the peer device would at least sometimes
  be consistent.
* I just want to observe what happens, by logging through appropriate
  scripts for the "after-resync-target" and "before-resync-target" handlers.
* I am aware anyway that the virtual-disk file of the running virtual
  machine is itself inconsistent while it is mounted as a read/write
  filesystem in the virtual machine.
* So I want to shut down the virtual machine, wait for synchronization on
  the peer device, and then take a copy as described above.
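For the logging mentioned in the list above (scripts for the
"after-resync-target" and "before-resync-target" handlers), a handler could be
as small as the following Python sketch. Assumptions: the log file location is
hypothetical; the event name is passed as the first argument by the handler
definition itself (e.g. before-resync-target "/usr/local/sbin/drbd-log-event
before-resync-target"; in the handlers section, with a hypothetical script
path); and DRBD is assumed to export the resource name and minor number to
handlers as the environment variables DRBD_RESOURCE and DRBD_MINOR, which
should be checked against the documentation of the installed version.

```python
#!/usr/bin/env python3
"""Append one line per DRBD handler invocation to a log file (sketch)."""
import os
import sys
from datetime import datetime, timezone

LOG_PATH = "/var/log/drbd-resync-events.log"  # hypothetical location


def format_event(event: str, env: dict) -> str:
    # DRBD_RESOURCE and DRBD_MINOR are assumed to be set by DRBD for handlers;
    # "?" is logged when a variable is missing
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    resource = env.get("DRBD_RESOURCE", "?")
    minor = env.get("DRBD_MINOR", "?")
    return f"{stamp} event={event} resource={resource} minor={minor}\n"


def main(argv: list) -> None:
    event = argv[1] if len(argv) > 1 else "unknown"
    try:
        with open(LOG_PATH, "a", encoding="utf-8") as log:
            log.write(format_event(event, dict(os.environ)))
    except OSError as err:
        # this handler only observes: report the problem but exit with status 0
        print(f"drbd-log-event: cannot write {LOG_PATH}: {err}", file=sys.stderr)


if __name__ == "__main__":
    main(sys.argv)
```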

==========================================================================

Because you advise against using the DRBD device as I intended, I have to
discuss with my boss whether we at all ...; so I take the liberty of sending
a blind carbon copy of this mail to my boss, and I attach your response to
my previous letter to this mail.

Thank you once again for your prompt answer,

Sincerely,

Thomas Bruecker

===========================================================================
On Wed Aug 16 12:10:48 CEST 2017, Lars Ellenberg
<lars.ellenberg at linbit.com> wrote: [answer taken from the drbd-user mailing
list; lines of the answer are marked with '>']

On Mon, Aug 14, 2017 at 10:09:06PM +0200, "Thomas Brücker" wrote:
>> Dear DRBD-Developers, dear DRBD-Users,
>>
>> Actually I would be very fond of DRBD, but unfortunately I have
>> sometimes had data losses (rarely, but I had them).
>>
>> FOR DEVELOPERS AND USERS:
>>
>> DRBD versions concerned: 9.0.7-1, 9.0.8-1, 9.0.9rc1-1 ("THE VERSIONS").
>>
>> I think the following configuration options are necessary for these
>> data losses to occur:
>> net {
>>     congestion-fill  "1";    # 1 sector
>>     on-congestion    "pull-ahead";
>>     protocol         "A";
>>     [... (other options)]
>> }
>> (The goal of these settings: a very slow network connection should not
>>  slow down local disk I/O.)

> While that is a commendable goal, even without bugs,
> this does not do what you apparently think it does.

> "pull ahead" is an option that is really only useful
> when using the DRBD proxy: there, the buffered ("in flight") data
> will be several hundred MB to several GB, congestion-fill would be
> ~80% (or more) of that buffer, and it would take seconds to minutes
> to drain the already queued buffer before changing to resync
> and then to normal replication.
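To put rough numbers on this: the time to drain the queue is simply the queued
data divided by the link throughput. The following idealized sketch ignores
compression in the proxy, protocol overhead and concurrent writes; the 1 GiB
buffer and the 10 MB/s link are hypothetical example values. It compares an
80% threshold on such a buffer with the 1-sector (512-byte) congestion-fill
from the configuration above.

```python
def drain_seconds(buffer_bytes: float, fill_fraction: float, link_bytes_per_s: float) -> float:
    """Idealized time to send everything queued when congestion-fill is reached."""
    queued_bytes = buffer_bytes * fill_fraction  # data already "in flight"
    return queued_bytes / link_bytes_per_s


LINK = 10e6  # hypothetical link throughput: 10 MB/s

proxy_case = drain_seconds(2**30, 0.8, LINK)  # ~80% of a 1 GiB proxy buffer
one_sector = drain_seconds(512, 1.0, LINK)    # congestion-fill "1" (1 sector)

print(f"proxy-sized threshold: {proxy_case:.1f} s")
print(f"1-sector threshold: {one_sector * 1e6:.1f} us")
```

This prints about 85.9 s for the proxy-sized threshold and about 51.2
microseconds for the 1-sector one. In other words, a 512-byte threshold is
smaller than a single typical 4 KiB write, so roughly every write can trip it.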

> Even then, the "pull ahead" is considered an emergency brake only,
> and certainly not something that is supposed to happen often.

> Your configuration basically tells DRBD to "pull ahead" for *each*
> write request, then "immediately" start a resync, while the next
> write-request already jumps to "ahead" again.

> That does not make sense, and DRBD should probably just refuse such
> configuration.  You are using it "out of spec", basically,
> and it is very plausible that you hit some bugs when doing so.

> That being said, even then DRBD should, once idle, eventually reach a
> point where all replicas are identical again.
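The cycling described above, and the eventual catch-up once idle, can be
reproduced with a toy model. This is not DRBD's state machine: it assumes one
application write per tick and a link that sends a fixed number of bytes per
tick, uses the peer state names Connected, Behind and SyncTarget loosely, and,
because each write is applied before the link sends anything, it exaggerates
the effect (with a threshold below the write size, the resync never progresses
while writes continue). The 512-byte threshold mirrors the 1-sector
congestion-fill; the write and link sizes are made-up values.

```python
def simulate(writes, congestion_fill, link):
    """Toy pull-ahead model; returns (peer state changes, ticks spent as SyncTarget)."""
    state, queue, dirty, resync_left = "Connected", 0, 0, 0
    changes = []            # peer state after each change
    inconsistent_ticks = 0  # ticks that ended with the peer Inconsistent

    for size in writes:  # one application write per tick, 0 = idle
        # 1. the write: replicate it, or (when congested) only mark it out of sync
        if state == "Behind":
            dirty += size
        elif queue + size > congestion_fill:
            state = "Behind"                # the local node pulls ahead
            changes.append(state)
            dirty += size + resync_left     # unfinished resync stays out of sync
            resync_left = 0
        else:
            queue += size

        # 2. the link sends `link` bytes: queued writes first, then resync data
        sent = min(queue, link)
        queue -= sent
        if state == "SyncTarget":
            resync_left -= min(resync_left, link - sent)

        # 3. transitions: the resync starts once the queue has drained
        if state == "Behind" and queue == 0:
            state, resync_left, dirty = "SyncTarget", dirty, 0
            changes.append(state)
        elif state == "SyncTarget" and resync_left == 0:
            state = "Connected"
            changes.append(state)

        if state == "SyncTarget":
            inconsistent_ticks += 1

    return changes, inconsistent_ticks


busy_then_idle = [4096] * 3 + [0] * 3  # three 4 KiB writes, then idle

print(simulate(busy_then_idle, congestion_fill=512, link=8192))
print(simulate(busy_then_idle, congestion_fill=1 << 20, link=8192))
```

With the 512-byte threshold, every write flips the peer to Behind and then to
SyncTarget, and it only returns to Connected after the writes stop; with a
large threshold, the same writes are simply replicated and the peer never
leaves Connected.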

> If you care for two-node scenarios only,
> DRBD 8.4 may or may not behave better with pull-ahead,
> but the comment above still applies, about "ahead" mode being intended,
> and being only really useful, in conjunction with DRBD proxy.

>> * Supposed Explanation:

> Thank you.

>> I am longing for a perfectly working DRBD,

> Don't we all.

> Still, it would not do what you apparently think it would.

> "pulling ahead" means that we don't send the data over anymore,
> but only the "LBA numbers" of changed blocks when they change first.
> And that, once the "congestion" is considered to be over,
> we start a resync.

> Which means the peer becomes sync target.

> If you pull ahead "very frequently",
> you keep your peer between "behind" and "sync target",
> it won't really have the chance to actually catch up.

> A sync target is (necessarily, by design) Inconsistent.
> Inconsistent means you have a mix of old and new blocks.
> Inconsistent data is unusable.
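Why a mix of old and new blocks is unusable can be shown in a few lines.
Assumptions for this sketch: a hypothetical on-disk format in which LBA 7
holds data and LBA 2 a commit record that must never claim a newer version
than LBA 7 holds; the application writes LBA 7 first and LBA 2 second, so
every prefix of that write order is consistent. A resync only knows which
blocks are out of sync, not the order in which they were written; here it is
modelled as copying them in ascending LBA order.

```python
def consistent(blocks: dict) -> bool:
    # hypothetical invariant: the commit record (LBA 2) never points
    # to a newer version than the data block (LBA 7) actually holds
    return blocks[7] >= blocks[2]


def resync_steps(old: dict, new: dict, out_of_sync: set) -> list:
    """Peer contents after each copied block; ascending LBA order is an assumption."""
    peer = dict(old)
    steps = []
    for lba in sorted(out_of_sync):  # the original write order is not known here
        peer[lba] = new[lba]
        steps.append(dict(peer))
    return steps


old = {2: 1, 7: 1}  # version 1 everywhere: consistent
new = {2: 2, 7: 2}  # application wrote LBA 7, then LBA 2: consistent

for step in resync_steps(old, new, {2, 7}):
    print(step, "consistent" if consistent(step) else "INCONSISTENT")
```

An in-order replica would pass through {2: 1, 7: 2}, which is consistent; the
resync target passes through {2: 2, 7: 1}, which is not, and that is the kind
of state the peer is left with if the primary is lost in the middle of a
resync.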

> If you "catastrophically" lose your main data copy,
> and you are left with only an inconsistent remote copy,
> because the peer constantly changed between "behind" and "sync target",
> you still need to find your latest consistent backup.

> DRBD has the "before-resync-target" handler to at least try to
> "snapshot" the latest consistent version of the data before becoming
> inconsistent to mitigate that.

> Still, "constantly" cycling between
> Connected
> While not idle for long enough
>	Ahead/Behind,
>	SyncSource/SyncTarget
> is a bad idea.

> If you want snapshot shipping,
> use a system designed for snapshot shipping.
> DRBD is not.

> --
> : Lars Ellenberg
> : LINBIT | Keeping the Digital World Running
> : DRBD -- Heartbeat -- Corosync -- Pacemaker
