[Drbd-dev] corrupted resource can't be fixed by rolling back to old snapshot
Michael Labriola
veggiemike at sourceruckus.org
Tue Jul 19 17:35:30 CEST 2022
Hey, all. I posted this to drbd-user a few weeks ago and didn't get
any feedback... Hoping I get a nibble on drbd-dev.
Here's a question that's bugging me. I've had this happen
multiple times now (over the course of 2-3 years, so infrequent).
I've got a system set up with DRBD resources using ZFS volumes as the
block devices (for volume management and snapshots among other
reasons). I've had some obvious hardware problems lead to what I
think is corrupted DRBD metadata a few times. Now, I had expected to
be able to simply roll back to an earlier snapshot of the underlying
ZVOL on the primary, a slightly older one on the secondary node, and
sync back up nicely. But what happens instead is that no matter how old
a snapshot I use, I continue to get these types of errors:
drbdadm dump-md nautilus_data
Found meta data is "unclean", please apply-al first
drbdadm apply-al nautilus_data
extent 4746752 beyond end of bitmap!
extent 4870144 beyond end of bitmap!
extent 5436416 beyond end of bitmap!
extent 5437440 beyond end of bitmap!
...
extent 6793216 beyond end of bitmap!
extent 6793218 beyond end of bitmap!
../shared/drbdmeta.c:2028:apply_al: ASSERT(bm_pos - bm_on_disk_pos <= chunk - this_extent_size) failed.
What I'm trying to understand is this: how can I be corrupting my DRBD
resource so badly that even after going back in time to an older
version of the block device used by the resource, it is STILL corrupt?
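To make "rolling back" concrete, here is a minimal sketch of the kind of
per-node sequence described above, not necessarily the exact commands
used. The snapshot name SNAPNAME is a hypothetical placeholder, and the
dataset name zdata/nautilus is assumed from the disk path in the config
further down. By default the script only prints the commands; it runs
them only if DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: prints the commands unless DRY_RUN=0 is set explicitly.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# rollback_resource RESOURCE ZVOL_DATASET SNAPSHOT
rollback_resource() {
    res="$1"; zvol="$2"; snap="$3"
    # Take the DRBD resource down so nothing holds the backing device.
    run drbdadm down "$res"
    # -r also destroys any snapshots newer than the one rolled back to.
    run zfs rollback -r "${zvol}@${snap}"
    # With "meta-disk internal", the DRBD metadata lives on the same
    # ZVOL, so it is rolled back together with the data.
    run drbdadm up "$res"
}

# SNAPNAME is a placeholder; zdata/nautilus is an assumed dataset name.
rollback_resource nautilus_data zdata/nautilus SNAPNAME
```

The same sequence would be repeated on the other node with its own
(slightly older) snapshot before letting the two reconnect.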
This is an Ubuntu 20.04 system with a 5.15 kernel and DRBD 9.1.5, but as
mentioned I've seen this problem a couple of times over the years with
5.10 and 5.4 kernels and whatever version of DRBD 9 was compiled for
those kernels at the time. I'm convinced I must be fundamentally
misunderstanding something about how DRBD works on this one.
My resource config follows:
# resource nautilus_data on skywalker: not ignored, not stacked
# defined at /etc/drbd.d/nautilus_data.res:1
resource nautilus_data {
    device /dev/drbd1 minor 1;
    meta-disk internal;
    on skywalker {
        node-id 0;
        disk /dev/zdata/nautilus;
        address ipv4 10.1.20.201:7810;
    }
    on vader {
        node-id 1;
        disk /dev/zdata/nautilus;
        address ipv4 10.1.20.202:7810;
    }
    connection {
        host skywalker address ipv4 192.168.1.2:7810;
        host vader address ipv4 192.168.1.3:7810;
        net {
            _name vader;
        }
    }
    net {
        protocol C;
        max-buffers 36k;
        max-epoch-size 20000;
        sndbuf-size 2M;
        rcvbuf-size 4M;
        allow-two-primaries yes;
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
    }
    disk {
        disk-barrier no;
        disk-flushes no;
        al-extents 3833;
        c-plan-ahead 1;
        c-fill-target 24M;
        c-max-rate 110M;
        c-min-rate 10M;
    }
}
--
Michael D Labriola
401-316-9844 (cell)