[DRBD-user] DRBD corruption with kmod-drbd90-9.1.8-1
Brent Jensen
jeneral9 at gmail.com
Tue Aug 16 21:05:32 CEST 2022
This issue has already been reported at elrepo:
https://elrepo.org/bugs/view.php?id=1250
Brent
On 8/16/2022 11:30 AM, Brent Jensen wrote:
> I just had my second DRBD cluster fail after updating to
> kmod-drbd90-9.1.8-1 and then upgrading the kernel. I'm not sure whether
> the kernel update itself broke things or whether the problem only
> surfaced because of the reboot. About two weeks ago an update
> (kmod-drbd90-9.1.8-1) from elrepo got applied; then, after a kernel
> update, the DRBD metadata was corrupt. Here's the gist of the error:
>
> This is on AlmaLinux 8:
>
> Aug 7 16:41:13 nfs6 kernel: drbd r0: Starting worker thread (from
> drbdsetup [3515])
> Aug 7 16:41:13 nfs6 kernel: drbd r0 nfs5: Starting sender thread
> (from drbdsetup [3519])
> Aug 7 16:41:13 nfs6 kernel: drbd r0/0 drbd0: meta-data IO uses: blk-bio
> Aug 7 16:41:13 nfs6 kernel: attempt to access beyond end of
> device#012sdb1: rw=6144, want=31250710528, limit=31250706432
> Aug 7 16:41:13 nfs6 kernel: drbd r0/0 drbd0:
> drbd_md_sync_page_io(,31250710520s,READ) failed with error -5
> Aug 7 16:41:13 nfs6 kernel: drbd r0/0 drbd0: Error while reading
> metadata.
>
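A quick cross-check at this point, sketched on the assumption that /dev/sdb1 is the internal-metadata backing device shown in the log above, is to compare the size the kernel reports for the partition (and its parent disk) with the sectors DRBD is asking for:

    # Sizes in 512-byte sectors, as reported by the kernel
    blockdev --getsz /dev/sdb1    # should match "limit" in the log above
    blockdev --getsz /dev/sdb     # parent disk, for comparison
    # With internal metadata, DRBD keeps its superblock in the last 4 KiB of
    # the backing device, so a read at or past the sdb1 value can never succeed.
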
> This is from a CentOS 7 cluster:
> Aug 16 11:04:57 v4 kernel: drbd r0 v3: Starting sender thread (from
> drbdsetup [9486])
> Aug 16 11:04:57 v4 kernel: drbd r0/0 drbd0: meta-data IO uses: blk-bio
> Aug 16 11:04:57 v4 kernel: attempt to access beyond end of device
> Aug 16 11:04:57 v4 kernel: sdb1: rw=1072, want=3905945600,
> limit=3905943552
> Aug 16 11:04:57 v4 kernel: drbd r0/0 drbd0:
> drbd_md_sync_page_io(,3905945592s,READ) failed with error -5
> Aug 16 11:04:57 v4 kernel: drbd r0/0 drbd0: Error while reading metadata.
> Aug 16 11:04:57 v4 drbd(drbd0)[9452]: ERROR: r0: Called drbdadm -c
> /etc/drbd.conf -v adjust r0
> Aug 16 11:04:57 v4 drbd(drbd0)[9452]: ERROR: r0: Exit code 1
> Aug 16 11:04:57 v4 drbd(drbd0)[9452]: ERROR: r0: Command output:
> drbdsetup new-peer r0 0 --_name=v3 --fencing=resource-only
> --protocol=C#012drbdsetup new-path r0 0 ipv4:10.1.4.82:7788
> ipv4:10.1.4.81:7788#012drbdmeta 0 v09 /dev/sdb1 internal
> apply-al#012drbdsetup attach 0 /dev/sdb1 /dev/sdb1 internal
> Aug 16 11:04:57 v4 drbd(drbd0)[9452]: ERROR: r0: Command stderr: 0:
> Failure: (118) IO error(s) occurred during initial access to
> meta-data.#012#012additional info from kernel:#012Error while reading
> metadata.#012#012Command 'drbdsetup attach 0 /dev/sdb1 /dev/sdb1
> internal' terminated with exit code 10
>
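Just reading the numbers back out of the two logs (no claim about the root cause): in both cases the failing I/O is the usual 4 KiB (8-sector) metadata superblock read, and it ends exactly at the "want" value, slightly past the "limit" the kernel reports for sdb1:

    # AlmaLinux 8 node: read starts at sector 31250710520 and is 8 sectors long
    expr 31250710520 + 8    # 31250710528 = "want", 4096 sectors past limit 31250706432
    # CentOS 7 node: read starts at sector 3905945592 and is 8 sectors long
    expr 3905945592 + 8     # 3905945600 = "want", 2048 sectors past limit 3905943552

So it looks as if DRBD is hunting for its metadata 1-2 MiB beyond where the kernel says the partition ends, rather than necessarily finding damaged data on disk; that's only an inference from these lines, though.
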
> Both clusters had been running flawlessly for ~2 years. I was in the
> process of building a new DRBD cluster to offload the first one when
> the 2nd production cluster got a kernel update and ran into the exact
> same issue. On the first cluster (rhel8/alma) I deleted the metadata
> and tried to resync the data over; however, it failed with the same
> issue. I'm in the process of building a new one to replace that broken
> DRBD cluster. In 15 years of using DRBD I have never run into any
> corruption issues. I'm at a loss; I thought the first one was a fluke,
> but now I know it's not!
>
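Until this is understood, one stopgap for nodes that haven't rebooted yet, assuming the previous kmod-drbd90 build is still available from elrepo or sitting in the local yum cache, might be to roll the module back and pin it:

    # Roll the DRBD kernel module back to the previously installed build
    yum downgrade kmod-drbd90
    # and, if the versionlock plugin is installed, keep it there for now
    yum versionlock add kmod-drbd90
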
> _______________________________________________
> Star us on GITHUB: https://github.com/LINBIT
> drbd-user mailing list
> drbd-user at lists.linbit.com
> https://lists.linbit.com/mailman/listinfo/drbd-user