[DRBD-user] DRBD corruption with kmod-drbd90-9.1.8-1
Josh Fisher
jfisher at jaybus.com
Wed Dec 21 16:54:21 CET 2022
There is already a bug report for this against LINBIT/drbd on GitHub: issue
#26, "Bug in drbd 9.1.5 on CentOS 7", from Feb. 2022. I added an update to
that issue noting that the problem persists in 9.1.12 and giving device info.
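
For anyone else adding information there, the loaded module version and the
backing device geometry can be collected with something like the following
(a rough sketch; substitute your own backing device for /dev/sdb1):

cat /proc/drbd                    # version of the loaded DRBD module
modinfo drbd | grep -i version    # version of the installed kmod
blockdev --getsz /dev/sdb1        # device size in 512-byte sectors
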
On 8/19/22 04:14, Christoph Böhmwalder wrote:
> On 16.08.22 20:30, Brent Jensen wrote:
>> I just had my second DRBD cluster fail after updating to
>> kmod-drbd90-9.1.8-1 and then upgrading the kernel. I'm not sure whether
>> the kernel update itself broke things or whether the new DRBD module
>> simply took effect at the reboot. About 2 weeks ago the
>> kmod-drbd90-9.1.8-1 update from elrepo got applied. Then, after a kernel
>> update, the DRBD meta data was corrupt. Here's the gist of the error:
>>
>> This is on AlmaLinux 8:
>>
>> Aug 7 16:41:13 nfs6 kernel: drbd r0: Starting worker thread (from
>> drbdsetup [3515])
>> Aug 7 16:41:13 nfs6 kernel: drbd r0 nfs5: Starting sender thread (from
>> drbdsetup [3519])
>> Aug 7 16:41:13 nfs6 kernel: drbd r0/0 drbd0: meta-data IO uses: blk-bio
>> Aug 7 16:41:13 nfs6 kernel: attempt to access beyond end of device
>> Aug 7 16:41:13 nfs6 kernel: sdb1: rw=6144, want=31250710528, limit=31250706432
>> Aug 7 16:41:13 nfs6 kernel: drbd r0/0 drbd0:
>> drbd_md_sync_page_io(,31250710520s,READ) failed with error -5
>> Aug 7 16:41:13 nfs6 kernel: drbd r0/0 drbd0: Error while reading metadata.
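>>
>> The arithmetic in those lines is the interesting part: the partition
>> ends at sector 31250706432, but DRBD tries to read 4 KiB starting at
>> sector 31250710520, i.e. about 2 MiB past the end of the device. If I
>> understand the internal meta-data layout correctly, the superblock
>> should live in the last 4 KiB-aligned block of the device, which you
>> can sanity-check with something like this (device path from my setup):
>>
>> SECTORS=$(blockdev --getsz /dev/sdb1)   # size in 512-byte sectors
>> echo "device ends at sector $SECTORS"
>> echo "expected superblock sector: $(( SECTORS / 8 * 8 - 8 ))"
>>
>> Here that yields 31250706424, so the module is reading exactly 4096
>> sectors (2 MiB) beyond where the superblock actually is.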
>>
>> This is from a CentOS 7 cluster:
>> Aug 16 11:04:57 v4 kernel: drbd r0 v3: Starting sender thread (from
>> drbdsetup [9486])
>> Aug 16 11:04:57 v4 kernel: drbd r0/0 drbd0: meta-data IO uses: blk-bio
>> Aug 16 11:04:57 v4 kernel: attempt to access beyond end of device
>> Aug 16 11:04:57 v4 kernel: sdb1: rw=1072, want=3905945600, limit=3905943552
>> Aug 16 11:04:57 v4 kernel: drbd r0/0 drbd0:
>> drbd_md_sync_page_io(,3905945592s,READ) failed with error -5
>> Aug 16 11:04:57 v4 kernel: drbd r0/0 drbd0: Error while reading metadata.
>> Aug 16 11:04:57 v4 drbd(drbd0)[9452]: ERROR: r0: Called drbdadm -c
>> /etc/drbd.conf -v adjust r0
>> Aug 16 11:04:57 v4 drbd(drbd0)[9452]: ERROR: r0: Exit code 1
>> Aug 16 11:04:57 v4 drbd(drbd0)[9452]: ERROR: r0: Command output:
>> drbdsetup new-peer r0 0 --_name=v3 --fencing=resource-only --protocol=C
>> drbdsetup new-path r0 0 ipv4:10.1.4.82:7788 ipv4:10.1.4.81:7788
>> drbdmeta 0 v09 /dev/sdb1 internal apply-al
>> drbdsetup attach 0 /dev/sdb1 /dev/sdb1 internal
>> Aug 16 11:04:57 v4 drbd(drbd0)[9452]: ERROR: r0: Command stderr:
>> 0: Failure: (118) IO error(s) occurred during initial access to meta-data.
>>
>> additional info from kernel:
>> Error while reading metadata.
>>
>> Command 'drbdsetup attach 0 /dev/sdb1 /dev/sdb1 internal' terminated
>> with exit code 10
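>>
>> The failing step can be reproduced by hand, without the resource agent;
>> a read-only dump of the meta-data should fail the same way on 9.1.8
>> (a sketch, using the resource and device names from the log above, with
>> the resource down):
>>
>> drbdmeta 0 v09 /dev/sdb1 internal dump-md    # low-level meta-data dump
>> drbdadm dump-md r0                           # same check via drbdadm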
>>
>> Both clusters have been running flawlessly for ~2 years. I was in the
>> process of building a new DRBD cluster to offload the first one when the
>> 2nd production cluster had a kernel update and ran into the exact same
>> issue. On the first cluster (rhel8/alma) I deleted the metadata and
>> tried to resync the data over; however, it failed with the same issue.
>> I'm now building a new cluster to replace that broken one. In the last
>> 15 years of using DRBD I have never run into any corruption issues. I'm
>> at a loss; I thought the first failure was a fluke; now I know it's not!
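>>
>> For reference, the recreate-and-resync sequence I mean is roughly the
>> standard one (run on the node being rebuilt; resource name from above):
>>
>> drbdadm down r0          # take the resource down on the bad node
>> drbdadm create-md r0     # wipe and re-create the meta-data
>> drbdadm up r0            # reattach and reconnect
>> drbdadm invalidate r0    # force a full resync from the peer
>>
>> Even freshly created meta-data fails the same way here, presumably
>> because the module miscomputes the meta-data offset regardless of what
>> is actually on disk.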
>>
> Hello,
>
> thank you for the report.
>
> We have implemented a fix for this[0] which will be released soon (i.e.
> very likely within the next week).
>
> If you easily can (and if this is a non-production system), it would be
> great if you could build DRBD from that commit and verify that the fix
> resolves the issue for you.
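>
> Roughly, the build looks like this (untested sketch; you need git, gcc,
> and the kernel-devel package matching your running kernel):
>
> git clone --recursive https://github.com/LINBIT/drbd.git
> cd drbd
> git checkout d7d76aad2b95dee098d6052567aa15d1342b1bc4
> make                 # builds the module against the running kernel
> sudo make install
> # then stop DRBD, unload the old module, and load the new one:
> # drbdadm down all; rmmod drbd_transport_tcp drbd; modprobe drbd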
>
> If not, the obvious workaround is to stay on 9.1.7 for now (or downgrade).
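>
> With the elrepo packages that would be something like (exact
> version-release strings may differ):
>
> yum downgrade kmod-drbd90-9.1.7
> echo "exclude=kmod-drbd90*" >> /etc/yum.conf   # hold the package until the fix is released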
>
> [0]
> https://github.com/LINBIT/drbd/commit/d7d76aad2b95dee098d6052567aa15d1342b1bc4
>