[DRBD-user] DRBD on top of mdraid troubles
Josh Fisher
jfisher at jaybus.com
Fri Mar 17 16:53:27 CET 2023
On 3/17/23 03:50, Roland Kammerer wrote:
> On Wed, Mar 15, 2023 at 03:16:20PM +0200, Athanasios Chatziathanassiou wrote:
>> drbd raid10_ssd/0 drbd1: Local IO failed in drbd_endio_write_sec_final.
>> Detaching...
> I'd say you have a hardware problem on the backing device. Whenever DRBD
> tries to write there, local IO fails and then it detaches. So test and
> verify that the backing device/storage actually works.
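Agreed that is the first thing to check. A plain read pass over the md device plus a look at the array state is a quick way to verify the backing storage (a sketch only; the device name and block size are examples, adjust to the actual array):

  # read the whole array once; medium or link errors will show up in dmesg
  dd if=/dev/md127 of=/dev/null bs=1M status=progress
  # confirm the array and all member disks are healthy
  mdadm --detail /dev/md127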
I have ruled out a hardware problem in my case. My raid10 backing device
works perfectly with the kernel module from 9.1.4. The kernel modules
from 9.1.5 through 9.1.13 all fail with:
Mar 1 08:43:39 cnode2 kernel: md/raid10:md127: make_request bug: can't convert block across chunks or bigger than 256k 448794880 132
Mar 1 08:43:39 cnode2 kernel: drbd drbd_access_home/0 drbd13: disk( UpToDate -> Failed )
Mar 1 08:43:39 cnode2 kernel: drbd drbd_access_home/0 drbd13: Local IO failed in drbd_request_endio. Detaching...
Mar 1 08:43:39 cnode2 kernel: drbd drbd_access_home/0 drbd13: local READ IO error sector 29362432+264 on ffff9fcff9a389c0
Mar 1 08:43:39 cnode2 kernel: drbd drbd_access_home/0 drbd13: sending new current UUID: 9C66E258C0F9F361
Mar 1 08:43:39 cnode2 kernel: drbd drbd_access_home/0 drbd13: disk( Failed -> Diskless )
This appears to be the same problem as in issue #26, or at least related.
Note that this could still be an mdraid bug; however, the same raid10
works perfectly well with the DRBD 9.1.4 kmod.
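The md message complains that raid10 was handed a request bigger than the 256 KiB chunk, or one that crosses a chunk boundary, which points at how the bios are sized and split on the way down rather than at the disks themselves. A few things worth comparing between a node running the 9.1.4 kmod and one running a failing version (a sketch; md127 comes from the log above, the paths are standard procfs/sysfs locations):

  # which DRBD kmod is actually loaded
  cat /proc/drbd                               # "version: 9.1.x ..."
  # raid10 chunk size in bytes, and as reported by mdadm
  cat /sys/block/md127/md/chunk_size
  mdadm --detail /dev/md127 | grep -i chunk
  # request-queue limits the stacked device is expected to honour
  cat /sys/block/md127/queue/max_sectors_kb
  cat /sys/block/md127/queue/max_hw_sectors_kb
  cat /sys/block/md127/queue/max_segments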
Also note that the DRBD device starts up OK and resync works as long as
both hosts are secondary. Promoting either host to primary seems to
trigger the error on the host with the raid10 backing device.
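For reference, roughly the sequence that shows it, written out as commands (a sketch; the resource name drbd_access_home is taken from the log above):

  # on both nodes: bring the resource up; with both sides Secondary,
  # resync runs through without problems
  drbdadm up drbd_access_home
  drbdadm status drbd_access_home
  # promoting either node makes the node with the raid10 backing device
  # detach; the make_request bug shows up in dmesg
  drbdadm primary drbd_access_home
  dmesg | tail -n 20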