Quoting Lars Ellenberg <lars.ellenberg at linbit.com>:

> On Tue, Oct 21, 2008 at 12:44:18PM +0200, David wrote:
>> (problem description)
>
> md has a bmbv (it does not allow bios crossing stripes).
>
> I expect device-mapper to recognize this,
> and reduce its max_segment_size to 4k when living on top of md.
>
> I'm not sure about what happens with the limits of an lv when
> it has been pv-moved.

Well, it appeared to change the limits mid-run. I'm not sure whether it
does this for the entire LV or only for the extents that have already
been moved, on a per-access basis. At least the first "bio too big"
message came soon after initiating the pvmove.

> drbd usually communicates its max_segment_size correctly between nodes
> on all occasions we have thought of (attaching a disk while connected,
> re-establishing a connection when having disks).
> but there may be corner cases which we overlooked.

Yep, that looked like it worked; detaching and reattaching the disk
transmitted the appropriate segment size (4096 while on the md stripe,
32k otherwise).

> please correct my understanding.
>
> a)
> xen vm
> drbd
> lvm
> iscsi (pv)
>
> no problem

Yep.

> b)
> xen vm
> drbd
> lvm
> md (pv)
> iscsi (md raid0 stripe set)
>
> broken.

Broken while pvmoving. Once wholly moved to the md0 it worked and set
the appropriate 4096 max_segment_size.

> c)
> pvmove from md stripe set to plain iscsi,
> basically a) again: broken.

Yep, back to a) and broken. With the following addition, which I also
tried: drbd detach, lvremove the device, lvcreate the device (on the
iSCSI PV), drbd create-md, drbd attach. That syncs fine and reports a
32k max_segment_size, but when I restart the xen vm it is still broken.

> stop everything, restart it: works again.

Yep: stop drbd completely, unload the drbd module, vgchange the whole
volume group off and on for good measure, start drbd and let it resync;
then it works again.

> is that an accurate description of your observations?
>
> interesting setup.

Mmm, drbd was pre-8.0 when I set up the iSCSI infrastructure. Today I
would probably skip the iSCSI backing storage and instead use DRBD and
export whatever iSCSI I needed from a xen VM. But it works and has its
advantages, and since I already have the stuff in place I can enjoy
those :).

> xen vbd?
> guest kernel even?
> drbd? -> try disconnecting, reconnecting, try detaching, reattaching
>          see what "max_segment_size" is reported in kernel log
> lvm? -> try lvchange -an ; lvchange -ay (or vgchange...)
>
> md? -> unlikely. in my experience a pretty solid driver.
>        if at times with suboptimal performance...

Well, since I recreated the logical volumes (and even did it once with
a different LV name, to be sure it really was a completely new LV), and
since drbd reports the correct underlying segment size on reattach, I
suspect the layer complaining is above that. I'm uncertain, though,
about the points at which the segment sizes get propagated upwards, and
at which point the actual constraints are enforced and the errors get
triggered. As the kernel error named the actual drbd device, I could
imagine it for some reason still thought the drbd device in question
had the 4096 constraint, although the layers above thought it
shouldn't; but frankly I'm far beyond wild speculation at that
point :).

I can try some more tests later this week(end) and see whether I can
retrigger it and set up a reliable way to replicate the error, in case
you can't reproduce it.

Best regards,
David
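
P.S. In case it helps with reproducing this, here is a rough sketch of
how one might watch the limits each layer advertises while the pvmove
runs. The device names, the vg0/lv_test volume, and the r0 resource
below are placeholders, and the max_segment_size attribute under
/sys/block/*/queue/ only exists on newer kernels (2008-era kernels
expose max_sectors_kb, but not necessarily max_segment_size):

    # Limits advertised by each layer of the stack; adjust the names
    # to the actual iSCSI disk, md array, LVM volume, and drbd minor.
    for dev in sdb md0 dm-3 drbd0; do
        echo "== $dev =="
        cat /sys/block/$dev/queue/max_sectors_kb 2>/dev/null
        cat /sys/block/$dev/queue/max_segment_size 2>/dev/null
    done

    # Watch for the kernel's "bio too big" complaints while extents
    # are being moved.
    dmesg | grep -i 'bio too big'

    # Re-check the limits after cycling the LV and the drbd disk, as
    # suggested above.
    lvchange -an vg0/lv_test && lvchange -ay vg0/lv_test
    drbdadm detach r0 && drbdadm attach r0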