[DRBD-user] md/lvm bio too big

Tue Oct 21 21:41:02 CEST 2008

On Tue, Oct 21, 2008 at 12:44:18PM +0200, David wrote:
>
> Hi,
>
> I was mucking around with doing a whole lot of performance testing on my 
> drbd setup with various configurations and ran into some trouble in the 
> form of these:
>
> Oct 20 13:25:19 eros kernel: bio too big device drbd36 (48 > 8)
> Oct 20 13:25:19 eros kernel: bio too big device drbd36 (56 > 8)
> Oct 20 13:25:19 eros kernel: bio too big device drbd36 (48 > 8)
> Oct 20 13:25:19 eros kernel: bio too big device drbd36 (24 > 8)
> Oct 20 13:25:19 eros kernel: bio too big device drbd36 (56 > 8)
>
> The basic setup is xen vms on top of drbd over lvm over iscsi (xen
> hosts: centos 5.2, drbd-8.0.13, 2.6.18-92.1.10.el5xen; iscsi servers:
> centos 5.2, iscsi enterprise target)
>
> I was testing inserting an (md) raid0 layer between lvm and the iscsi devices,
> and the first error came when doing a pvmove of a drbd backing device
> from an iscsi device to the md stripe. Fine, I thought: I detached the
> device and moved it while detached instead. Resynced, restarted the
> xen vm, and all was fine (and I noted max_segment_size got set to 4096).
>
> Then I tested pvmoving the (again detached) device back to the direct
> iscsi device and reattached. Now the bio too big errors would come on
> any access to the device (while max_segment_size got set to 32768).
> Detaching, removing and recreating the backing lv didn't solve the
> issue. Bringing the drbd device down and up (on both nodes) didn't solve
> it. Eventually I had to shut down drbd completely on the node in
> question and restart it from scratch, which made the device accessible
> from the xen vm again.
>
> I'm not sure exactly what was happening, but it appears as if some
> layer was sticking to the 4096 max_segment_size, while something above
> no longer thought that was the case? It's a bit hard to trace, as there
> seems to be no simple way to get the max_segment_size out of the various
> devices. Obviously, having read up on the whole merge_bvec stuff, I
> shouldn't exactly be surprised that a pvmove between two devices may
> result in, er, odd issues, but detach/reattach to devices with
> different configs should work without having to restart drbd, right?

drbd (8.0 and 8.2) sets its max_segment_size
to no more than 32 kB.

if the lower-level device ("disk") of drbd
has a merge_bvec_fn, drbd reduces its max_segment_size to 4k,
and usually also sets its max_segments to 1
(but that is another story).
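A note on the numbers in the quoted log lines: assuming, as in the 2.6.18 block layer, that "bio too big" prints both the bio's size and the queue's hardware limit in 512-byte sectors, the limit of 8 is exactly the 4k fallback above. A small Python sketch that decodes such lines (the regex is fitted to the lines quoted above and is an assumption, not a documented message format):

```python
import re
from typing import NamedTuple, Optional

SECTOR_BYTES = 512  # block-layer sectors are 512 bytes

# Assumed message shape, taken from the quoted log lines:
#   "bio too big device drbd36 (48 > 8)"
BIO_TOO_BIG = re.compile(r"bio too big device (\S+) \((\d+) > (\d+)\)")


class BioTooBig(NamedTuple):
    device: str
    bio_bytes: int    # size of the rejected bio
    limit_bytes: int  # limit the queue advertised at that moment


def decode(line: str) -> Optional[BioTooBig]:
    """Return device, bio size and queue limit in bytes, or None if no match."""
    m = BIO_TOO_BIG.search(line)
    if m is None:
        return None
    dev, bio_sectors, limit_sectors = m.group(1), int(m.group(2)), int(m.group(3))
    return BioTooBig(dev, bio_sectors * SECTOR_BYTES, limit_sectors * SECTOR_BYTES)


if __name__ == "__main__":
    log = [
        "Oct 20 13:25:19 eros kernel: bio too big device drbd36 (48 > 8)",
        "Oct 20 13:25:19 eros kernel: bio too big device drbd36 (56 > 8)",
        "Oct 20 13:25:19 eros kernel: bio too big device drbd36 (24 > 8)",
    ]
    for entry in log:
        d = decode(entry)
        if d is not None:
            print(f"{d.device}: {d.bio_bytes} byte bio, limit {d.limit_bytes} bytes")
```

For the quoted lines this yields bios of 12 to 28 KiB hitting a 4096-byte limit: drbd36 still advertised the 4k limit, while something above it built bios up to the 32 kB cap.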

unless you tell it to use the bio_merge_bvec_fn of the
lower-level device (use-bmbv), in which case drbd stays with its 32 kB,
but then allows the lower-level device's bmbv function to reduce that
when necessary. _DO NOT_ set use-bmbv unless the stack on both nodes
behaves exactly the same (e.g. both plain scsi devices without further
restrictions), as drbd is not (yet) able to cope with differing
limitation results of that ll-dev bmbv on the two nodes,
which is why we default to 4k single segment when detecting a ll-bmbv.
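The policy from the two paragraphs above, written out as a hypothetical Python function. choose_queue_limits and its parameters are illustrative names, not drbd code; the 4096-byte page size and also capping at the lower device's own limit are assumptions:

```python
PAGE_SIZE = 4096                   # assumed page size (x86)
DRBD_MAX_SEGMENT_SIZE = 32 * 1024  # the 32 kB cap described above


def choose_queue_limits(lower_max_segment_size: int,
                        lower_has_bmbv: bool,
                        use_bmbv: bool) -> tuple:
    """Hypothetical model: returns (max_segment_size, single_segment)."""
    # never more than 32 kB, and (assumption) never more than the disk allows
    size = min(DRBD_MAX_SEGMENT_SIZE, lower_max_segment_size)
    if lower_has_bmbv and not use_bmbv:
        # the lower device has a merge_bvec_fn that drbd will not consult:
        # play safe with one page and a single segment
        return min(size, PAGE_SIZE), True
    # no ll-bmbv, or use-bmbv set: keep the 32 kB; with use-bmbv the lower
    # device's bmbv may still shrink individual bios at submission time
    return size, False
```

The use-bmbv branch is the risky one: its per-bio result depends on each node's local stack, which is why it is only safe when both stacks behave identically.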

now.
unfortunately, device-mapper _ignores_ the lower-level device's
bmbv, and does not always stack the limits correctly: at least it does
not reduce max_segments to 1, which it should when ignoring the
ll-dev's bmbv. nor does it expose a bmbv itself.
(starting with linux 2.6.26 or 2.6.27, device-mapper honors the bmbv and
exposes one itself.)

drbd can only see the lv, which does not have a bmbv.
we can only stack the other limits.
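To make "stacking the limits" concrete, here is a hypothetical model (the Limits type and both functions are illustrative, not kernel API). It contrasts a stacking step that only takes the minimum of each limit, and so silently loses the lower device's bmbv, with what the text says a layer ignoring the lower bmbv should do: fall back to a single page-sized segment (4096 bytes assumed).

```python
from dataclasses import dataclass, replace

PAGE_SIZE = 4096  # assumed page size


@dataclass(frozen=True)
class Limits:
    max_segment_size: int
    max_segments: int
    has_bmbv: bool = False  # does this device expose a merge_bvec_fn?


def stack_min_only(top: Limits, bottom: Limits) -> Limits:
    """Take the minimum of each numeric limit; the bottom's bmbv is dropped."""
    return Limits(min(top.max_segment_size, bottom.max_segment_size),
                  min(top.max_segments, bottom.max_segments),
                  has_bmbv=top.has_bmbv)


def stack_ignoring_bmbv_safely(top: Limits, bottom: Limits) -> Limits:
    """If the bottom has a bmbv the top will not call, clamp to one page, one segment."""
    stacked = stack_min_only(top, bottom)
    if bottom.has_bmbv:
        stacked = replace(stacked,
                          max_segment_size=min(stacked.max_segment_size, PAGE_SIZE),
                          max_segments=1)
    return stacked


if __name__ == "__main__":
    md = Limits(65536, 128, has_bmbv=True)  # example values, not measured
    lv = Limits(1 << 20, 256)
    print(stack_min_only(lv, md))           # no bmbv visible, 64 KiB allowed
    print(stack_ignoring_bmbv_safely(lv, md))
```

With the min-only result, drbd on top of the lv sees neither a bmbv nor a small limit, so nothing tells it to drop to 4k.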

md has a bmbv (it does not allow bios crossing stripes).
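Roughly how such a stripe-boundary check works: a sketch modeled loosely on the idea behind md raid0's merge_bvec_fn, not a copy of it (the function name, the power-of-two chunk size and the 64 KiB example chunk are assumptions). It answers how many more bytes a bio may grow before it would cross into the next chunk:

```python
SECTOR_BYTES = 512


def raid0_room_bytes(start_sector: int, bio_sectors: int, chunk_sectors: int) -> int:
    """Bytes a bio starting at start_sector, already holding bio_sectors,
    may still grow without crossing a chunk boundary.
    Assumes chunk_sectors is a power of two."""
    offset_in_chunk = start_sector & (chunk_sectors - 1)
    remaining = chunk_sectors - (offset_in_chunk + bio_sectors)
    return max(remaining, 0) * SECTOR_BYTES


if __name__ == "__main__":
    chunk = 128  # 64 KiB chunk, example only
    print(raid0_room_bytes(0, 0, chunk))    # whole chunk available
    print(raid0_room_bytes(120, 0, chunk))  # only 8 sectors to the boundary
    print(raid0_room_bytes(120, 8, chunk))  # already at the boundary
```

A layer that does not ask this question can build a bio that spans two stripes, which md then rejects.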

I expect device-mapper to recognize this,
and reduce its max_segment_size to 4k when living on top of md.

I'm not sure about what happens with the limits of an lv when
it has been pv-moved.

drbd usually communicates its max_segment_size correctly between nodes
on all occasions we have thought of (attaching a disk while connected,
re-establishing a connection when having disks).
but there may be corner cases which we overlooked.
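A toy model of the events named above, assuming (this is not taken from drbd source) that while connected both nodes honor the smaller of the two advertised limits. ReplicatedDevice and its methods are hypothetical; the point is that every attach and every connect must trigger a recomputation, and a missed one leaves a stale limit:

```python
from typing import Optional


class ReplicatedDevice:
    """Hypothetical view of one node's effective segment limit."""

    def __init__(self, local_limit: int):
        self.local_limit = local_limit
        self.peer_limit: Optional[int] = None  # None while disconnected

    @property
    def effective(self) -> int:
        # assumption: while connected, the smaller of both sides applies
        if self.peer_limit is None:
            return self.local_limit
        return min(self.local_limit, self.peer_limit)

    def attach(self, new_local_limit: int) -> None:
        # (re)attaching a possibly different backing disk changes the local limit
        self.local_limit = new_local_limit

    def connect(self, peer_limit: int) -> None:
        # (re)establishing the connection exchanges the limits
        self.peer_limit = peer_limit

    def disconnect(self) -> None:
        self.peer_limit = None
```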

and, it may very well be possible that the xen vbd driver does not
handle the case when its backing devices change their limitations.
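To illustrate that last suspicion: a hypothetical upper layer that reads the limit only once, at open time, and keeps splitting requests against that cached value would produce exactly this kind of message after the device below shrinks its limit. Neither class models actual xen blkback code; both are illustrative.

```python
class BlockQueue:
    """Hypothetical lower device that enforces its current limit."""

    def __init__(self, max_bytes: int):
        self.max_bytes = max_bytes

    def submit(self, nbytes: int) -> str:
        if nbytes > self.max_bytes:
            # report in 512-byte sectors, like the kernel message
            return f"bio too big ({nbytes // 512} > {self.max_bytes // 512})"
        return "ok"


class CachingConsumer:
    """Hypothetical upper layer that snapshots the limit when it opens the device."""

    def __init__(self, queue: BlockQueue):
        self.queue = queue
        self.cached_max = queue.max_bytes  # never refreshed afterwards

    def write(self, nbytes: int) -> list:
        results = []
        while nbytes > 0:
            chunk = min(nbytes, self.cached_max)  # split on the stale value
            results.append(self.queue.submit(chunk))
            nbytes -= chunk
        return results


if __name__ == "__main__":
    q = BlockQueue(32768)
    vbd = CachingConsumer(q)
    q.max_bytes = 4096      # limit shrinks underneath the consumer
    print(vbd.write(24576))
```

Reopening the device (a new CachingConsumer) picks up the current limit, which would match the observation that only a full restart helped.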

> Any suggestions? Should I just expect it to behave this way? Should
> stacking on top of md be avoided? Should I try the --use-bmbv option?
> I'm a bit reluctant to try retriggering the situation too much (as
> restarting the whole drbd on the node in question stops a few things)
> before I get some input on whether this can potentially corrupt data,
> whether it is a bug, or whether it was just an anomaly of my config or
> a result of the pvmove that got something stuck.

please correct my understanding.

a)
  xen vm
  drbd
  lvm
  iscsi (pv)

no problem

b)
  xen vm
  drbd
  lvm
  md (pv)
  iscsi (md raid0 stripe set)

broken.

c)
  pvmove from md stripe set to plain iscsi,
  basically a) again: broken.

stop everything, restart it: works again.

is that an accurate description of your observations?

interesting setup.

if you could find out which part of the stack causes the effect,
that would be a great help.

xen vbd?
guest kernel even?
drbd? -> try disconnecting/reconnecting and detaching/reattaching,
         and see which "max_segment_size" is reported in the kernel log
lvm? -> try lvchange -an ; lvchange -ay (or vgchange...)
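When comparing the layers, the queue limits are also visible in sysfs. A small sketch that reads whichever of these attributes a device exposes: max_sectors_kb and max_hw_sectors_kb are expected on kernels of that era, while max_segment_size may only exist on newer ones, so missing files are skipped. The device names in the example are assumptions for this particular stack.

```python
import os
from typing import Dict

# not every kernel exposes all of these under /sys/block/<dev>/queue
QUEUE_ATTRS = ("max_sectors_kb", "max_hw_sectors_kb", "max_segment_size")


def read_queue_limits(dev: str, sysfs_root: str = "/sys/block") -> Dict[str, int]:
    """Return whichever queue limits <sysfs_root>/<dev>/queue exposes."""
    qdir = os.path.join(sysfs_root, dev, "queue")
    limits = {}
    for attr in QUEUE_ATTRS:
        try:
            with open(os.path.join(qdir, attr)) as f:
                limits[attr] = int(f.read().strip())
        except (OSError, ValueError):
            continue  # missing or unreadable on this kernel
    return limits


if __name__ == "__main__":
    # example names only: drbd device, lv (dm-N), md stripe, iscsi disk
    for dev in ("drbd36", "dm-0", "md0", "sdb"):
        print(dev, read_queue_limits(dev))
```

Running it before and after a detach/attach or an lvchange cycle shows which layer keeps the old value.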

md? -> unlikely. in my experience a pretty solid driver.
       if at times with suboptimal performance...

-- 
: Lars Ellenberg                
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed