[DRBD-user] md/lvm bio too big

David drbd at dystopic.org
Tue Oct 21 12:44:18 CEST 2008

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


Hi,

I was mucking around with a whole lot of performance testing on my DRBD
setup in various configurations and ran into some trouble in the form of
these messages:

Oct 20 13:25:19 eros kernel: bio too big device drbd36 (48 > 8)
Oct 20 13:25:19 eros kernel: bio too big device drbd36 (56 > 8)
Oct 20 13:25:19 eros kernel: bio too big device drbd36 (48 > 8)
Oct 20 13:25:19 eros kernel: bio too big device drbd36 (24 > 8)
Oct 20 13:25:19 eros kernel: bio too big device drbd36 (56 > 8)

The basic setup is Xen VMs on top of DRBD over LVM over iSCSI (Xen
hosts: CentOS 5.2, drbd-8.0.13, kernel 2.6.18-92.1.10.el5xen; iSCSI
servers: CentOS 5.2, iSCSI Enterprise Target).
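
For reference, each resource is configured roughly like this (the peer
hostname, addresses, resource and LV names are made up here, but the
layering is the real one: the DRBD backing disk is an LV whose physical
volume is an iSCSI-attached disk):

  resource r36 {
    protocol C;
    on eros {
      device    /dev/drbd36;
      disk      /dev/vg_iscsi/vm36;    # LV in a VG whose PV is an iSCSI disk
      address   192.168.0.1:7789;
      meta-disk internal;
    }
    on peer-host {
      device    /dev/drbd36;
      disk      /dev/vg_iscsi/vm36;
      address   192.168.0.2:7789;
      meta-disk internal;
    }
  }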

I was testing inserting an (md) RAID0 layer between LVM and the iSCSI
devices, and the first error came when doing a pvmove of a DRBD backing
device from an iSCSI device to the md stripe. Fine, I thought: I detached
the device and did the move while it was detached instead. It resynced,
I restarted the Xen VM and all was fine (and I noted that
max_segment_size got set to 4096).
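
Roughly, that sequence was along these lines (resource and device names
are placeholders):

  drbdadm detach r36          # take the backing device out from under DRBD
  pvmove /dev/sdX /dev/md0    # move the LV's extents from the iSCSI PV to the md stripe
  drbdadm attach r36          # reattach; the device then resyncs from the peer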

Then I tested pvmoving the (again detached) device back to the direct
iSCSI device and reattached. Now the bio too big errors would come on
any access to the device (while max_segment_size got set to 32768).
Detaching, removing and recreating the backing LV didn't solve the
issue. Bringing the DRBD device down and up (on both nodes) didn't
solve it either. Eventually I had to shut down DRBD completely on the
node in question and restart it from scratch, which made the device
accessible from the Xen VM again.
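
In other words, with placeholder names again, this did not help:

  drbdadm down r36 && drbdadm up r36    # tried on both nodes

but this eventually did:

  /etc/init.d/drbd stop     # on the affected node only
  /etc/init.d/drbd start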

I'm not sure exactly what was happening, but it appears as if some
layer was sticking to the 4096 max_segment_size while something above
it no longer thought that was the case. It's a bit hard to trace, as
there seems to be no simple way to get the max_segment_size out of the
various devices. Obviously, having read up on the whole merge_bvec
stuff, I shouldn't exactly be surprised that a pvmove between two
devices may result in, er, odd issues, but detaching and reattaching to
backing devices with different limits should work without having to
restart DRBD, right?
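
The closest I've found is poking at sysfs; newer kernels seem to expose
per-queue limits there, though I'm not sure this 2.6.18 kernel actually
does, which may be why I've had no luck:

  # per-device request queue limits, if the kernel exposes them
  cat /sys/block/drbd36/queue/max_segment_size
  cat /sys/block/drbd36/queue/max_hw_sectors_kb
  # and likewise for the underlying dm-*, md* and sd* devices in the stack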

Any suggestions? Should I just expect it to behave this way, should
stacking on top of md be avoided, or should I try the --use-bmbv
option? I'm a bit reluctant to retrigger the situation too many times
(as restarting the whole of DRBD on the node in question stops a few
things) before I can get some input on whether this can potentially
corrupt data, whether it is a bug, or whether it was just an anomaly of
my config or a result of the pvmove that got something stuck.
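
If use-bmbv is the thing to try, I assume it just goes in the disk
section of the resource, something like the below, though I haven't
tested it yet:

  resource r36 {
    disk {
      use-bmbv;
    }
    # ... rest of the resource as before
  }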

Best regards,
David




