[DRBD-user] "kernel: bio too big device drbd0"

Fri Jun 7 14:28:14 CEST 2013

On 06/06/2013 09:12 PM, Lars Ellenberg wrote:
> Short term workaround:
> cat /sys/block/drbd0/queue/max_hw_sectors_kb > /sys/block/dm-9/queue/max_sectors_kb
>
> that is: limit max_sectors_kb (which is a tunable)
> to the currently apparent limits of the lower stack.
>
> That should stop new "too big" bios from being assembled.

Yes, it does: no further such messages occurred after this
change.
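
For the archives, the workaround can be wrapped so that it only ever
lowers the limit and never raises it. This is just a minimal sketch: the
function name and the overridable SYSFS_BLOCK variable are my own
inventions for illustration, and the device names have to be adapted to
the local stack.

```shell
# Sketch: cap the upper device's max_sectors_kb at the lower device's
# max_hw_sectors_kb. SYSFS_BLOCK is a hypothetical override that allows
# dry runs against a scratch directory; it defaults to the real sysfs.
SYSFS_BLOCK="${SYSFS_BLOCK:-/sys/block}"

clamp_max_sectors() {
    lower="$1"   # device below, e.g. drbd0
    upper="$2"   # device stacked on top, e.g. dm-9
    limit=$(cat "$SYSFS_BLOCK/$lower/queue/max_hw_sectors_kb") || return 1
    current=$(cat "$SYSFS_BLOCK/$upper/queue/max_sectors_kb") || return 1
    # Only write when the upper limit exceeds what the lower device accepts.
    if [ "$current" -gt "$limit" ]; then
        echo "$limit" > "$SYSFS_BLOCK/$upper/queue/max_sectors_kb"
    fi
}
```

Called as "clamp_max_sectors drbd0 dm-9" (as root), this should have the
same effect as the one-liner quoted above.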

> Then check the limits below drbd (they may have changed when you were
> "messing around" during the resize procedure).

The drbd0 device on the server where the messages were emitted
sits on top of an LVM device with these limits:

/sys/block/dm-7/queue/max_hw_sectors_kb 32767
/sys/block/dm-7/queue/max_sectors_kb 512

And this LVM device currently has just one physical volume
below it with these limits:

/sys/block/sdg/queue/max_hw_sectors_kb 32767
/sys/block/sdg/queue/max_sectors_kb 512

BUT: On the server where the secondary DRBD copy resides
(and where no "too big" messages were emitted), the
limits are different:

drbd0:
/sys/block/drbd0/queue/max_hw_sectors_kb 128
/sys/block/drbd0/queue/max_sectors_kb 128

The LVM below drbd0:
/sys/block/dm-5/queue/max_hw_sectors_kb 128
/sys/block/dm-5/queue/max_sectors_kb 128

The physical device the LVM resides on:
/sys/block/sdb/queue/max_hw_sectors_kb 128
/sys/block/sdb/queue/max_sectors_kb 128
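
Collecting these values by hand on two hosts is tedious, so here is a
rough sketch that walks a stack from the top device downwards. It
assumes that each layer lists its backing devices under
/sys/block/<dev>/slaves/ (true for device-mapper; I have not verified it
for every DRBD version), and it silently skips entries without a queue
directory, such as partitions. The function name and SYSFS_BLOCK are
hypothetical.

```shell
# Sketch: print both limits for a device and, recursively, for every
# device listed in its slaves/ directory.
SYSFS_BLOCK="${SYSFS_BLOCK:-/sys/block}"

# The body is a subshell "( ... )" so that variables stay local to each
# level of the recursion.
print_limits() (
    dev="$1"
    q="$SYSFS_BLOCK/$dev/queue"
    [ -d "$q" ] || exit 0   # e.g. a partition: no queue directory here
    echo "$dev max_hw_sectors_kb=$(cat "$q/max_hw_sectors_kb") max_sectors_kb=$(cat "$q/max_sectors_kb")"
    for s in "$SYSFS_BLOCK/$dev"/slaves/*; do
        [ -e "$s" ] || continue   # glob did not match: no slaves
        print_limits "$(basename "$s")"
    done
)
```

Running "print_limits dm-9" on the primary and comparing against the
secondary should show at which layer the limits stop matching.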

The physical device on the secondary host was (shortly
before the resize of drbd0) moved from a controller
with max_hw_sectors_kb=32767 to a different controller
in the same machine with max_hw_sectors_kb=128.

My hypothesis is now the following:

The move of the physical device on the secondary server
caused the whole dm-stack on that server to be changed to
max_hw_sectors_kb=128, and that went all fine.

Then shortly after that, when the "drbdadm resize"
was issued, the drbd0 on the primary was also changed
to max_hw_sectors_kb=128, but the dm-crypt on top of it
was not notified about that, and continued to issue
larger bios.

Why the subsequent "cryptsetup resize" did not cause
the dm-crypt device to notice the lowered max_hw_sectors_kb
remains unknown to me.

Another thing I still wonder about is whether the
failed bios have caused dm-crypt to re-issue smaller
writes, or whether data has gone to /dev/null, with neither
the (XFS) filesystem nor any users taking note of that (which seems
somewhat unlikely, given that in total 8130 "bio too big"
error messages accumulated in the syslog).
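
A count like that can be obtained with a fixed-string grep. The sketch
below assumes the kernel message has the form "bio too big device
<name> (<size> > <limit>)"; the opening parenthesis in the pattern keeps
drbd0 from also matching drbd01 and the like. The function name is
hypothetical, and the log file location depends on the local syslog
configuration.

```shell
# Sketch: count "bio too big" messages for one device in a log file.
count_bio_too_big() {
    logfile="$1"   # e.g. the file the kernel messages are logged to
    dev="$2"       # e.g. drbd0
    # -F: fixed string, -c: print the number of matching lines
    grep -F -c "bio too big device $dev (" "$logfile"
}
```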

> but just do the whole drill:
> umount, close crypt, down drbd, then start things up again.
>
> Do the limits correctly stack then?

A reboot with a new kernel was scheduled for this evening
anyway, so after that I'll be able to tell. (Trying now
would mean a very inconvenient downtime for several users.)

Regards,

Lutz Vieweg