On 06/06/2013 09:12 PM, Lars Ellenberg wrote:
> Short term workaround:
> cat /sys/block/drbd0/queue/max_hw_sectors_kb > /sys/block/dm-9/queue/max_sectors_kb
>
> that is: limit max_sectors_kb (which is a tunable)
> to the currently apparent limits of the lower stack.
>
> That should stop new "too big" bios from being assembled.

Yes, it does: no further such messages occurred after this change.

> Then check the limits below drbd (they may have changed when you where
> "messing around" during the resize procedure).

The drbd0 device on the server where the messages were emitted sits
on top of an LVM device with these limits:

/sys/block/dm-7/queue/max_hw_sectors_kb 32767
/sys/block/dm-7/queue/max_sectors_kb    512

And this LVM device currently has just one physical volume below it,
with these limits:

/sys/block/sdg/queue/max_hw_sectors_kb 32767
/sys/block/sdg/queue/max_sectors_kb    512

BUT: On the server where the secondary DRBD copy resides (and where
no "too big" messages were emitted), the limits are different:

drbd0:
/sys/block/drbd0/queue/max_hw_sectors_kb 128
/sys/block/drbd0/queue/max_sectors_kb    128

The LVM below drbd0:
/sys/block/dm-5/queue/max_hw_sectors_kb 128
/sys/block/dm-5/queue/max_sectors_kb    128

The physical device the LVM resides on:
/sys/block/sdb/queue/max_hw_sectors_kb 128
/sys/block/sdb/queue/max_sectors_kb    128

The physical device on the secondary host was (shortly before the
resize of the drbd0) moved from a controller with
max_hw_sectors_kb=32767 to a different controller in the same machine
with max_hw_sectors_kb=128.

My hypothesis is now the following:

The move of the physical device on the secondary server caused the
whole dm-stack on that server to be changed to max_hw_sectors_kb=128,
and that went all fine. Then, shortly after that, when the
"drbdadm resize" was issued, the drbd0 on the primary was also changed
to max_hw_sectors_kb=128, but the dm-crypt on top of it was not
notified about that and continued to issue larger bios.
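The check implied by the workaround and the hypothesis above (each device's max_sectors_kb compared against the max_hw_sectors_kb of the device directly below it) could be sketched as a small Python script. This is only an illustration under assumptions: the function names read_limits and find_violations are made up for this sketch, and the device order has to be supplied by hand, since the script does not discover the stack itself.

```python
import os
import sys


def read_limits(dev, sysfs_root="/sys/block"):
    """Return (max_hw_sectors_kb, max_sectors_kb) for one block device."""
    queue = os.path.join(sysfs_root, dev, "queue")
    values = []
    for name in ("max_hw_sectors_kb", "max_sectors_kb"):
        with open(os.path.join(queue, name)) as f:
            values.append(int(f.read().strip()))
    return tuple(values)


def find_violations(stack):
    """Find layers that may assemble bios larger than the layer below accepts.

    stack is a list of (dev, max_hw_sectors_kb, max_sectors_kb) tuples,
    ordered from the top of the stack to the bottom.
    Returns a list of (upper_dev, lower_dev) pairs where the upper device's
    max_sectors_kb exceeds the lower device's max_hw_sectors_kb.
    """
    violations = []
    for (upper, _, upper_max), (lower, lower_hw, _) in zip(stack, stack[1:]):
        if upper_max > lower_hw:
            violations.append((upper, lower))
    return violations


if __name__ == "__main__":
    # Devices are given top to bottom, e.g.: dm-9 drbd0 dm-7 sdg
    devices = sys.argv[1:]
    stack = [(dev,) + read_limits(dev) for dev in devices]
    for dev, hw_kb, max_kb in stack:
        print(f"{dev}: max_hw_sectors_kb={hw_kb} max_sectors_kb={max_kb}")
    for upper, lower in find_violations(stack):
        print(f"WARNING: {upper} may issue bios too big for {lower}")
```

On the primary this would be invoked with the devices in stacking order, for example "dm-9 drbd0 dm-7 sdg". If the hypothesis holds, a dm-9 still carrying a max_sectors_kb above 128 would be flagged against a drbd0 lowered to 128; the actual dm-9 values were not reported, so that outcome is an assumption.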

Why the subsequent "cryptsetup resize" did not cause the dm-crypt
device to notice the lowered max_hw_sectors_kb remains unknown to me.

Another thing I still wonder about is whether the failed bios caused
dm-crypt to re-issue smaller writes, or whether data has gone to
/dev/null, with neither the (XFS) filesystem nor any users taking note
of that (which seems somewhat unlikely, given that in total 8130
"bio too big" error messages accumulated in the syslog).

> but just do the whole drill:
> umount, close crypt, down drbd, then start things up again.
>
> Do the limits correctly stack then?

A reboot with a new kernel was scheduled for this evening anyway, so
after that I'll be able to tell. (Trying now would mean a very
inconvenient downtime for several users.)

Regards,

Lutz Vieweg