Hi,

I recently tried to update a pair of servers that use DRBD 8.4.x (as included in the mainline Linux kernel from kernel.org) from linux-4.9 to linux-4.11. The "secondary" server had been running linux-4.11 for some time without any issues.

Both servers provide three DRBD devices: two contain (separate) btrfs filesystems, and one contains an XFS filesystem. Between each DRBD device and its filesystem there is one dm-crypt block device layer. This setup has been in use for years (with kernel updates from time to time).

After updating the primary server to linux-4.11, I experienced the following issue, which forced me to revert it to linux-4.9:

Soon after user processes cause significant read load plus a little write load on one of the btrfs filesystems, the "utilization %" displayed by "iostat -dx 3", which usually has a similar value for the DRBD device and the underlying physical disk, starts to differ between the two: the DRBD device is nailed to "100% utilization", while the physical device becomes idle.

"Dirty" pages and "Writeback" pages, as displayed by "cat /proc/meminfo", are no longer written to the physical device. There are no errors: no I/O errors, no strange messages, just more and more "dirty data" accumulating, which leaves more and more processes sleeping in "D" state.

If I kill all processes that do I/O to the btrfs filesystem, the same amount of "dirty data" sits there unflushed forever (while normal writing still occurs on other devices). At this point, no "sync", "umount" or similar will finish, and of course a soft reboot also hangs.

The symptom is not specific to any one of the filesystems. If some I/O load is applied, any of them (also several at the same time) can get into this "100% utilization, stuck" state.

The symptom occurs even if there is no secondary DRBD server to connect to, so it is probably unrelated to any network activity of DRBD.
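For anyone who wants to watch for the same symptom without eyeballing iostat, here is a minimal Python sketch that reads the "Dirty" and "Writeback" counters from /proc/meminfo and the "sectors written" counter from /proc/diskstats. The function names are made up for illustration, and the device names in the usage note (drbd0, sda) are only assumptions, not necessarily the ones on the affected servers.

```python
def parse_meminfo(text):
    """Return the Dirty and Writeback values (in kB) from /proc/meminfo-style text."""
    wanted = {"Dirty", "Writeback"}
    result = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name in wanted:
            # Lines look like "Dirty:   2048 kB"; the first token is the value.
            result[name] = int(rest.split()[0])
    return result


def sectors_written(diskstats_text, device):
    """Return the 'sectors written' counter for one device, or None if absent.

    In /proc/diskstats the columns are: major, minor, device name, then the
    I/O counters; 'sectors written' is the 10th column (index 9).
    """
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[2] == device:
            return int(fields[9])
    return None


def snapshot(devices):
    """Read both /proc files once and return (meminfo values, per-device counters)."""
    with open("/proc/meminfo") as f:
        mem = parse_meminfo(f.read())
    with open("/proc/diskstats") as f:
        stats = f.read()
    return mem, {dev: sectors_written(stats, dev) for dev in devices}
```

Calling snapshot(["drbd0", "sda"]) every few seconds and comparing successive results should show the pattern described above: "Dirty" stays high or keeps growing while the "sectors written" counter of the physical disk stops advancing.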
In the kernel .config, CONFIG_BLK_WBT=y is set, but I tested with WBT turned both on and off at runtime; the symptom occurs under both conditions.

Any ideas what might go wrong here?

Regards,

Lutz Vieweg