drbd-9.1.19 and drbd-9.2.8

Tue Mar 5 22:14:32 CET 2024

Hi DRBD-users,

As so often, we had to fix bugs in the area of not-so-common corner cases.
We started with a long hunt for a bug that, on rare occasions, caused a
diskless primary node to fail to re-issue pending read requests to
another node with a backing disk when it lost the connection to the node
it first sent the requests to. It turned out that a caching pointer into
a list data structure could be a few elements "too far." When re-issuing
the mentioned read requests, it missed exactly the requests between
where it should have pointed and where it actually pointed.
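
To illustrate the effect, here is a minimal sketch in Python. It is a toy
model, not the DRBD code: the list, the index, and the function name
reissue_from are made up for illustration. It only shows why a cached
position that is a few elements too far skips exactly the requests in
between.

```python
# Toy model (hypothetical, not DRBD code): pending read requests are kept
# in submission order, and a cached index marks where re-issuing starts.

def reissue_from(pending, cached_index):
    """Return the requests that get re-issued when scanning forward
    from cached_index to the end of the list."""
    return pending[cached_index:]


pending = ["r0", "r1", "r2", "r3", "r4", "r5"]
correct_index = 2  # where the cached pointer should be
stale_index = 4    # the pointer is two elements "too far"

reissued = reissue_from(pending, stale_index)
# The requests between the correct and the stale position are never re-issued
missed = pending[correct_index:stale_index]

print("re-issued:", reissued)
print("missed:", missed)
```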

At about the same time, we also became aware that a DRBD device that you
drive into an IO-depth of 8000 write requests consumes too much CPU in
processing the write acknowledge packets when they come back in.
BTW, how do you do that? You need to use a system with lots of memory
and write to a DRBD device without using O_DIRECT. Then, when the kernel
starts flushing out the dirty pages, it creates high IO depths. When
doing benchmarks, people usually use O_DIRECT and control the IO depth,
e.g., 1, tens, or hundreds.
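
If you want to try the buffered variant, something along these lines
might do. It is only a sketch: the function name buffered_write is made
up, and the demo writes to a temporary file. To reproduce the situation
described above, you would point it at a DRBD device (for example
/dev/drbd0, a hypothetical name) on a test system with lots of memory,
which destroys the data on that device.

```python
import os
import tempfile


def buffered_write(path, total_bytes, chunk_size=1 << 20):
    """Write total_bytes of zeros to path through the page cache.

    The file is opened WITHOUT O_DIRECT, so the kernel collects dirty
    pages and decides on its own when to flush them out.
    """
    chunk = bytes(chunk_size)
    written = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        while written < total_bytes:
            n = min(chunk_size, total_bytes - written)
            # os.write() may write fewer bytes than requested
            written += os.write(fd, chunk[:n])
    finally:
        os.close(fd)
    return written


if __name__ == "__main__":
    # Harmless demo on a temporary file instead of a real device
    with tempfile.NamedTemporaryFile() as tmp:
        print(buffered_write(tmp.name, 8 << 20), "bytes written")
```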

So, we killed two bugs with one stone and started using the mentioned
caching pointer for the ACK processing. With that, we reduce the CPU
consumption of the ACK processing and make it a lot easier to detect
when the caching pointer is off.
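
The idea can be sketched as follows. Again, this is a toy model and not
the DRBD implementation: the class TransferLog and its methods are made
up, and it assumes that ACKs arrive in submission order. It contrasts
searching from the head of the list with checking the cached position,
and shows how a wrong cached position becomes visible immediately.

```python
class TransferLog:
    """Toy model (hypothetical): in-flight write requests, oldest first."""

    def __init__(self):
        self.requests = []  # request ids in submission order
        self.next_ack = 0   # cached position of the next request to be ACKed

    def submit(self, req_id):
        self.requests.append(req_id)

    def ack_by_scan(self, req_id):
        # Search from the head of the list; the cost grows with the
        # number of in-flight requests
        return self.requests.index(req_id)

    def ack_by_cache(self, req_id):
        # Look only at the cached position; a mismatch means the
        # cached position is off
        if (self.next_ack >= len(self.requests)
                or self.requests[self.next_ack] != req_id):
            raise RuntimeError("cached pointer is off")
        pos = self.next_ack
        self.next_ack += 1
        return pos


log = TransferLog()
for req_id in range(8000):
    log.submit(req_id)
for req_id in range(8000):
    log.ack_by_cache(req_id)  # one comparison per ACK, no list walk
```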

We also learned that we needed more testing in the area of the
checksum-based resync. We have that now.

You can see in the changelog that we improved the online resize in case
not all nodes are online. Up to this point, it executed an online grow
even if it did not know whether the missing node's backing device had
also grown. That was too optimistic.
Now, it might be too permissive if some of the full-mesh connections
among the members of the online partition are missing. We will put more
work into this.
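
As a rough sketch of the more conservative decision: this is my
simplification for illustration, not the actual resize code. The function
allowed_device_size and both dictionaries are hypothetical. The point is
only that an unreachable node contributes its last known (cached) size
instead of being ignored.

```python
def allowed_device_size(live_sizes, cached_sizes):
    """Return the largest size every known backing device can hold.

    live_sizes:   node name -> backing device size of reachable nodes
    cached_sizes: node name -> last known size of unreachable nodes
    (both hypothetical; sizes in any consistent unit)
    """
    # Prefer live information where both are available
    known = {**cached_sizes, **live_sizes}
    if not known:
        raise ValueError("no size information available")
    return min(known.values())


# Two online nodes grew to 200, the missing node was last seen with 100
size = allowed_device_size({"alpha": 200, "bravo": 200}, {"charlie": 100})
print(size)
```

In this model, the device stays at the smaller size until the missing
node comes back and reports that its backing device grew as well.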

The RDMA code received a fix related to high IO depths. I suspect that
there is another bug in there that relates to cleaning up after
connection aborts.

I recommend upgrading.

9.2.8 (api:genl2/proto:86-122/transport:19)
--------
 * Fix the not-terminating-resync phenomenon between two nodes with
   backing disk in the presence of a diskless primary node under
   heavy I/O
 * Fix a rare race condition aborting connections claiming wrong
   protocol magic
 * Fix various problems of the checksum-based resync, including kernel
   crashes
 * Fix soft lockup messages in the RDMA transport under heavy I/O
 * changes merged from drbd-9.1.19
  - Fix a resync decision case where drbd wrongly decided to do a full
    resync, where a partial resync was sufficient; that happened in a
    specific connect order when all nodes were on the same data
    generation (UUID)
  - Fix the online resize code to obey cached size information about
    temporarily unreachable nodes
  - Fix a rare corner case in which DRBD on a diskless primary node
    failed to re-issue a read request to another node with a backing
    disk upon connection loss on the connection where it shipped the
    read request initially
  - Make timeout during promotion attempts interruptible
  - No longer write activity-log updates on the secondary node in a
    cluster with precisely two nodes with backing disk; this is a
    performance optimization
  - Reduce CPU usage of acknowledgment processing

https://pkg.linbit.com//downloads/drbd/9/drbd-9.2.8.tar.gz
https://github.com/LINBIT/drbd/commit/e163b05a76254c0f51f999970e861d72bb16409a

https://pkg.linbit.com//downloads/drbd/9/drbd-9.1.19.tar.gz
https://github.com/LINBIT/drbd/commit/1d69c7411a0b59507e467545a32f20c3e6e2574c

cheers,
 Philipp