[Drbd-dev] [PATCH] drbd: fix a bug of got_peer_ack

Joel Colledge joel.colledge at linbit.com
Thu Sep 29 11:39:46 CEST 2022


Hi Xu,

> Consider a scenario in which I/O is ongoing and the backing disk of
> the secondary DRBD node suddenly breaks. Some requests from the
> primary node will not be processed in receive_Data() since there is
> no ldev. The primary node will then send a peer_ack to the secondary
> node for those requests, but the secondary node will not find them in
> got_peer_ack().
>
> The first problem caused by this bug is that the two nodes will be
> disconnected, and the second problem is that some peer requests
> can't be destroyed.

I can confirm this issue. Thanks for reporting it.

> Fix it by finding the last peer request on the peer_requests list;
> the remaining requests on the list will then be destroyed.

I believe this is a valid solution. It does not handle the case where
another peer ack is sent afterwards, so that got_peer_ack() is called
with connection->peer_requests empty. But don't worry about that for
now.
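A simplified, self-contained model of one reading of the proposed fix, which also covers the empty-list case. Everything here is hypothetical: struct peer_req, struct conn, queue_peer_req() and model_got_peer_ack() are illustrative names, not DRBD's actual structures or functions, and there is no locking. It assumes the receive-order list is sorted by ascending dagtag, so "free everything up to the last request with dagtag <= the acked one" can be done by popping from the head.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a peer request: only the dagtag matters here. */
struct peer_req {
    uint64_t dagtag;          /* position in the primary's write stream */
    struct peer_req *next;    /* receive-order list, ascending dagtag */
};

/* Hypothetical stand-in for the connection and its peer_requests list. */
struct conn {
    struct peer_req *head;
};

/* Append a request in receive order (the caller keeps dagtags ascending). */
static void queue_peer_req(struct conn *c, uint64_t dagtag)
{
    struct peer_req *r = malloc(sizeof(*r));
    if (r == NULL)
        exit(EXIT_FAILURE);
    r->dagtag = dagtag;
    r->next = NULL;

    struct peer_req **pp = &c->head;
    while (*pp != NULL)
        pp = &(*pp)->next;
    *pp = r;
}

/*
 * Handle a peer ack for 'acked' without requiring an exact match:
 * free every queued request whose dagtag is <= acked, i.e. everything
 * up to the last such request. Requests that were never queued (for
 * example because there was no ldev) are simply skipped over, and an
 * empty list frees nothing instead of being treated as an error.
 * Returns the number of requests freed.
 */
static int model_got_peer_ack(struct conn *c, uint64_t acked)
{
    int freed = 0;

    assert(c != NULL);
    while (c->head != NULL && c->head->dagtag <= acked) {
        struct peer_req *r = c->head;
        c->head = r->next;
        free(r);
        freed++;
    }
    return freed;
}
```

With requests 10 and 40 queued and an ack for dagtag 30 (a request that never reached the list), the model frees request 10 and keeps 40; a later ack arriving when the list is empty returns 0 rather than failing.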

The question is: do we need to send peer acks to peers that responded
with P_NEG_ACK at all? At the point when the write fails on the
secondary, we could set the bitmap bits and free the request. Then we
don't need the peer ack from the primary. This may lead to a simpler
and more robust solution. I'll try it.
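A rough sketch of the alternative on the secondary side: on a failed local write, mark the affected range out of sync and free the request straight away. All names (struct peer_req, set_out_of_sync(), bm_test_bit(), on_local_write_failed()) and constants are hypothetical; it assumes 512-byte sectors and one bitmap bit per 4 KiB block, and it does not model the primary skipping peer acks for peers that sent P_NEG_ACK.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SECTOR_SIZE      512u   /* assumption: 512-byte sectors */
#define BM_BLOCK_SECTORS 8u     /* assumption: one bit per 4 KiB block */
#define BM_BITS          1024u  /* hypothetical device size, in bitmap blocks */

/* Hypothetical stand-in for a peer request: just the written range. */
struct peer_req {
    uint64_t sector;  /* first sector written */
    uint32_t size;    /* length in bytes */
};

/* Hypothetical out-of-sync bitmap, one bit per bitmap block. */
static uint8_t bitmap[BM_BITS / 8];

/* Mark every bitmap block touched by [sector, sector + size) out of sync. */
static void set_out_of_sync(uint64_t sector, uint32_t size)
{
    if (size == 0)
        return;
    uint64_t sectors = ((uint64_t)size + SECTOR_SIZE - 1) / SECTOR_SIZE;
    uint64_t first = sector / BM_BLOCK_SECTORS;
    uint64_t last = (sector + sectors - 1) / BM_BLOCK_SECTORS;
    for (uint64_t b = first; b <= last && b < BM_BITS; b++)
        bitmap[b / 8] |= (uint8_t)(1u << (b % 8));
}

/* Return 1 if bitmap block b is marked out of sync. */
static int bm_test_bit(uint64_t b)
{
    return (bitmap[b / 8] >> (b % 8)) & 1;
}

/*
 * The alternative: when the local write fails, record the range in
 * the bitmap immediately and free the request, so nothing is left
 * waiting for a peer ack from the primary.
 */
static void on_local_write_failed(struct peer_req *req)
{
    assert(req != NULL);
    set_out_of_sync(req->sector, req->size);
    free(req);
}
```

For example, a failed 8 KiB write starting at sector 16 covers sectors 16 to 31 and marks bitmap blocks 2 and 3; since the request is already freed, a later peer ack has nothing to release on this node.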

Best regards,
Joel

