[Drbd-dev] [PATCH] Fix stale receiver thread state when drbd_connect() returns zero

Nikita V. Youshchenko nyoushchenko at ru.mvista.com
Thu Nov 6 10:44:14 CET 2008


Although I know that drbd 0.7 is old and no longer supported, I still feel
that this information may be useful to others.

I spent a while trying to find out why failover on drbd 0.7.24 sometimes
fails on our customer's system.
Finally, I was able to track down what was happening.

I did not check, but the same issue may be present in later versions of drbd 
as well.

The patch below is against the drbd 0.7.24 release.

From cf7541852c39bb507b4a3ccc42a9d8fa365cbbd3 Mon Sep 17 00:00:00 2001
From: Nikita V. Youshchenko <nyoushchenko at ru.mvista.com>
Date: Thu, 6 Nov 2008 00:42:47 +0300
Subject: [PATCH] Fix stale receiver thread state when drbd_connect() returns zero.

The following call stack may happen:

drbdd_init() ->
  drbd_connect() ->
    drbd_do_handshake() ->
      drbd_recv_header() ->

At this level the socket receive is called, and it may return 0 (in case the
peer resets the connection). drbd_thread_restart_nowait(&mdev->receiver) is
then called, which sets the receiver state to Restarting.

Control later returns to drbdd_init(), which takes the error path and starts a
new iteration of the connection loop, leaving the receiver state set to
Restarting.

Much later, this causes breakage; for example, the drbd device does not handle
failover correctly.

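This patch fixes the stale receiver state on the described error path: when
drbd_connect() returns 0 and the receiver is still marked Restarting, its state
is set back to Running under t_lock.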

Signed-off-by: Nikita V. Youshchenko <nyoushchenko at ru.mvista.com>
 drbd/drbd_receiver.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drbd/drbd_receiver.c b/drbd/drbd_receiver.c
index e769739..3a66b28 100644
--- a/drbd/drbd_receiver.c
+++ b/drbd/drbd_receiver.c
@@ -2105,6 +2105,10 @@ int drbdd_init(struct Drbd_thread *thi)
 			if (h == 0) {
+				spin_lock(&thi->t_lock);
+				if (thi->t_state == Restarting)
+					thi->t_state = Running;
+				spin_unlock(&thi->t_lock);