[DRBD-user] DRBD (0.6) slows down the application. After some research

Hans Holm hans.holm at siatm.com
Mon Sep 12 12:47:31 CEST 2005


Hello again,

After spending some days debugging my drbd-0.6.12 problem, I found this:

On the secondary side, in the asender thread, the signal DRBD_SIG is blocked
when execution reaches the sock_recvmsg() call. This happens while the
secondary is in the SynchingQuick state and afterwards in the following
Connected state.

Who is blocking it and why?

As a result, the primary always times out waiting for the WriteAck and then
sends a Ping, which avoids a complete deadlock.
The WriteAck is only sent after the Ping has been received, two seconds too
late, because the secondary asender hangs in sock_recvmsg() with DRBD_SIG
blocked.

I made the following patch, which seems to work fine.
But I am really a newbie kernel programmer, so I am afraid that I may be
introducing a new bug or doing something else stupid.

Could someone please take a quick look at the patch, without taking any
responsibility for it?

The patch code is the same code that you use around the sock_sendmsg() call
in drbd_main.c.

// PATCH PRINTK HH in the drbd_recv() function in drbd_receiver.c

 +       if(via_msock){
 +         printk(KERN_ERR DEVICE_NAME "ASENDER: FORE. \n");
 +         if (sigismember(&current->blocked, DRBD_SIG))
 +           printk(KERN_ERR DEVICE_NAME "ASENDER:  SIGNAL DRBD_SIG IS BLOCKED\n");
 +         spin_lock_irqsave(&current->SIGMASK_LOCK, flags);
 +         oldset = current->blocked;
 +         sigfillset(&current->blocked);
 +         sigdelset(&current->blocked,DRBD_SIG);
 +         RECALC_SIGPENDING(current);
 +         spin_unlock_irqrestore(&current->SIGMASK_LOCK, flags);
 +       }

         rv = sock_recvmsg(sock, &msg, size, msg.msg_flags);

        // PATCH PRINTK HH
  +      if(via_msock){
  +        printk(KERN_ERR DEVICE_NAME "ASENDER: EFTER. \n");
  +        spin_lock_irqsave(&current->SIGMASK_LOCK, flags);
  +        current->blocked = oldset;
  +        RECALC_SIGPENDING(current);
  +        spin_unlock_irqrestore(&current->SIGMASK_LOCK, flags);
  +      }
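
For readers who want to try the idea outside the kernel, here is a minimal
user-space sketch of the same save / unblock / restore pattern around a
blocking receive. It is only an analogy and not DRBD code: WAKE_SIG (SIGUSR1)
is a hypothetical stand-in for DRBD_SIG, read() stands in for sock_recvmsg(),
and sigprocmask() stands in for manipulating current->blocked under the
sigmask lock. The names install_wake_handler() and read_interruptible() are
made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-in for DRBD_SIG, which only exists inside the kernel. */
#define WAKE_SIG SIGUSR1

/* Set by the handler so callers can see that the wake-up signal arrived. */
volatile sig_atomic_t wake_seen = 0;

static void on_wake(int signo)
{
    (void)signo;
    wake_seen = 1;
}

/* Install the handler without SA_RESTART, so that a blocking read() is
 * interrupted (EINTR) instead of being restarted transparently. */
int install_wake_handler(void)
{
    struct sigaction sa;

    sa.sa_handler = on_wake;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(WAKE_SIG, &sa, NULL);
}

/* Block every signal except WAKE_SIG for the duration of the read,
 * then restore whatever mask the caller had before. */
ssize_t read_interruptible(int fd, void *buf, size_t len)
{
    sigset_t only_wake_open, oldset;
    ssize_t rv;
    int saved_errno;

    assert(buf != NULL);

    sigfillset(&only_wake_open);
    sigdelset(&only_wake_open, WAKE_SIG);
    if (sigprocmask(SIG_SETMASK, &only_wake_open, &oldset) != 0)
        return -1;

    rv = read(fd, buf, len);      /* may fail with EINTR on WAKE_SIG */
    saved_errno = errno;

    sigprocmask(SIG_SETMASK, &oldset, NULL);
    errno = saved_errno;          /* keep read()'s errno for the caller */
    return rv;
}
```

If the caller had WAKE_SIG blocked on entry, a pending WAKE_SIG is delivered
as soon as the mask is swapped, and the caller's original mask is back in
place when the function returns. In a multi-threaded program,
pthread_sigmask() would be the call to use instead of sigprocmask().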

/Hans

-----Original Message-----
From: drbd-user-bounces at lists.linbit.com
[mailto:drbd-user-bounces at lists.linbit.com]On Behalf Of Lars Ellenberg
Sent: 22 August 2005 09:35
To: drbd-user at lists.linbit.com
Subject: Re: [DRBD-user] DRBD (0.6) slows down the application



Well,
I just thought it may have something to do with write hints being
disabled, because we have a race where we could forget to clear the flag
after a reconnect:


Index: drbd/drbd_receiver.c
===================================================================
--- drbd/drbd_receiver.c	(Revision 1914)
+++ drbd/drbd_receiver.c	(Arbeitskopie)
@@ -1379,6 +1379,9 @@

 	clear_bit(DO_NOT_INC_CONCNT,&drbd_conf[minor].flags);

+	/* it may still be set, because some unplug was on the fly */
+	if (!disable_io_hints) mdev->flags &= ~(1<<WRITE_HINT_QUEUED);
+
 	printk(KERN_INFO DEVICE_NAME "%d: Connection lost.\n",minor);
 }

Index: drbd/drbd_req-2.4.c
===================================================================
--- drbd/drbd_req-2.4.c	(Revision 1914)
+++ drbd/drbd_req-2.4.c	(Arbeitskopie)
@@ -253,7 +253,9 @@
 	}

 	if(!test_and_set_bit(WRITE_HINT_QUEUED,&mdev->flags)) {
-		queue_task(&mdev->write_hint_tq, &tq_disk);
+		/* if it could not be queued, clear our flag again, too */
+		if (!queue_task(&mdev->write_hint_tq, &tq_disk))
+			clear_bit(WRITE_HINT_QUEUED,&mdev->flags);
 	}

 	submit_bh(rw,nbh);
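
The second hunk follows a general pattern: set a "queued" flag atomically,
try to enqueue, and clear the flag again if the enqueue did not happen, so
that a later caller can retry. A minimal user-space sketch of that pattern
with C11 atomics might look like the following. This is an illustration
only, not DRBD code: struct hint_state, queue_hint_once() and the two
enqueue stubs are hypothetical names, and atomic_flag plays the role of the
WRITE_HINT_QUEUED bit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state holder; the flag stands in for WRITE_HINT_QUEUED. */
struct hint_state {
    atomic_flag queued;
};

/* Hypothetical enqueue callback: returns true if the task was queued. */
typedef bool (*enqueue_fn)(void *task);

/* Stub that behaves like a queue accepting the task. */
bool accept_task(void *task)
{
    (void)task;
    return true;
}

/* Stub that behaves like a queue refusing the task. */
bool reject_task(void *task)
{
    (void)task;
    return false;
}

/* Queue the task at most once. If the enqueue fails, the flag is cleared
 * again so that it cannot stay set with nothing actually queued. */
bool queue_hint_once(struct hint_state *st, enqueue_fn enqueue, void *task)
{
    assert(st != NULL && enqueue != NULL);

    if (atomic_flag_test_and_set(&st->queued))
        return false;               /* already marked as queued */

    if (!enqueue(task)) {
        atomic_flag_clear(&st->queued);
        return false;               /* not queued, flag rolled back */
    }
    return true;
}
```

Without the rollback, one failed enqueue would leave the flag set forever and
every later call would return early, which matches the symptom the patch is
meant to address.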


--
: Lars Ellenberg                                  Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH            Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe   http://www.linbit.com :