Hello again,

After spending some days debugging my drbd-0.6.12 problem, I found this! On the secondary side, in the asender thread, the signal DRBD_SIG is blocked when the thread reaches the sock_recvmsg() call while the secondary is in the SyncingQuick state, and afterwards in the following Connected state. Who is blocking it, and why?

As a result, the primary always times out waiting for the WriteAck and then sends a Ping, which is what prevents a complete deadlock. The WriteAck is only sent after the Ping has been received, two seconds too late, because the secondary's asender hangs in sock_recvmsg() with DRBD_SIG blocked.

I made the following patch, which seems to work fine. But I am really a newbie kernel programmer, so I am afraid I may be introducing a new bug or doing something else stupid. Could someone please take a quick look at the patch, without taking any responsibility for it? The patch uses the same code that you use around the sock_sendmsg() call in drbd_main.c.

// PATCH PRINTK HH in drbd_recv() function in drbd_receiver.c module
+	if(via_msock){
+		printk(KERN_ERR DEVICE_NAME "ASENDER: FORE. \n");
+		if (sigismember(&current->blocked, DRBD_SIG))
+			printk(KERN_ERR DEVICE_NAME "ASENDER: SIGNAL DRBD_SIG IS BLOCKED\n");
+		spin_lock_irqsave(&current->SIGMASK_LOCK, flags);
+		oldset = current->blocked;
+		sigfillset(&current->blocked);
+		sigdelset(&current->blocked,DRBD_SIG);
+		RECALC_SIGPENDING(current);
+		spin_unlock_irqrestore(&current->SIGMASK_LOCK, flags);
+	}
 	rv = sock_recvmsg(sock, &msg, size, msg.msg_flags);
// PATCH PRINTK HH
+	if(via_msock){
+		printk(KERN_ERR DEVICE_NAME "ASENDER: EFTER. \n");
+		spin_lock_irqsave(&current->SIGMASK_LOCK, flags);
+		current->blocked = oldset;
+		RECALC_SIGPENDING(current);
+		spin_unlock_irqrestore(&current->SIGMASK_LOCK, flags);
+	}

/Hans

-----Original Message-----
From: drbd-user-bounces at lists.linbit.com [mailto:drbd-user-bounces at lists.linbit.com] On Behalf Of Lars Ellenberg
Sent: 22 August 2005 09:35
To: drbd-user at lists.linbit.com
Subject: Re: [DRBD-user] DRBD (0.6) slows down the application

Well, I just thought it may have something to do with write hints being disabled, because we have a race where we could forget to clear the flag after a reconnect:

Index: drbd/drbd_receiver.c
===================================================================
--- drbd/drbd_receiver.c	(Revision 1914)
+++ drbd/drbd_receiver.c	(working copy)
@@ -1379,6 +1379,9 @@
 	clear_bit(DO_NOT_INC_CONCNT,&drbd_conf[minor].flags);
+	/* it may still be set, because some unplug was on the fly */
+	if (!disable_io_hints) mdev->flags &= ~(1<<WRITE_HINT_QUEUED);
+
 	printk(KERN_INFO DEVICE_NAME "%d: Connection lost.\n",minor);
 }
Index: drbd/drbd_req-2.4.c
===================================================================
--- drbd/drbd_req-2.4.c	(Revision 1914)
+++ drbd/drbd_req-2.4.c	(working copy)
@@ -253,7 +253,9 @@
 	}
 
 	if(!test_and_set_bit(WRITE_HINT_QUEUED,&mdev->flags)) {
-		queue_task(&mdev->write_hint_tq, &tq_disk);
+		/* if it could not be queued, clear our flag again, too */
+		if (!queue_task(&mdev->write_hint_tq, &tq_disk))
+			clear_bit(WRITE_HINT_QUEUED,&mdev->flags);
 	}
 
 	submit_bh(rw,nbh);

-- 
: Lars Ellenberg                                Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH          Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe   http://www.linbit.com :
__
please use the "List-Reply" function of your email client.
_______________________________________________
drbd-user mailing list
drbd-user at lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user
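The idea behind Hans's patch (while the asender sits in the receive call, allow only the one wake-up signal through, then put the previous mask back) can be tried out in user space. This is a minimal sketch, not DRBD code: it assumes POSIX sigprocmask() and sigaction() in place of the kernel's current->blocked, SIGMASK_LOCK and RECALC_SIGPENDING(), and uses SIGALRM as a hypothetical stand-in for DRBD_SIG. The handler is installed without SA_RESTART so that a delivered signal makes recv() fail with EINTR instead of being silently restarted.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical user-space stand-in for DRBD_SIG. */
#define WAKEUP_SIG SIGALRM

static volatile sig_atomic_t wakeups = 0;

static void on_wakeup(int sig)
{
    (void)sig;
    wakeups++;
}

/* Install the handler WITHOUT SA_RESTART: a delivered signal then makes a
 * blocking recv() return -1 with errno == EINTR. */
static int install_wakeup_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_wakeup;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(WAKEUP_SIG, &sa, NULL);
}

/* Same shape as the patch: save the caller's mask, block everything except
 * WAKEUP_SIG for the duration of the receive, then restore the saved mask
 * (the user-space counterpart of "current->blocked = oldset"). */
static ssize_t recv_interruptible(int fd, void *buf, size_t len, int flags)
{
    sigset_t only_wakeup, oldset;
    ssize_t rv;
    int saved_errno;

    sigfillset(&only_wakeup);
    sigdelset(&only_wakeup, WAKEUP_SIG);
    if (sigprocmask(SIG_SETMASK, &only_wakeup, &oldset) != 0)
        return -1;

    rv = recv(fd, buf, len, flags);
    saved_errno = errno;                      /* keep recv()'s errno */

    sigprocmask(SIG_SETMASK, &oldset, NULL);  /* restore the caller's mask */
    errno = saved_errno;
    return rv;
}
```

Even if the caller had WAKEUP_SIG blocked beforehand (the situation Hans observed), a signal raised during recv_interruptible(), for example via alarm(), interrupts the receive with EINTR, and the caller's original mask, with the signal still blocked, is back in place afterwards.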
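Lars's second hunk follows a common pattern: claim a "queued" flag with an atomic test-and-set, and give the claim back if the work item could not actually be queued. The sketch below models only that pattern in user space with C11 atomics. struct device, struct work, enqueue_work() and HINT_QUEUED_BIT are hypothetical stand-ins for mdev, write_hint_tq, queue_task() and WRITE_HINT_QUEUED, and enqueue_work() assumes the Linux 2.4 queue_task() convention of returning 0 when the task is already on the list.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical names; they mirror the roles in the patch, not DRBD code. */
enum { HINT_QUEUED_BIT = 0 };

struct work { bool on_list; };

struct device {
    atomic_ulong flags;
    struct work hint_work;
};

/* Stand-in for queue_task(): 1 if the work was added, 0 if it was
 * already on the list. */
static int enqueue_work(struct work *w)
{
    if (w->on_list)
        return 0;
    w->on_list = true;
    return 1;
}

/* test_and_set_bit()/clear_bit() analogues built on C11 atomics. */
static bool test_and_set_flag(atomic_ulong *flags, int bit)
{
    unsigned long mask = 1UL << bit;
    return (atomic_fetch_or(flags, mask) & mask) != 0;
}

static void clear_flag(atomic_ulong *flags, int bit)
{
    atomic_fetch_and(flags, ~(1UL << bit));
}

static bool flag_is_set(atomic_ulong *flags, int bit)
{
    return (atomic_load(flags) & (1UL << bit)) != 0;
}

/* Claim the flag; if the work could not be queued, clear the flag again,
 * as the second hunk does after a failed queue_task(). */
static void request_hint(struct device *dev)
{
    if (!test_and_set_flag(&dev->flags, HINT_QUEUED_BIT)) {
        if (!enqueue_work(&dev->hint_work))
            clear_flag(&dev->flags, HINT_QUEUED_BIT);
    }
}
```

For example, if hint_work is still on the list when request_hint() runs, enqueue_work() returns 0 and the flag ends up clear again; once the list has been drained, the next request sets the flag and queues the work.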