<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

  <meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">

</head>

<body bgcolor="#ffffff" text="#000099">

<tt>To start with, just the evening after my initial post the other

server in the pair got stuck (for the 2nd time only) when I was

rebooting the main troublemaker.<br>

<br>

</tt>

<blockquote cite="mid:46CBA889.4030801@tramada.com" type="cite">

  <pre>On Mon, Aug 20, 2007 at 01:50:51PM +1000, Maciek Olczak wrote:

&gt; Hi all

&gt; 

&gt; I hope someone will be able to help with a rather stubbornly persistent and yet

&gt; random problem which has been bugging us for over two months. Every night

&gt; (backups time) six replica devices in all are asked to disconnect with:

&gt; 

&gt;  drbdadm disconnect replica_resource

&gt; 

&gt; On just one server at times this causes process

&gt; 

&gt;  drbdsetup /dev/drbd1 disconnect

&gt; 

&gt; to hang in such a way that it cannot be killed. But the device is disconnected

&gt; and replication is stopped. This may not happen for six nights and then it will

&gt; hang just a few hours after the system was rebooted.

&gt; 

&gt; Misbehaving server is one from two pairs of servers which run the same version

&gt; of DRBD (drbd-0.7.24-1) and kernel (Redhat EL4 2.6.9-55.ELsmp), although they

&gt; are of different architectures (i368 and x86_64). Other difference between the

&gt; two pairs is that one uses hardware RAID-ed devices and the one in question

&gt; uses native metadisk (/dev/md?) devices. Another piece of "exotic"

&gt; configuration is that IP addresses DRBD devices are bound to are on OpenVPN's

&gt; tun0 interfaces.

&gt; 

&gt; I have been at pains to find out what is the difference between the two servers

&gt; in this pair, which causes this problem. I reconciled DRBD config files and OS

&gt; RPMs and still did not get rid of this problem.

&gt; 

&gt; Being guided by <a moz-do-not-send="true"

 class="moz-txt-link-freetext"

 href="http://lists.linbit.com/pipermail/drbd-user/2006-April/">http://lists.linbit.com/pipermail/drbd-user/2006-April/</a>

&gt; 004832.html I gathered following evidence:

&gt; 

&gt; # cat /proc/drbd

&gt; 

&gt; version: 0.7.24 (api:79/proto:74)

&gt; SVN Revision: 2875 build by buildsystem@linbit, 2007-06-04 14:03:37

&gt;  0: cs:Connected st:Primary/Secondary ld:Consistent

&gt;     ns:32298620 nr:0 dw:31831224 dr:76573961 al:1370841 bm:10302 lo:3 pe:4 ua:0

&gt; ap:2

&gt;  1: cs:BrokenPipe st:Secondary/Unknown ld:Consistent

&gt;     ns:0 nr:5038052 dw:35515816 dr:127195916 al:651 bm:5783 lo:3 pe:0 ua:3 ap:0

                                                              ^^^^      ^^^^ ^^^^

which "drbd protocol"? C?

  </pre>

</blockquote>

<tt>Yes</tt><br>

<blockquote cite="mid:46CBA889.4030801@tramada.com" type="cite">

  <pre>there are three requests pending against the local disk. they are also unacknowledged,

for that reason.

&gt; # ps -ef|grep drbd|grep -v grep

&gt; 

&gt; root      2057     1  0 Aug19 ?        00:01:10 [drbd1_receiver]

&gt; root      4495     1  0 Aug15 ?        00:02:05 [drbd0_receiver]

&gt; root     18176     1  0 00:46 ?        00:00:15 [drbd0_asender]

&gt; root     20219     1  0 01:18 ?        00:00:00 drbdsetup /dev/drbd1 disconnect

&gt; root     26539     1  0 00:35 ?        00:00:07 [drbd0_worker]

&gt; 

&gt; # ps axo pid,wchan=WIDE-WCHAN-COLUMN,cmd -o comm | grep -w 2057 | grep -v grep

&gt; 

&gt;  2057 drbd_wait_ee      [drbd1_receiver] drbd1_receiver

the receiver waits for the reference count to reach zero.

&gt; # ps axo pid,wchan=WIDE-WCHAN-COLUMN,cmd -o comm | grep 20219 | grep -v grep

&gt; 

&gt; 20219 drbd_thread_stop  drbdsetup /dev/d drbdsetup

&gt; 

&gt; # cat /proc/20219/status

&gt; 

&gt; Name:   drbdsetup

&gt; State:  D (disk sleep)

...

and this waits for the receiver to terminate.

&gt; Aug 20 01:18:30 butterfly1 kernel: drbd1: drbdsetup [20219]: cstate Connected --&gt; Unconnected

&gt; Aug 20 01:18:30 butterfly1 kernel: drbd1: drbd1_receiver [2057]: cstate Unconnected --&gt; BrokenPipe

&gt; Aug 20 01:18:30 butterfly1 kernel: drbd1: short read expecting header on sock: r=-512

&gt; Aug 20 01:18:30 butterfly1 kernel: drbd1: worker terminated

&gt; Aug 20 01:18:30 butterfly1 kernel: drbd1: asender terminated

&gt; 

&gt; #

&gt; 

&gt; 

&gt; First block in the above is for the replica device on the remote server being

&gt; disconnected, backed up and reconnected. The second block is ALL there is in

&gt; messages (and dmesg) for this stuck "disconnect" command.

either the local disk just does not process these pending requests,

or we have some deadlock within drbd, where the only one that could

decrement those reference counts is itself waiting for this to happen,

or is dead already.

I've seen both in the past already.

we fixed race conditions in the past,

which resulted in deadlocks in drbd.

we may have more of those.

but I've also seen drbd linger in BrokenPipe or

NetworkFailure for ~20 minutes, before recovering

itself suddenly, so I doubt it is a deadlock within

drbd, that would not recover.

I know that sometimes requests may be queued almost

indefinetely, especially on MD, which strips off the

BIO_RW_SYNC bit from requests passed down to it, I

wrote a patch for that myself.

  </pre>

</blockquote>

<tt>Am I correct thinking this is not a part of rel. 0.7.24 ?</tt><br>

<blockquote cite="mid:46CBA889.4030801@tramada.com" type="cite">

  <pre>

maybe it helps to say "sync", or enable sysrequest

("echo 1 &gt; /proc/sys/kernel/sysrq"), and then trigger

an emergency sync ("echo s &gt; /proc/sysrq-trigger").

  </pre>

</blockquote>

<tt>Thanks for this hint. I'll put it in and see if clears these

"limbo" requests.<br>

<br>

BTW, the other server's hang (see above) was for the same reason. On

both servers</tt><tt>, every 3 minutes</tt><tt> I dump /proc/drbd into

a log. From there I can confirm that on each occasion there were

outstanding meta dev requests before disconnect was initiated. Here is

what happened in this event:<br>

<br>

Mon Aug 20 00:33:01 EST 2007<br>

&nbsp;0: cs:Connected st:Primary/Secondary ld:Consistent<br>

&nbsp;&nbsp;&nbsp; ns:35393892 nr:0 dw:185553796 dr:311622897 al:7201872 bm:1408286

lo:0 pe:3 ua:0 ap:0<br>

&nbsp;1: cs:Connected st:Secondary/Primary ld:Consistent<br>

&nbsp;&nbsp;&nbsp; ns:0 nr:4149696 dw:148856452 dr:506392585 al:2771 bm:14596 lo:0

pe:0 ua:0 ap:0<br>

<br>

Mon Aug 20 00:36:01 EST 2007<br>

&nbsp;0: cs:Connected st:Primary/Secondary ld:Consistent<br>

&nbsp;&nbsp;&nbsp; ns:35401428 nr:0 dw:185561332 dr:311624017 al:7202169 bm:1408286

lo:0 pe:3 ua:0 ap:0<br>

&nbsp;1: cs:StandAlone st:Primary/Unknown ld:Consistent<br>

&nbsp;&nbsp;&nbsp; ns:0 nr:4154012 dw:148862492 dr:506544802 al:2942 bm:14767 lo:0

pe:0 ua:0 ap:0<br>

<br>

Mon Aug 20 00:39:01 EST 2007<br>

&nbsp;0: cs:Connected st:Primary/Secondary ld:Consistent<br>

&nbsp;&nbsp;&nbsp; ns:35406728 nr:0 dw:185566632 dr:311671689 al:7202584 bm:1408286

lo:0 pe:3 ua:0 ap:0<br>

&nbsp;1: cs:StandAlone st:Primary/Unknown ld:Consistent<br>

&nbsp;&nbsp;&nbsp; ns:0 nr:4154012 dw:148862492 dr:512022634 al:2942 bm:14767 lo:165

pe:0 ua:0 ap:165<br>

<br>

Mon Aug 20 00:42:01 EST 2007<br>

&nbsp;0: cs:Connected st:Primary/Secondary ld:Consistent<br>

&nbsp;&nbsp;&nbsp; ns:35410448 nr:0 dw:185570352 dr:311775409 al:7202825 bm:1408286

lo:0 pe:3 ua:0 ap:0<br>

&nbsp;1: cs:StandAlone st:Primary/Unknown ld:Consistent<br>

&nbsp;&nbsp;&nbsp; ns:0 nr:4154012 dw:148862492 dr:516931094 al:2942 bm:14767 lo:126

pe:0 ua:0 ap:126<br>

<br>

Mon Aug 20 00:45:01 EST 2007<br>

&nbsp;0: cs:Connected st:Primary/Secondary ld:Consistent<br>

&nbsp;&nbsp;&nbsp; ns:35417864 nr:0 dw:185577768 dr:312045529 al:7203266 bm:1408286

lo:0 pe:3 ua:0 ap:0<br>

&nbsp;1: cs:StandAlone st:Primary/Unknown ld:Consistent<br>

&nbsp;&nbsp;&nbsp; ns:0 nr:4154012 dw:148862492 dr:522652514 al:2942 bm:14767 lo:128

pe:0 ua:0 ap:128<br>

<br>

Mon Aug 20 00:48:01 EST 2007<br>

&nbsp;0: cs:Connected st:Primary/Secondary ld:Consistent<br>

&nbsp;&nbsp;&nbsp; ns:35426000 nr:0 dw:185585904 dr:312360293 al:7203449 bm:1408286

lo:0 pe:3 ua:0 ap:0<br>

&nbsp;1: cs:Connected st:Secondary/Primary ld:Consistent<br>

&nbsp;&nbsp;&nbsp; ns:0 nr:17324 dw:148879816 dr:524824774 al:2942 bm:15332 lo:3 pe:0

ua:3 ap:0<br>

<br>

Mon Aug 20 00:51:01 EST 2007<br>

&nbsp;0: cs:Connected st:Primary/Secondary ld:Consistent<br>

&nbsp;&nbsp;&nbsp; ns:35434120 nr:0 dw:185594024 dr:313135109 al:7203753 bm:1408286

lo:33 pe:3 ua:0 ap:33<br>

&nbsp;1: cs:Connected st:Secondary/Primary ld:Consistent<br>

&nbsp;&nbsp;&nbsp; ns:0 nr:25784 dw:148888276 dr:524824774 al:2942 bm:15332 lo:3 pe:0

ua:3 ap:0<br>

<br>

The 3 unack'd requests were left when drbd1 was reconnected after

backup and persisted till the hang and reboot 22 hours later.<br>

<br>

</tt>

<blockquote cite="mid:46CBA889.4030801@tramada.com" type="cite">

  <pre>

&gt; There is one other curious thing I noticed. When DRBD is restarted (reboot) and

&gt; both devices are reconnected, the device which was cs:BrokenPipe will receive a

&gt; fair bit of re-sync data, as one would expect it after not being connected for

&gt; a few busy hours. But the odd thing is that when the primary device is

&gt; reconnected, a similarly large amount of data is re-synced, as if this device

&gt; was disconnected just as long as the replica, despite being off for just 2-3

&gt; minutes it takes to reboot. Example:

&gt; 

&gt; 

&gt;  0: cs:WFConnection st:Primary/Unknown ld:Consistent

&gt;     ns:0 nr:0 dw:3600 dr:104253 al:157 bm:284 lo:0 pe:0 ua:0 ap:0

&gt;  1: cs:SyncTarget st:Secondary/Primary ld:Inconsistent

&gt;     ns:0 nr:41584 dw:41580 dr:0 al:0 bm:27 lo:266 pe:348 ua:266 ap:0

&gt;         [=&gt;..................] sync'ed:  5.1% (848796/889276)K

&gt; 

&gt;  0: cs:SyncSource st:Primary/Secondary ld:Consistent

&gt;     ns:6380 nr:0 dw:11088 dr:211713 al:492 bm:616 lo:1 pe:57 ua:457 ap:0

&gt;         [&gt;...................] sync'ed:  1.6% (517976/524116)K

&gt;         finish: 0:01:20 speed: 6,140 (6,140) K/sec

&gt;  1: cs:Connected st:Secondary/Primary ld:Consistent

&gt;     ns:0 nr:898404 dw:898404 dr:0 al:0 bm:2570 lo:6 pe:0 ua:6 ap:0

&gt;         finish: 0:01:02 speed: 13,492 (13,492) K/sec

&gt; 

&gt; 

&gt; I've been monitoring both DRBD devices' stats as well as the underlying disk

&gt; devices' stats when the replica device is disconnected and stuck. During that

&gt; time the primary DRBD device (drbd0), its remote replica (drbd1) and both

&gt; metadisk devices (md?) show data being sent/received via network and written to

&gt; metadisks. Why then so many hundreds of MBs need to be re-synced?

huh? why is drbd1 the remote replica of drbd0?

you lost me here.

but anyways, if you reboot a primary without making it

secondary first, it looks like a primary crash, so we

have to resync the area covered by the activity log.

could that explain your observation?

  </pre>

</blockquote>

<tt><br>

Makes sense, thanks.<br>

<br>

</tt>

<blockquote cite="mid:46CBA889.4030801@tramada.com" type="cite">

  <pre>

-- 

: Lars Ellenberg                            Tel +43-1-8178292-0  :

: LINBIT Information Technologies GmbH      Fax +43-1-8178292-82 :

: Vivenotgasse 48, A-1120 Vienna/Europe    <a moz-do-not-send="true"

 class="moz-txt-link-freetext" href="http://www.linbit.com">http://www.linbit.com</a> :

__

please use the "List-Reply" function of your email client.

_______________________________________________

drbd-user mailing list

<a moz-do-not-send="true" class="moz-txt-link-abbreviated"

 href="mailto:drbd-user@lists.linbit.com">drbd-user@lists.linbit.com</a>

<a moz-do-not-send="true" class="moz-txt-link-freetext"

 href="http://lists.linbit.com/mailman/listinfo/drbd-user">http://lists.linbit.com/mailman/listinfo/drbd-user</a>

  </pre>

  <br>

</blockquote>

</body>

</html>