<HTML>

<HEAD>

<TITLE>Problem with disk errors</TITLE>

</HEAD>

<BODY>

<FONT FACE="Calibri, Verdana, Helvetica, Arial"><SPAN STYLE='font-size:11pt'>I have hit the same strange problem with DRBD 8.3 twice. &nbsp;The server running as secondary had a disk error and went Diskless. &nbsp;The primary noticed this and logged the peer's disk state changing from UpToDate to Diskless. &nbsp;However, the primary continued to have timeout problems afterwards. &nbsp;My question: what do I need to do so that the primary keeps running after the secondary has a disk error? &nbsp;<BR>

<BR>

My drbd.conf is as follows:<BR>

<BR>

global {<BR>

&nbsp;&nbsp;&nbsp;&nbsp;usage-count yes;<BR>

}<BR>

<BR>

common {<BR>

&nbsp;&nbsp;syncer { rate 25M; }<BR>

}<BR>

<BR>

<BR>

resource drbd0 {<BR>

<BR>

&nbsp;&nbsp;protocol C;<BR>

<BR>

&nbsp;&nbsp;handlers {<BR>

&nbsp;&nbsp;&nbsp;&nbsp;pri-on-incon-degr &quot;echo o &gt; /proc/sysrq-trigger ; halt -f&quot;;<BR>

&nbsp;&nbsp;&nbsp;&nbsp;pri-lost-after-sb &quot;echo o &gt; /proc/sysrq-trigger ; halt -f&quot;;<BR>

&nbsp;&nbsp;&nbsp;&nbsp;local-io-error &quot;echo o &gt; /proc/sysrq-trigger ; halt -f&quot;;<BR>

&nbsp;&nbsp;&nbsp;&nbsp;outdate-peer &quot;/usr/lib64/heartbeat/drbd-peer-outdater&quot;;<BR>

&nbsp;&nbsp;}<BR>

<BR>

&nbsp;&nbsp;startup {<BR>

&nbsp;&nbsp;&nbsp;&nbsp;degr-wfc-timeout 120; &nbsp;&nbsp;&nbsp;# 2 minutes.<BR>

&nbsp;&nbsp;}<BR>

<BR>

&nbsp;&nbsp;disk {<BR>

&nbsp;&nbsp;&nbsp;&nbsp;on-io-error &nbsp;&nbsp;detach;<BR>

&nbsp;&nbsp;}<BR>

<BR>

&nbsp;&nbsp;net {<BR>

&nbsp;&nbsp;&nbsp;&nbsp;max-buffers &nbsp;&nbsp;&nbsp;&nbsp;2048;<BR>

&nbsp;&nbsp;&nbsp;&nbsp;max-epoch-size &nbsp;2048;<BR>

&nbsp;&nbsp;&nbsp;&nbsp;after-sb-0pri disconnect;<BR>

&nbsp;&nbsp;&nbsp;&nbsp;after-sb-1pri disconnect;<BR>

&nbsp;&nbsp;&nbsp;&nbsp;after-sb-2pri disconnect;<BR>

&nbsp;&nbsp;&nbsp;&nbsp;rr-conflict disconnect;<BR>

&nbsp;&nbsp;}<BR>

<BR>

&nbsp;&nbsp;syncer {<BR>

&nbsp;&nbsp;&nbsp;&nbsp;rate 25M;<BR>

&nbsp;&nbsp;&nbsp;&nbsp;al-extents 257;<BR>

&nbsp;&nbsp;}<BR>

<BR>

&nbsp;&nbsp;on bg-host-m1 {<BR>

&nbsp;&nbsp;&nbsp;&nbsp;device &nbsp;&nbsp;&nbsp;&nbsp;/dev/drbd0;<BR>

&nbsp;&nbsp;&nbsp;&nbsp;disk &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;/dev/sdb2;<BR>

&nbsp;&nbsp;&nbsp;&nbsp;address &nbsp;&nbsp;&nbsp;172.20.0.1:7788;<BR>

&nbsp;&nbsp;&nbsp;&nbsp;meta-disk &nbsp;/dev/sdb1[0];<BR>

&nbsp;&nbsp;}<BR>

<BR>

&nbsp;&nbsp;on bg-host-m2 {<BR>

&nbsp;&nbsp;&nbsp;&nbsp;device &nbsp;&nbsp;&nbsp;/dev/drbd0;<BR>

&nbsp;&nbsp;&nbsp;&nbsp;disk &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;/dev/sdb2;<BR>

&nbsp;&nbsp;&nbsp;&nbsp;address &nbsp;&nbsp;172.20.0.2:7788;<BR>

&nbsp;&nbsp;&nbsp;&nbsp;meta-disk /dev/sdb1[0];<BR>

&nbsp;&nbsp;}<BR>

}<BR>

<BR>

<BR>

<BR>

Here is a short excerpt from /var/log/messages on the primary (bg-host-m2). &nbsp;Please also note the &quot;Concurrent local write detected&quot; messages:<BR>

<BR>

Jan 29 11:10:32 bg-host-m2 iscsi_trgt: Logical Unit Reset (05) issued on tid:3 lun:2 by sid:1127000341282880 (Function Complete)<BR>

Jan 29 11:10:32 bg-host-m2 drbd0: istiod3[22036] Concurrent local write detected! [DISCARD L] new: 1471221576s +2048; pending: 1471221576s +2048<BR>

Jan 29 11:10:32 bg-host-m2 drbd0: istiod3[22036] Concurrent local write detected! [DISCARD L] new: 3606015618s +19968; pending: 3606015618s +19968<BR>

Jan 29 11:10:57 bg-host-m2 iscsi_trgt: Logical Unit Reset (05) issued on tid:3 lun:2 by sid:1127000341282880 (Function Complete)<BR>

Jan 29 11:10:59 bg-host-m2 ntpd[6722]: kernel time sync status change 4001<BR>

Jan 29 11:11:06 bg-host-m2 drbd0: Got NegAck packet. Peer is in troubles?<BR>

Jan 29 11:11:06 bg-host-m2 drbd0: Got NegAck packet. Peer is in troubles?<BR>

Jan 29 11:11:06 bg-host-m2 drbd0: pdsk( UpToDate -&gt; Diskless )<BR>

Jan 29 11:11:06 bg-host-m2 drbd0: Creating new current UUID<BR>

Jan 29 11:11:06 bg-host-m2 drbd0: Got NegAck packet. Peer is in troubles?<BR>

<BR>

Later on the primary:<BR>

Jan 29 11:11:06 bg-host-m2 drbd0: istiod3[22035] Concurrent local write detected! [DISCARD L] new: 3617118144s +3584; pending: 3617118144s +3584<BR>

Jan 29 11:12:05 bg-host-m2 nrpe[18563]: Could not read request from client, bailing out...<BR>

Jan 29 11:12:18 bg-host-m2 INFO: task istiod5:22060 blocked for more than 120 seconds.<BR>

Jan 29 11:12:18 bg-host-m2 &quot;echo 0 &gt; /proc/sys/kernel/hung_task_timeout_secs&quot; disables this message.<BR>

Jan 29 11:12:18 bg-host-m2 istiod5 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;D 0000000000000000 &nbsp;&nbsp;&nbsp;&nbsp;0 22060 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;2<BR>

Jan 29 11:12:18 bg-host-m2 ffff88006a523d30 0000000000000046 0000000000000806 ffffffffa001764e<BR>

Jan 29 11:12:18 bg-host-m2 ffff88006a51abb0 ffff88006a51b140 ffff88006a51ade0 00000001a0018206<BR>

Jan 29 11:12:18 bg-host-m2 0000000000000246 ffff88012e5f9c80 ffff88012e379800 ffff88012c0237f0<BR>

Jan 29 11:12:18 bg-host-m2 Call Trace:<BR>

Jan 29 11:12:18 bg-host-m2 [&lt;ffffffffa001764e&gt;] megasas_make_sgl64+0x46/0x59 [megaraid_sas]<BR>
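
<BR>

To make the &quot;Concurrent local write detected&quot; lines easier to read, here is a small Python sketch (my own illustration, not DRBD code) that parses the new and pending requests and checks whether their byte ranges overlap. &nbsp;It assumes the number before &quot;s&quot; is a start sector in 512-byte units and the number after &quot;+&quot; is the request length in bytes. &nbsp;In all three messages above, the new and pending requests cover exactly the same range.<BR>

<PRE>
```python
import re

# Assumption: in "new: Ns +M", N is a start sector in 512-byte units
# and M is the request length in bytes (every M in these logs is a
# multiple of 512, which fits that reading).
SECTOR_BYTES = 512
PATTERN = re.compile(r"new: (\d+)s \+(\d+); pending: (\d+)s \+(\d+)")


def byte_range(sector, size):
    """Return the half-open byte range [start, end) covered by a request."""
    start = sector * SECTOR_BYTES
    return (start, start + size)


def overlaps(a, b):
    """True if two half-open ranges share at least one byte."""
    return a[1] > b[0] and b[1] > a[0]


def check(line):
    """Parse one 'Concurrent local write detected' line; None if no match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    new_sec, new_size, pend_sec, pend_size = map(int, m.groups())
    new = byte_range(new_sec, new_size)
    pending = byte_range(pend_sec, pend_size)
    return {
        "new": new,
        "pending": pending,
        "overlap": overlaps(new, pending),
        "identical": new == pending,
    }


if __name__ == "__main__":
    msg = ("drbd0: istiod3[22036] Concurrent local write detected! "
           "[DISCARD L] new: 1471221576s +2048; pending: 1471221576s +2048")
    print(check(msg))
```
</PRE>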

<BR>

<BR>

Here is the log from the secondary (bg-host-m1) at the same time:<BR>

Jan 29 11:11:06 bg-host-m1 sd 4:0:0:0: [sdb] Device not ready: ASC=0x4 ASCQ=0x0<BR>

Jan 29 11:11:06 bg-host-m1 end_request: I/O error, dev sdb, sector 1216507571<BR>

Jan 29 11:11:06 bg-host-m1 drbd0: disk( UpToDate -&gt; Failed ) <BR>

Jan 29 11:11:06 bg-host-m1 drbd0: Local IO failed. Detaching...<BR>

Jan 29 11:11:06 bg-host-m1 drbd0: disk( Failed -&gt; Diskless ) <BR>

Jan 29 11:11:06 bg-host-m1 drbd0: Notified peer that my disk is broken.<BR>
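
<BR>

To line up what each node saw at 11:11:06, a small Python sketch (my own, not part of DRBD) can pull state changes such as disk( UpToDate -&gt; Failed ) and pdsk( UpToDate -&gt; Diskless ) out of both hosts' syslog excerpts and merge them by timestamp. &nbsp;It assumes the standard &quot;Mon DD HH:MM:SS host&quot; syslog prefix and that all lines fall in the same month.<BR>

<PRE>
```python
import re

# Assumption: standard syslog prefix "Mon DD HH:MM:SS host" followed by a
# DRBD device name, e.g. "Jan 29 11:11:06 bg-host-m2 drbd0: ...".
PREFIX = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) (\S+) (drbd\d+): ")
# One state change such as "disk( UpToDate -> Failed )"; a single line may
# carry several of these (e.g. a conn(...) and a pdsk(...) change together).
CHANGE = re.compile(r"(\w+)\( (\w+) -> (\w+) \)")


def transitions(lines):
    """Yield (time, host, device, field, old, new) for each state change."""
    for line in lines:
        head = PREFIX.match(line)
        if head is None:
            continue
        for field, old, new in CHANGE.findall(line[head.end():]):
            yield (*head.groups(), field, old, new)


def merge(*logs):
    """Merge several hosts' excerpts by timestamp.

    sorted() is stable, so changes logged in the same second keep their
    input order. Comparing the timestamp strings only works when all
    lines fall in the same month.
    """
    events = [t for log in logs for t in transitions(log)]
    return sorted(events, key=lambda t: t[0])


if __name__ == "__main__":
    primary = [
        "Jan 29 11:11:06 bg-host-m2 drbd0: pdsk( UpToDate -> Diskless )",
    ]
    secondary = [
        "Jan 29 11:11:06 bg-host-m1 drbd0: disk( UpToDate -> Failed ) ",
        "Jan 29 11:11:06 bg-host-m1 drbd0: Local IO failed. Detaching...",
        "Jan 29 11:11:06 bg-host-m1 drbd0: disk( Failed -> Diskless ) ",
    ]
    for event in merge(primary, secondary):
        print(event)
```
</PRE>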

<BR>

Then later on the secondary:<BR>

Jan 29 11:13:29 bg-host-m1 INFO: task drbd0_worker:32651 blocked for more than 120 seconds.<BR>

Jan 29 11:13:29 bg-host-m1 &quot;echo 0 &gt; /proc/sys/kernel/hung_task_timeout_secs&quot; disables this message.<BR>

Jan 29 11:13:29 bg-host-m1 drbd0_worker &nbsp;D 000000000000000a &nbsp;&nbsp;&nbsp;&nbsp;0 32651 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;2<BR>

Jan 29 11:13:29 bg-host-m1 ffff88008f589e10 0000000000000046 ffff8801088f0000 0000000000000000<BR>

<BR>

[lots more info deleted from this event]<BR>

<BR>

This hung-task report keeps repeating.<BR>

<BR>

-- <BR>

Terry Hull<BR>

Network Resource Group, Inc. President<BR>

<BR>

</SPAN></FONT>

</BODY>

</HTML>