<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body bgcolor="#ffffff" text="#000000">
<p>Scenario:</p>
<ul>
<li>You have a mail server (Postfix, Exim4, whatever...) in an
HA cluster, and the mail server spool directories are replicated through a
DRBD resource.</li>
<li>You have Heartbeat installed and configured. Heartbeat communicates
over two interfaces, eth0 and eth1; eth1 is also the interface used
by DRBD for replication.</li>
<li>The DRBD resource containing the replicated directories is synchronized
using protocol C.</li>
<li>You want to get a mail notification every time the fence-peer (or
outdate-peer) handler is called.</li>
</ul>
<p>I created a <span style="font-family: "Courier New";">/usr/lib/drbd/notify-fence-peer.sh</span>
script based on the original <span style="font-family: "Courier New";">/usr/lib/drbd/notify.sh</span>
script. Then I modified <span style="font-family: "Courier New";">/etc/drbd.d/global_common.conf</span>
so that this script is called when <span
style="font-family: "Courier New";">fence-peer</span> is invoked:</p>
<p><span style="font-family: "Courier New";">fence-peer
"/usr/lib/drbd/notify-fence-peer.sh;
/usr/lib/heartbeat/drbd-peer-outdater -t 5";</span></p>
<p>If I bring down eth1 (the DRBD interface) on the secondary peer and look
at <span style="font-family: "Courier New";">/proc/drbd</span> on
the primary, instead of getting:<br>
<span style="font-family: "Courier New";">0: cs:WFConnection
ro:Primary/Unknown ds:UpToDate/Outdated C r----</span></p>
<p>I get:<br>
<span style="font-family: "Courier New";">0: cs:NetworkFailure
ro:Primary/Unknown ds:UpToDate/DUnknown C r----</span><br>
<br>
If I try to disconnect the resource via "<span
style="font-family: "Courier New";">drbdadm disconnect r0</span>" I get
stuck in the <span style="font-family: "Courier New";">Disconnecting</span>
state. If I then run <span
style="font-family: "Courier New";">sync</span>, it hangs as well
and never returns.</p>
<p>I discovered that this problem is due to the mail command inside <span
style="font-family: "Courier New";">notify-fence-peer.sh</span>. The
mail server spool directories are on the DRBD resource, whose IO is
frozen while the outdate procedure runs, so the mail server cannot send
the email and the <span style="font-family: "Courier New";">fence-peer</span>
handler gets stuck. IO stays frozen until the handler returns, so the
two end up waiting on each other.</p>
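<p>For anyone who wants to check whether they are exposed to the same
problem, here is a minimal sketch that reports whether a path is backed by
a DRBD device. The function names and the example spool path are
hypothetical (they are not part of DRBD or of notify.sh), and the check
only looks at the device name reported by <span style="font-family: "Courier New";">df -P</span>:</p>

```shell
#!/bin/sh
# Hypothetical helpers, not shipped with DRBD.

# Return 0 if the device name looks like a DRBD device (e.g. /dev/drbd0).
device_is_drbd() {
    case "$1" in
        /dev/drbd*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the device backing a path, using the portable df -P output format.
backing_device() {
    df -P "$1" | awk 'NR == 2 { print $1 }'
}

# Example usage (the spool path is an assumption; adjust it for your MTA):
# if device_is_drbd "$(backing_device /var/spool/postfix)"; then
#     echo "spool is on DRBD; mail may block while IO is frozen"
# fi
```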
<p>To solve this issue I modified the last line of <span
style="font-family: "Courier New";">notify-fence-peer.sh</span> from:<br>
<span style="font-family: "Courier New";">echo "$BODY" | mail -s
"$SUBJECT" $RECIPIENT</span></p>
<p>to:<br>
<span style="font-family: "Courier New";">sleep 10s && echo
"$BODY" | mail -s "$SUBJECT" $RECIPIENT &</span></p>
This way DRBD can complete the <span
style="font-family: "Courier New";">fence-peer</span> handler and
IO on the replicated resource is unfrozen, so the mail command can
complete successfully after 10 seconds.<br>
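<p>The same idea can be written as a small function, shown here only as a
sketch of the approach described above. <span style="font-family: "Courier New";">MAIL_CMD</span>,
<span style="font-family: "Courier New";">NOTIFY_DELAY</span> and
<span style="font-family: "Courier New";">notify_deferred</span> are
names I made up for illustration; they do not exist in notify.sh. The whole
subshell is put in the background and its output is discarded, so the
caller should not have to wait for it:</p>

```shell
#!/bin/sh
# Sketch: send the notification from a background subshell so that the
# handler script can return immediately.
# MAIL_CMD and NOTIFY_DELAY are hypothetical knobs, not part of notify.sh.
MAIL_CMD="${MAIL_CMD:-mail}"
NOTIFY_DELAY="${NOTIFY_DELAY:-10}"

notify_deferred() {
    subject=$1
    recipient=$2
    body=$3
    # The subshell runs asynchronously as a whole: first the delay, then
    # the mail command. Discarding stdout/stderr avoids keeping the
    # caller's output descriptors open.
    (
        sleep "$NOTIFY_DELAY"
        printf '%s\n' "$body" | "$MAIL_CMD" -s "$subject" "$recipient"
    ) >/dev/null 2>&1 &
}

# Example usage inside the handler script:
# notify_deferred "$SUBJECT" "$RECIPIENT" "$BODY"
```

<p>Note that the fixed delay is still a guess: if IO on the resource stays
frozen for longer than the delay, the mail command will simply block in the
background until IO resumes.</p>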
<br>
Am I doing this the right way, or is there some issue I should consider?<br>
<br>
Thank you<br>
<pre class="moz-signature" cols="72">--
Dario Fiumicello
</pre>
</body>
</html>