[DRBD-user] Mail Notification Issue with DRBD Outdater and Replicated Mailserver

Dario Fiumicello - Antek fiumicello at antek.it
Wed Nov 17 17:34:30 CET 2010


Scenario:

    * You have a mailserver (postfix, exim4, whatever...) in an
      HA cluster and the mailserver spool directories are replicated
      through a drbd resource.
    * You have heartbeat installed and configured. Heartbeat
      communicates over two interfaces, eth0 and eth1; eth1 is also
      the interface used by drbd for replication.
    * The DRBD resource containing the replicated directories is
      synchronized using protocol C.
    * You want to get a mail notification every time the fence-peer
      (or outdate-peer) handler is called.

I created /usr/lib/drbd/notify-fence-peer.sh, based on the original 
/usr/lib/drbd/notify.sh script. Then I modified 
/etc/drbd.d/global_common.conf so that this script is called when 
fence-peer is invoked:

fence-peer "/usr/lib/drbd/notify-fence-peer.sh; /usr/lib/heartbeat/drbd-peer-outdater -t 5";

If I bring down eth1 (the drbd interface) on the slave peer and look at 
/proc/drbd on the primary, instead of getting:
0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/Outdated C r----

I get:
0: cs:NetworkFailure ro:Primary/Unknown ds:UpToDate/DUnknown C r----
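
For scripted checks, the connection and disk states can be pulled out of a
status line like the two above. This is only a minimal sketch: drbd_field and
peer_outdated are hypothetical helper names (not part of DRBD), and it assumes
the one-line "cs:... ro:... ds:..." layout shown here.

```shell
# Hypothetical helpers, not shipped with DRBD.

# Print the value of one field (cs, ro or ds) from a status line.
#   $1 = field name, $2 = status line as shown in /proc/drbd
drbd_field() {
    printf '%s\n' "$2" | tr ' ' '\n' | sed -n "s/^$1://p"
}

# Succeed only if the peer's disk state (the part after the slash
# in the ds: field) is Outdated.
peer_outdated() {
    drbd_ds=$(drbd_field ds "$1")
    [ "${drbd_ds#*/}" = "Outdated" ]
}

# Possible usage for minor 0 (assumes the status line starts with "0:"):
#   peer_outdated "$(grep '^ *0:' /proc/drbd)" && echo "peer is outdated"
```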

If I try to disconnect the resource via "drbdadm disconnect r0", it gets 
stuck in the Disconnecting state. If I also try to run sync, it gets 
stuck again and never returns.

I discovered that this problem is due to the mail command inside 
notify-fence-peer.sh. Since the mailserver spool directories are on the 
drbd resource whose IO is frozen during the outdate procedure, the mail 
server cannot send the email and the fence-peer handler gets stuck.

To solve this issue I modified the last line of notify-fence-peer.sh from:
echo "$BODY" | mail -s "$SUBJECT" $RECIPIENT

to:
sleep 10s && echo "$BODY" | mail -s "$SUBJECT" $RECIPIENT &

This way drbd can complete the fence-peer handler and IO on the 
replicated resource is unfrozen, so the mail command can complete 
successfully after 10 seconds.
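
The same idea can be written as a small function, sketched here under some
assumptions: notify_later is a hypothetical name, the MAILER variable is my
own addition so the mail command can be swapped out for testing, and the
redirections are there so the background job does not hold on to the
handler's stdin/stdout/stderr. It is not taken from notify.sh.

```shell
# Hypothetical helper: send the notification from a detached background
# subshell, so the calling handler can return immediately.
#   $1 = delay in seconds, $2 = subject, $3 = recipient, $4 = body
notify_later() {
    nl_delay=$1
    nl_subject=$2
    nl_recipient=$3
    nl_body=$4
    (
        # Give drbd time to finish fencing and resume IO before mailing.
        sleep "$nl_delay"
        # MAILER is an assumption; it defaults to the plain mail command.
        printf '%s\n' "$nl_body" | "${MAILER:-mail}" -s "$nl_subject" "$nl_recipient"
    ) </dev/null >/dev/null 2>&1 &
}

# Possible last line of notify-fence-peer.sh:
#   notify_later 10 "$SUBJECT" "$RECIPIENT" "$BODY"
```

The function returns as soon as the subshell is started, so the handler's exit
is not tied to the mail command. The 10 second delay is still a guess: if
fencing takes longer, the mail command will simply block in the background
until IO resumes.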

Am I doing it the right way, or is there some issue I should consider?

Thank you

-- 
Dario Fiumicello
