[DRBD-user] HA DRBD setup - graceful failover/active node detection

Elias Chatzigeorgiou echatzig at gmail.com
Thu Jan 5 03:13:33 CET 2012

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


I have a two-node active/passive cluster, with DRBD controlled by
corosync/pacemaker.
All storage is based on LVM.


------------------------------------------------------------------------------------
a) How do I know which node of the cluster is currently active?
   How can I check if a node is currently in use by the iSCSI-target daemon?

   I can try to deactivate a volume group using:

[root@node1 ~]# vgchange -an data
  Can't deactivate volume group "data" with 3 open logical volume(s)

If I get a message like the above, then I know that node1 is the
active node, but is there a better (non-intrusive) way to check?
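
A less intrusive variant might be to only read the LV attributes
instead of actually trying to deactivate anything; a rough sketch,
using the VG name 'data' from above:

# The 6th character of the attr field is 'o' while the LV device is open:
lvs -o lv_name,vg_name,lv_attr data

# lvdisplay prints an explicit "# open" count per LV:
lvdisplay data | grep -E 'LV Name|# open'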

A better option seems to be 'pvs -v'. If the node is active then it shows
the volume names:
[root@node1 ~]# pvs -v
    Scanning for physical volume names
  PV         VG      Fmt  Attr PSize   PFree DevSize PV UUID
  /dev/drbd1 data    lvm2 a-   109.99g    0  110.00g c40m9K-tNk8-vTVz-tKix-UGyu-gYXa-gnKYoJ
  /dev/drbd2 tempdb  lvm2 a-    58.00g    0   58.00g 4CTq7I-yxAy-TZbY-TFxa-3alW-f97X-UDlGNP
  /dev/drbd3 distrib lvm2 a-    99.99g    0  100.00g l0DqWG-dR7s-XD2M-3Oek-bAft-d981-UuLReC

whereas on the inactive node it gives errors:
[root@node2 ~]# pvs -v
    Scanning for physical volume names
  /dev/drbd0: open failed: Wrong medium type
  /dev/drbd1: open failed: Wrong medium type
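
Or should I instead be asking DRBD and Pacemaker directly which side
is Primary and where the resources are running? For example (a rough
sketch; the /proc/net/iet path assumes iscsitarget/IET is the target
implementation):

# DRBD's own view of the roles (Primary/Secondary) and connection state:
cat /proc/drbd
drbdadm role all

# Pacemaker's view of where the resources are currently running:
crm_mon -1

# Active iSCSI sessions, if the target daemon is IET/iscsitarget:
cat /proc/net/iet/session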

Any further ideas/comments/suggestions?

------------------------------------------------------------------------------------

b) How can I gracefully fail over to the other node? Up to now, the only way
   I know is to force the active node to reboot (by entering two subsequent
   'reboot' commands). This however breaks the DRBD synchronization, and I
   need to use a fix-split-brain procedure to bring DRBD back in sync.
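
   For what it is worth, checking whether the resources are back in sync
   after such an event looks roughly like this ('r0' is just a placeholder
   resource name):

# "Connected" is good; "StandAlone" or "WFConnection" points at split-brain:
drbdadm cstate r0

# "UpToDate/UpToDate" means local and peer disks are in sync again:
drbdadm dstate r0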

   On the other hand, if I try to stop the corosync service on the active
   node, the command takes forever! I understand that the suggested
   procedure should be to disconnect all clients from the active node and
   then stop services. Is it a better approach to shut down the public
   network interface before stopping the corosync service (in order to
   forcibly close client connections)?
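
   Would putting the active node into Pacemaker standby be the right way
   to do this instead? A rough sketch of what I have in mind, assuming
   the crm shell is available (node names are placeholders):

# Tell Pacemaker to move all resources off this node gracefully
# (DRBD should be demoted to Secondary rather than disconnected):
crm node standby node1

# Watch the resources migrate to the peer:
crm_mon -1

# Allow node1 to host resources again later:
crm node online node1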

Thanks