> *why*
> DRBD would not do that by itself,
> so likely pacemaker decided to do that,
> and you have to figure out *why*.
> Pacemaker will have logged the reasons somewhere.

The crm-fence-peer.sh script could not determine the status of the peer
node (which had gone down), assumed its status was "unknown", and placed
a constraint on DRBD with a -INFINITY score, which essentially demotes
and stops DRBD. The demotion failed because GFS2 was still mounted.
Pacemaker treated that failure as an error and scheduled stonith for the
node itself once the down node came back.

> "crm-fence-peer.sh" assumes that the result of "uname -n"
> is the local nodes "pacemaker node name".

Yes.

> If "uname -n" and "crm_node -n" do not return the same thing for you,
> the defaults will not work for you.

In my setup the replication network (and its hostname) is separate from
the client-facing network (and its hostname):

[root@server7]# uname -n
server7
[root@server7]# crm_node -n
server7ha

However, things seem to be working with these settings.

> Then in addition to all your other trouble,
> you have missing dependency constraints.

The proper integration of the DRBD+GFS2+DLM+CLVM resources into
Pacemaker was the issue. Getting the resource definitions and the
ordering constraints right was tricky and took time. In the end I made
DLM, CLVM and GFS2 cloned resources, and DRBD a master resource (with
master-max=2) for my dual-Primary setup. With that I arrived at the
correct ordering:

Start & promote DRBD, then start DLM, then start CLVM, then start GFS2.

Now things work fine.
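Since crm-fence-peer.sh derives the local node name from "uname -n" by
default, the mismatch above can be caught with a quick sanity check,
sketched here for a POSIX shell (it falls back to the kernel name when
crm_node is not installed, so it only warns on a real cluster node):

```shell
#!/bin/sh
# Sketch: warn when the kernel hostname and the pacemaker node name differ.
# crm-fence-peer.sh uses `uname -n` by default, so a mismatch with
# `crm_node -n` can make it misidentify the local node in the CIB.
kernel_name=$(uname -n)

# Fall back to the kernel name when crm_node is unavailable (assumption:
# on a real cluster node crm_node is present and authoritative).
pcmk_name=$(crm_node -n 2>/dev/null || echo "$kernel_name")

if [ "$kernel_name" = "$pcmk_name" ]; then
    echo "OK: node names match ($kernel_name)"
else
    echo "WARNING: uname -n ($kernel_name) != crm_node -n ($pcmk_name)"
fi
```

If the two names differ, either make them match or point the fence
handler at the right name explicitly, per the crm-fence-peer.sh
documentation for your DRBD version.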
To help anyone in a similar situation, here is my cluster status:
---------------------------------------------------------------------------------------------
[root@server4 ~]# pcs status
Cluster name: vCluster
Stack: corosync
Current DC: server4ha (version 1.1.15-11.el7_3.4-e174ec8) - partition with quorum
Last updated: Tue May 23 15:53:20 2017
Last change: Mon May 22 22:13:08 2017 by root via cibadmin on server4ha

2 nodes and 11 resources configured

Online: [ server4ha server7ha ]

Full list of resources:

 vCluster-VirtualIP-10.168.10.199	(ocf::heartbeat:IPaddr2):	Started server4ha
 vCluster-Stonith-server4ha	(stonith:fence_ipmilan):	Started server7ha
 vCluster-Stonith-server7ha	(stonith:fence_ipmilan):	Started server4ha
 Clone Set: dlm-clone [dlm]
     Started: [ server4ha server7ha ]
 Clone Set: clvmd-clone [clvmd]
     Started: [ server4ha server7ha ]
 Master/Slave Set: drbd_data_clone [drbd_data]
     Masters: [ server4ha server7ha ]
 Clone Set: Gfs2FS-clone [Gfs2FS]
     Started: [ server4ha server7ha ]

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
[root@server4 ~]#

My cluster constraints, with the ordering constraints marked with asterisks:
-----------------------------------------------------------------------------------
[root@server4 ~]# pcs constraint show
Location Constraints:
  Resource: vCluster-Stonith-server4ha
    Disabled on: server4ha (score:-INFINITY)
  Resource: vCluster-Stonith-server7ha
    Disabled on: server7ha (score:-INFINITY)
Ordering Constraints:
  *promote drbd_data_clone then start dlm-clone (kind:Mandatory)*
  *start dlm-clone then start clvmd-clone (kind:Mandatory)*
  *start clvmd-clone then start Gfs2FS-clone (kind:Mandatory)*
Colocation Constraints:
  dlm-clone with drbd_data_clone (score:INFINITY)
  clvmd-clone with dlm-clone (score:INFINITY)
  Gfs2FS-clone with clvmd-clone (score:INFINITY)
Ticket Constraints:
[root@server4 ~]#

Thanks for all your help.
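For completeness, a configuration like the one above can be built with
pcs (0.9 syntax, as on el7) along the following lines. This is a sketch,
not my exact command history: the DRBD resource name "r0", the LV path,
the mount point, and the monitor intervals are placeholders -- substitute
your own values.

```shell
# DRBD as a master/slave resource with two masters (dual-Primary):
pcs resource create drbd_data ocf:linbit:drbd drbd_resource=r0 \
    op monitor interval=60s
pcs resource master drbd_data_clone drbd_data \
    master-max=2 master-node-max=1 clone-max=2 clone-node-max=1 notify=true

# DLM, CLVM and GFS2 as clones running on both nodes:
pcs resource create dlm ocf:pacemaker:controld \
    op monitor interval=30s on-fail=fence clone interleave=true ordered=true
pcs resource create clvmd ocf:heartbeat:clvm \
    op monitor interval=30s on-fail=fence clone interleave=true ordered=true
pcs resource create Gfs2FS ocf:heartbeat:Filesystem \
    device=/dev/my_vg/my_lv directory=/mnt/gfs2 fstype=gfs2 clone

# Ordering: promote DRBD, then start DLM, then CLVM, then GFS2:
pcs constraint order promote drbd_data_clone then start dlm-clone
pcs constraint order start dlm-clone then start clvmd-clone
pcs constraint order start clvmd-clone then start Gfs2FS-clone

# Colocation: keep each layer on nodes where the layer beneath it runs:
pcs constraint colocation add dlm-clone with drbd_data_clone INFINITY
pcs constraint colocation add clvmd-clone with dlm-clone INFINITY
pcs constraint colocation add Gfs2FS-clone with clvmd-clone INFINITY
```

The mandatory ordering enforces the promote-DRBD -> DLM -> CLVM -> GFS2
start sequence described above (and the reverse on stop), which is what
lets Pacemaker unmount GFS2 before it ever tries to demote DRBD.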
--
Raman

On Fri, May 12, 2017 at 8:30 PM, Lars Ellenberg <lars.ellenberg@linbit.com> wrote:
> On Fri, May 12, 2017 at 02:04:57AM +0530, Raman Gupta wrote:
> > > I don't think this has anything to do with DRBD, because:
> >
> > OK.
> >
> > > Apparently, something downed the NICs for corosync communication.
> > > Which then leads to fencing.
> >
> > No problem with NICs.
> >
> > > Maybe you should double check your network configuration,
> > > and any automagic reconfiguration of the network,
> > > and only start corosync once your network is "stable"?
> >
> > As another manifestation of similar problem of dual-Primary DRBD integrated
> > with stonith enabled Pacemaker: When server7 goes down, the DRBD resource
> > on surviving node server4 is attempted to be demoted as secondary.
>
> *why*
>
> DRBD would not do that by itself,
> so likely pacemaker decided to do that,
> and you have to figure out *why*.
> Pacemaker will have logged the reasons somewhere.
>
> Seeing that you have different "uname -n" and "pacemaker node names",
> that may well be the source of all your troubles.
>
> "crm-fence-peer.sh" assumes that the result of "uname -n"
> is the local nodes "pacemaker node name".
>
> If "uname -n" and "crm_node -n" do not return the same thing for you,
> the defaults will not work for you.
>
> > The demotion fails because DRBD is hosting a GFS2 volume and Pacemaker
> > complains of this failure as an error.
>
> Then in addition to all your other trouble,
> you have missing dependency constraints.
> IF pacemaker decides it needs to "demote" DRBD,
> it should know that it has a file system mounted,
> and should know that it needs to first unmount,
> and that it needs to first stop services accessing that mount,
> and so on.
>
> If it did not attempt to do that, your pacemaker config is broken.
> If it did attempt to do that and failed,
> you will have to look into why, which, again, should be in the logs.
> Double check constraints, and also double check if GFS2/DLM fencing is
> properly integrated with pacemaker.
>
> --
> : Lars Ellenberg
> : LINBIT | Keeping the Digital World Running
> : DRBD -- Heartbeat -- Corosync -- Pacemaker
>
> DRBD® and LINBIT® are registered trademarks of LINBIT
> __
> please don't Cc me, but send to list -- I'm subscribed
> _______________________________________________
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user