On 07/06/16 04:09 PM, Lars Ellenberg wrote:
> On Tue, Jun 07, 2016 at 12:24:48PM -0400, Digimer wrote:
>> On 07/06/16 08:46 AM, David Pullman wrote:
>>> Digimer,
>>>
>>> Thanks for the direction, this sounds right for sure. So to actually do
>>> this:
>>>
>>> 1. Because we're running RHEL 6.7, I think I need to use 1.1-pcs version
>>> of the docs, chapter 8 Configure STONITH? Our nodes are supermicro based
>>> with an IPMI BMC controller, but with common non-controllable power. So
>>> I think I need to use the IPMI fencing agent?
>>
>> Yup, fence_ipmilan should work just fine. Try this, from the command line;
>>
>> fence_ipmilan -a <ipmi_ip> -l <ipmi_user> -p <ipmi_passwd> -o status
>>
>> If you can check the state of both nodes from both nodes, then it's a
>> simple matter of adding it to pacemaker. Note that you will want to
>> configure pacemaker with 'delay="15"' for the fence method for the
>> primary node.
>>
>> This way, if comms breaks but both nodes are up, node 2 will look up how
>> to fence node 1, see the delay and sleep for 15 seconds. Node 1 looks up
>> how to fence node 2, sees no delay and shoots immediately. This way, you
>> can ensure that node 1 (assuming it's the primary node) always wins.
>>
>>> 2. Would the correct approach for the DRBD fencing and handlers be the
>>> guidance in users-guide84 under 8.3.2. Resource-level fencing using the
>>> Cluster Information Base (CIB)?
>>
>> No need to worry about the CIB directly. The pcs tool makes configuring
>> fencing in pacemaker pretty easy. Once you have fencing working in
>> pacemaker, then you can hook DRBD into it by setting 'fencing
>> resource-and-stonith' and set the fence handlers to crm-{un,}fence-peer.sh.
>>
>> With that, when DRBD loses the peer, it will block
>> (resource-and-stonith) and call the fence handler (crm-fence-peer.sh).
>> In turn, crm-fence-peer.sh asks pacemaker to shoot the lost node.
>
> Not exactly.
> The crm-fence-peer.sh script tries some heuristics
> on the cib content and using crmadmin, and "figures out"
> if "it is only me", or if pacemaker/"the cluster communication" also
> does not see that peer anymore.
>
> If (heuristics say that) cluster comm to that node is still up,
> we place some constraint telling pacemaker to NOT try to promote
> anyone without access to *my* data, then continue.
>
> If (heuristics say that) cluster comm to that node is also down,
>
> ... and it looks clean down (or already successfully shot),
> we place that same constraint anyways, then continue
>
> ... and you don't have pacemaker fencing enabled, there are scenarios
> where you might end up with data divergence anyways. That can only
> be avoided with fencing configured on both DRBD and pacemaker level.
>
> ... and it looks as if pacemaker will "soon" shoot that node
> (or is already in the process of doing so),
> but it has not been successfully shot yet,
> we periodically poll the cib, until that is the case
> or we hit a timeout.
>
> As of now, this script never asks pacemaker to shoot any peer.
> It may, in specific scenarios, if called with --suicide-on-failure-if-primary,
> ask pacemaker to have *this* node shot, and even tries to fall back to
> other methods of suicide.
>
> More details in said script,
> it is heavily commented and tries to be descriptive
> about not only the what, but also the why.

Oh wow, it's a lot smarter than I thought. Thanks for clarifying!

>> DRBD
>> will stay blocked until that succeeds (which is why stonith has to work
>> in pacemaker before you setup fencing in DRBD).
>
>>> Fencing. 100% required, and will prevent split brains entirely.
>
> Yes :-)

That statement I was confident in. :P

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?
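
For reference, the DRBD side of the setup discussed in this thread ('fencing resource-and-stonith' plus the crm-{un,}fence-peer.sh handlers) might look roughly like the fragment below in a DRBD 8.4 resource definition. This is only a sketch under assumptions: the resource name r0 is a placeholder, the handler paths are the usual packaged locations and may differ on your distribution, and the rest of the resource definition is omitted. As noted above, stonith should already be working in pacemaker before this is enabled.

```
# Sketch only: "r0" is a placeholder resource name, and the handler
# paths are assumed to be the default packaged locations.
resource r0 {
  disk {
    # Block I/O when the peer is lost and call the fence-peer handler.
    fencing resource-and-stonith;
  }
  handlers {
    # Places the constraint described by Lars (it does not itself
    # ask pacemaker to shoot the peer).
    fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
    # Removes that constraint again once the peer has resynced.
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
  # ... remaining resource configuration unchanged ...
}
```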