[DRBD-user] GFS2 - DualPrimaryDRBD hangs if a node Crashes

Digimer lists at alteeve.ca
Fri Mar 24 16:57:40 CET 2017

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


On 24/03/17 04:19 AM, Raman Gupta wrote:
> Hi All,
> 
> I am having a problem where, in a GFS2 dual-Primary DRBD Pacemaker
> cluster, if one node crashes then the surviving node hangs! The CLVM
> commands hang, and the libvirt VM on the surviving node hangs.

As Igor said, this is almost certainly a fence issue. Here's how DLM
works, roughly:

1. Corosync detects the node loss and reforms the cluster without the
lost node
2. The new membership is checked for quorum (always true on a 2-node
cluster with no-quorum-policy=ignore / corosync two_node)
3. Pacemaker is informed of the membership change; DLM is informed and
blocks pending a successful fence
4. Pacemaker invokes stonithd, if fencing is configured
5. stonithd determines how to fence the lost node and invokes the
appropriate fence_X agent
6. On fence success, DLM is informed, stale locks are reaped and new
locks are again allowed to be issued

With no stonith configured, or with broken stonith, there will never be
a "success", so DLM stays blocked. This is by design. With DLM blocked,
anything that uses DLM (clvmd and gfs2) blocks as well.
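
You can usually see this blocked state on the surviving node with the
DLM userland tools (a rough check, assuming the dlm package shipped
with CentOS 7; exact output varies):

   # list lockspaces (clvmd and your gfs2 mount should show up here)
   dlm_tool ls

   # overall dlm_controld state, including fencing status per node
   dlm_tool status

   # and confirm Pacemaker actually has fence devices configured
   pcs stonith show
   pcs property show stonith-enabled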

Note that this is separate from DRBD. With fencing set to
'resource-and-stonith', which should ALWAYS be the case in
dual-primary, DRBD also suspends I/O until the fence handler tells it
that the peer was fenced. This is how split-brains are avoided.
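
For reference, the relevant pieces of a DRBD 8.4 resource config (the
resource name vDrbd is taken from your logs; the file layout is just an
example) would look roughly like this:

   resource vDrbd {
       disk {
           # suspend I/O and call the fence handler when the peer is lost
           fencing resource-and-stonith;
       }
       handlers {
           # adds a constraint in Pacemaker so the lost peer can't promote
           fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
           # removes that constraint once the peer has resynced
           after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
       }
       ...
   }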

So, you need two things:

1. Enable stonith and configure the appropriate fence methods for your
hardware/environment (a rough sketch follows below).
2. Configure DRBD to use fencing properly and use the crm-fence-peer.sh
fence handler to "hook" it into pacemaker's fencing (see the snippet
above).
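
As an illustration only (IPMI as the fence method; the addresses,
credentials and delay value are made up, substitute whatever agent
matches your hardware), stonith for a 2-node cluster like yours could
be set up roughly like this:

   pcs property set stonith-enabled=true

   # one fence device per node; 'delay' on one of them helps avoid a
   # fence duel where both nodes kill each other at the same time
   pcs stonith create fence_server4 fence_ipmilan \
       pcmk_host_list="server4" ipaddr="10.0.0.4" login="admin" \
       passwd="secret" lanplus=1 delay=15 op monitor interval=60s
   pcs stonith create fence_server7 fence_ipmilan \
       pcmk_host_list="server7" ipaddr="10.0.0.7" login="admin" \
       passwd="secret" lanplus=1 op monitor interval=60s

   # keep each fence device off the node it is meant to kill
   pcs constraint location fence_server4 avoids server4
   pcs constraint location fence_server7 avoids server7

   # then actually test it, in both directions, e.g.
   pcs stonith fence server7

Once a manual fence works both ways, crash a node again; DLM should
unblock within seconds of the fence completing and the VM should keep
running on the survivor.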

> Env:
> ---------
> CentOS 7.3
> DRBD 8.4 
> gfs2-utils-3.1.9-3.el7.x86_64
> Pacemaker 1.1.15-11.el7_3.4
> corosync-2.4.0-4.el7.x86_64
> 
> 
> Infrastructure:
> ------------------------
> 1) Running a 2-node Pacemaker cluster with proper fencing between the
> two. Nodes are server4 and server7.
> 
> 2) Running DRBD dual-Primary and hosting GFS2 filesystem.
> 
> 3) Pacemaker has DLM and cLVM resources configured among others.
> 
> 4) A KVM/QEMU virtual machine is running on server4 which is holding the
> cluster resources.
> 
> 
> Normal:
> ------------
> 5) Under normal conditions, when the two nodes are completely UP,
> things are fine. The DRBD dual-primary works fine. The disk of the VM
> is hosted on the DRBD mount directory /backup and the VM runs fine,
> with Live Migration happily happening between the 2 nodes.
> 
> 
> Problem:
> ----------------
> 6) Stop server7 [shutdown -h now] ---> LVM commands like pvdisplay
> hang, the VM runs for only 120s ---> after 120s DRBD/GFS2 panics
> (/var/log/messages below) on server4, the DRBD mount directory
> (/backup) becomes unavailable and the VM hangs on server4. DRBD itself
> is fine on server4, in Primary/Secondary mode and WFConnection state.
> 
> Mar 24 11:29:28 server4 crm-fence-peer.sh[54702]: invoked for vDrbd
> Mar 24 11:29:28 server4 crm-fence-peer.sh[54702]: WARNING drbd-fencing
> could not determine the master id of drbd resource vDrbd
> Mar 24 11:29:28 server4 kernel: drbd vDrbd: helper command:
> /sbin/drbdadm fence-peer vDrbd exit code 1 (0x100)
> Mar 24 11:29:28 server4 kernel: drbd vDrbd: fence-peer helper broken,
> returned 1
> Mar 24 11:32:01 server4 kernel: INFO: task kworker/8:1H:822 blocked for
> more than 120 seconds.
> Mar 24 11:32:01 server4 kernel: "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Mar 24 11:32:01 server4 kernel: kworker/8:1H    D ffff880473796c18     0
>   822      2 0x00000080
> Mar 24 11:32:01 server4 kernel: Workqueue: glock_workqueue
> glock_work_func [gfs2]
> Mar 24 11:32:01 server4 kernel: ffff88027674bb10 0000000000000046
> ffff8802736e9f60 ffff88027674bfd8
> Mar 24 11:32:01 server4 kernel: ffff88027674bfd8 ffff88027674bfd8
> ffff8802736e9f60 ffff8804757ef808
> Mar 24 11:32:01 server4 kernel: 0000000000000000 ffff8804757efa28
> ffff8804757ef800 ffff880473796c18
> Mar 24 11:32:01 server4 kernel: Call Trace:
> Mar 24 11:32:01 server4 kernel: [<ffffffff8168bbb9>] schedule+0x29/0x70
> Mar 24 11:32:01 server4 kernel: [<ffffffffa0714ce4>]
> drbd_make_request+0x2a4/0x380 [drbd]
> Mar 24 11:32:01 server4 kernel: [<ffffffff812e0000>] ?
> aes_decrypt+0x260/0xe10
> Mar 24 11:32:01 server4 kernel: [<ffffffff810b17d0>] ?
> wake_up_atomic_t+0x30/0x30
> Mar 24 11:32:01 server4 kernel: [<ffffffff812ee6f9>]
> generic_make_request+0x109/0x1e0
> Mar 24 11:32:01 server4 kernel: [<ffffffff812ee841>] submit_bio+0x71/0x150
> Mar 24 11:32:01 server4 kernel: [<ffffffffa063ee11>]
> gfs2_meta_read+0x121/0x2a0 [gfs2]
> Mar 24 11:32:01 server4 kernel: [<ffffffffa063f392>]
> gfs2_meta_indirect_buffer+0x62/0x150 [gfs2]
> Mar 24 11:32:01 server4 kernel: [<ffffffff810d2422>] ?
> load_balance+0x192/0x990
> 
> 7) Only after server7 is back UP, the Pacemaker cluster is started,
> DRBD is started and the Logical Volume is activated does the DRBD
> mount directory (/backup) become available again on server4 and the VM
> resume. So from the moment server7 goes down until it is completely UP
> again, the VM on server4 hangs.
> 
> 
> Can anyone help with how to avoid the surviving node hanging when the
> other node crashes?
> 
> 
> Attaching DRBD config file.
> 
> 
> --Raman
> 
> 
> 
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
> 


-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould


