Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
We use a simple 2-node active-passive cluster with DRBD and NFS services.
Right now the cluster monitor detects a DRBD failure every couple of hours (~2-40) and fails over.

syslog shows the following lines just before Pacemaker initiates the failover:
--------------------------------------
Feb 24 20:55:54 drbdnode1 lrmd: [1659]: info: RA output: (p_drbd_r0:0:monitor:stderr) <1>error creating netlink socket
Feb 24 20:55:54 drbdnode1 lrmd: [1659]: info: RA output: (p_drbd_r0:0:monitor:stderr) Could not connect to 'drbd' generic netlink family
Feb 24 20:55:54 drbdnode1 crmd: [1662]: info: process_lrm_event: LRM operation p_drbd_r0:0_monitor_15000 (call=26, rc=7, cib-update=32, confirmed=false) not running
Feb 24 20:55:55 drbdnode1 attrd: [1661]: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-p_drbd_r0:0 (1)
--------------------------------------

Does anyone have a clue why this might happen?
It only seems to happen when DRBD runs primary on nodeA, even though this node is designed to always be primary as long as it's online...

thanks
Christoph Roethlisberger


SysConfig (both nodes):
---------------------------------
Dual Xeon E5606, 24GB RAM, Intel 10GbE NICs
Debian Squeeze
kernel: 3.2.0-0.bpo.1-amd64
pacemaker 1.1.6-2~bpo60+1
heartbeat 1:3.0.5-2~bpo60+1
DRBD 8.4.1 (module and userland tools)
---------------------------------

DRBD Config
---------------------------------
resource r0 {
    volume 0 {
        device      minor 0;
        disk        /dev/vg01/vol01;
        meta-disk   internal;
    }
    on drbdnode1 {
        address 192.168.100.1:7789;
    }
    on drbdnode2 {
        address 192.168.100.2:7789;
    }
}
---------------------------------

Pacemaker Config
---------------------------------
node $id="1b6d29da-484e-4b0f-a3ab-c7de3ea8f3ee" drbdnode1 \
        attributes standby="off"
node $id="8bfba0ca-81cf-4fd1-ac89-79b4b4209151" drbdnode2 \
        attributes standby="off"
primitive p_drbd_r0 ocf:linbit:drbd \
        params drbd_resource="r0" \
        op monitor interval="15s" role="Master" \
        op monitor interval="30s" role="Slave"
primitive p_exportnfs_cgpro ocf:heartbeat:exportfs \
        params fsid="100" directory="/srv/nfs/cgpro" options="rw,sync,no_wdelay,no_root_squash,no_subtree_check,mountpoint" clientspec="172.20.0.0/24" \
        op monitor interval="30s" wait_for_leasetime_on_stop="true" unlock_on_stop="true" rmtab_backup="none" \
        meta target-role="Started"
primitive p_exportnfs_root ocf:heartbeat:exportfs \
        params fsid="0" directory="/srv/nfs" options="rw,sync,no_wdelay,no_root_squash,no_subtree_check,crossmnt" clientspec="172.20.0.0/24" \
        op monitor interval="30s" wait_for_leasetime_on_stop="true" unlock_on_stop="true" rmtab_backup="none"
primitive p_fsmount_cgpro ocf:heartbeat:Filesystem \
        params device="/dev/drbd/by-res/r0/0" directory="/srv/nfs/cgpro" fstype="xfs" options="nobarrier,inode64" \
        op start interval="0s" timeout="60s" \
        op stop interval="0s" timeout="120s" \
        meta is-managed="true"
primitive p_ipv4 ocf:heartbeat:IPaddr2 \
        params ip="172.20.0.3" nic="eth2" \
        op monitor interval="5s"
primitive p_lsb_nfsserver lsb:nfs-kernel-server \
        op monitor interval="30s"
group g_haservices p_ipv4 p_fsmount_cgpro p_exportnfs_cgpro \
        meta target-role="Started"
ms ms_drbd_r0 p_drbd_r0 \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"
clone cl_exportnfs_root p_exportnfs_root
clone cl_lsb_nfsserver p_lsb_nfsserver
colocation c_ms-drbd-r0_with_haservices inf: g_haservices ms_drbd_r0:Master
order o_exportnfs-root_before_exportnfs_cgpro 0: cl_exportnfs_root p_exportnfs_cgpro
order o_fsmount-cgrpro-before-exportnfs-cgpro inf: p_fsmount_cgpro p_exportnfs_cgpro:start
order o_lsb-nfsserver-before-exportnfs-root inf: cl_lsb_nfsserver cl_exportnfs_root
order o_ms-drbd-r0-before-fsmount-cgpro inf: ms_drbd_r0:promote p_fsmount_cgpro:start
property $id="cib-bootstrap-options" \
        dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
        cluster-infrastructure="Heartbeat" \
        stonith-enabled="false" \
        no-quorum-policy="ignore" \
        last-lrm-refresh="1330068747"
rsc_defaults $id="rsc-options" \
        resource-stickiness="200"
---------------------------------
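
For reference, this is roughly what we check on drbdnode1 when the error shows up. It's a sketch from memory rather than a verified transcript, and it assumes the stock 8.4.1 module and userland:
---------------------------------
# loaded kernel module -- version/buildtag should say 8.4.1
cat /proc/drbd
modinfo drbd | grep -i version

# userland version -- should match the loaded module
drbdadm --version

# these go through drbdsetup and the 'drbd' generic netlink family,
# which appears to be the same path the RA monitor trips over in the syslog above
drbdadm role r0
drbdadm cstate r0
drbdadm dstate r0
---------------------------------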