Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
We use a simple 2-node active-passive cluster with DRBD and NFS services.
Right now the cluster monitor detects a DRBD failure every few hours (roughly
every 2 to 40 hours) and fails over.
syslog shows the following lines just before Pacemaker initiates the
failover:
--------------------------------------
Feb 24 20:55:54 drbdnode1 lrmd: [1659]: info: RA output: (p_drbd_r0:0:monitor:stderr) <1>error creating netlink socket
Feb 24 20:55:54 drbdnode1 lrmd: [1659]: info: RA output: (p_drbd_r0:0:monitor:stderr) Could not connect to 'drbd' generic netlink family
Feb 24 20:55:54 drbdnode1 crmd: [1662]: info: process_lrm_event: LRM operation p_drbd_r0:0_monitor_15000 (call=26, rc=7, cib-update=32, confirmed=false) not running
Feb 24 20:55:55 drbdnode1 attrd: [1661]: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-p_drbd_r0:0 (1)
--------------------------------------
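As far as I can tell, the two netlink lines come from drbdsetup, which is what
the linbit RA's monitor ends up calling (via drbdadm) to query the resource
role. The same query can be repeated by hand when it happens, e.g.:
---------------------------------
# run on the node that reports the failure, while it is primary:
cat /proc/drbd          # module loaded / overall state
drbdadm role r0         # e.g. Primary/Secondary
drbdadm cstate r0       # e.g. Connected
drbdadm dstate r0       # e.g. UpToDate/UpToDate
---------------------------------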
Does anyone have a clue why this might happen?
It only seems to happen when DRBD runs as primary on nodeA, even though this
node is designed to always be primary as long as it's online (see the sketch
after the Pacemaker config below)...
thanks
Christoph Roethlisberger
SysConfig (both nodes):
---------------------------------
Dual Xeon E5606, 24GB RAM, Intel 10GbE NICs
Debian Squeeze
kernel: 3.2.0-0.bpo.1-amd64
pacemaker 1.1.6-2~bpo60+1
heartbeat 1:3.0.5-2~bpo60+1
DRBD 8.4.1 (module and userland tools)
---------------------------------
DRBD Config
---------------------------------
resource r0 {
    volume 0 {
        device minor 0;
        disk /dev/vg01/vol01;
        meta-disk internal;
    }
    on drbdnode1 {
        address 192.168.100.1:7789;
    }
    on drbdnode2 {
        address 192.168.100.2:7789;
    }
}
---------------------------------
Pacemaker Config
---------------------------------
node $id="1b6d29da-484e-4b0f-a3ab-c7de3ea8f3ee" drbdnode1 \
attributes standby="off"
node $id="8bfba0ca-81cf-4fd1-ac89-79b4b4209151" drbdnode2 \
attributes standby="off"
primitive p_drbd_r0 ocf:linbit:drbd \
params drbd_resource="r0" \
op monitor interval="15s" role="Master" \
op monitor interval="30s" role="Slave"
primitive p_exportnfs_cgpro ocf:heartbeat:exportfs \
params fsid="100" directory="/srv/nfs/cgpro" \
options="rw,sync,no_wdelay,no_root_squash,no_subtree_check,mountpoint" \
clientspec="172.20.0.0/24" \
op monitor interval="30s" wait_for_leasetime_on_stop="true" \
unlock_on_stop="true" rmtab_backup="none" \
meta target-role="Started"
primitive p_exportnfs_root ocf:heartbeat:exportfs \
params fsid="0" directory="/srv/nfs" \
options="rw,sync,no_wdelay,no_root_squash,no_subtree_check,crossmnt" \
clientspec="172.20.0.0/24" \
op monitor interval="30s" wait_for_leasetime_on_stop="true" \
unlock_on_stop="true" rmtab_backup="none"
primitive p_fsmount_cgpro ocf:heartbeat:Filesystem \
params device="/dev/drbd/by-res/r0/0" directory="/srv/nfs/cgpro" \
fstype="xfs" options="nobarrier,inode64" \
op start interval="0s" timeout="60s" \
op stop interval="0s" timeout="120s" \
meta is-managed="true"
primitive p_ipv4 ocf:heartbeat:IPaddr2 \
params ip="172.20.0.3" nic="eth2" \
op monitor interval="5s"
primitive p_lsb_nfsserver lsb:nfs-kernel-server \
op monitor interval="30s"
group g_haservices p_ipv4 p_fsmount_cgpro p_exportnfs_cgpro \
meta target-role="Started"
ms ms_drbd_r0 p_drbd_r0 \
meta master-max="1" master-node-max="1" clone-max="2" \
clone-node-max="1" notify="true" target-role="Started"
clone cl_exportnfs_root p_exportnfs_root
clone cl_lsb_nfsserver p_lsb_nfsserver
colocation c_ms-drbd-r0_with_haservices inf: g_haservices ms_drbd_r0:Master
order o_exportnfs-root_before_exportnfs_cgpro 0: cl_exportnfs_root p_exportnfs_cgpro
order o_fsmount-cgrpro-before-exportnfs-cgpro inf: p_fsmount_cgpro p_exportnfs_cgpro:start
order o_lsb-nfsserver-before-exportnfs-root inf: cl_lsb_nfsserver cl_exportnfs_root
order o_ms-drbd-r0-before-fsmount-cgpro inf: ms_drbd_r0:promote p_fsmount_cgpro:start
property $id="cib-bootstrap-options" \
dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
cluster-infrastructure="Heartbeat" \
stonith-enabled="false" \
no-quorum-policy="ignore" \
last-lrm-refresh="1330068747"
rsc_defaults $id="rsc-options" \
resource-stickiness="200"
---------------------------------
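PS: regarding the "always primary on nodeA" intent mentioned above: the dump
above contains no explicit preference for it, so here is a rough sketch of how
it could be expressed (assuming nodeA is drbdnode1; the constraint name and
score are just placeholders, not part of my current config):
---------------------------------
# hypothetical sketch, not in the config above:
# prefer drbdnode1 for the Master role of ms_drbd_r0
location l_drbd_master_on_node1 ms_drbd_r0 \
rule $role="Master" 100: #uname eq drbdnode1
---------------------------------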