[DRBD-user] /dev/drbd/by-res symlinks not getting created

Devin Reade gdr at gno.org
Mon Oct 6 00:32:04 CEST 2014

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


I have a pacemaker/corosync cluster on CentOS 5 that is running
drbd 8.3.15.  The drbd build is from the CentOS Extras repository.

This two-node cluster has been in production for quite a while
and has been very stable.  I recently applied updates from
CentOS 5.10 to CentOS 5.11.  Upon reboot, the cluster came up
and drbd-overview showed that the three DRBD devices were properly
sync'd.  However, while attempting to fail over the three resource
groups tied to those three DRBD devices, one of them failed with
the error:

Filesystem[9402]: ERROR: Couldn't find device [/dev/drbd/by-res/web].
Expected /dev/??? to exist

(that would be output from the Filesystem pacemaker resource agent)
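
For reference, with the three resources defined in the drbd.conf
below, a healthy boot should leave that directory looking roughly
like this (a sketch; the timestamps and sizes are made up, but the
name-to-minor mapping follows the device lines in the config):

    # Expected udev-created symlinks, one per resource in drbd.conf:
    $ ls -l /dev/drbd/by-res/
    lrwxrwxrwx 1 root root 11 Oct  5 10:12 cyrus -> ../../drbd1
    lrwxrwxrwx 1 root root 11 Oct  5 10:12 mysql -> ../../drbd0
    lrwxrwxrwx 1 root root 11 Oct  5 10:12 web -> ../../drbd2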

Looking a bit further, I found that even though drbd-overview was
reporting everything to be fine, not all of the expected symlinks
were being created in the /dev/drbd/by-res/ directory.  Examination
showed that across reboots I was getting only one or two of the
expected three symlinks in that directory.  I also saw nothing
identifiable in the other system logs pointing to a problem, such
as running out of resources.
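
For anyone who wants to reproduce the check, something like the
following compares the resources drbdadm knows about against what
actually exists under /dev/drbd/by-res/ (drbdadm sh-resources
prints the resource names from drbd.conf, at least in the 8.3
builds I have):

    # Compare resources defined in drbd.conf against the by-res symlinks.
    for res in $(drbdadm sh-resources); do
        if [ -L "/dev/drbd/by-res/$res" ]; then
            echo "present: $res"
        else
            echo "MISSING: $res"
        fi
    done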

Eventually, I was able to get things back into a sane state by
doing the following (sketched as commands below):
 1. shutting down corosync on the problem node
 2. running 'chkconfig corosync off'
 3. rebooting the problem node
 4. *without* starting corosync, running 'drbdadm up RESOURCE' for
    each resource in drbd.conf
 5. on the working node, putting the drbd master/slave sets into the
    unmanaged state
 6. starting corosync on the problem node
 7. bringing the master/slave sets back into the managed state
 8. running 'chkconfig corosync on'
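
Roughly, as commands (the ms-drbd-* names are placeholders for the
actual master/slave resource IDs in my CIB, and I'm assuming the
crm shell that ships with this pacemaker):

    ## On the problem node:
    service corosync stop
    chkconfig corosync off
    reboot
    # ...after the reboot, with corosync still down:
    drbdadm up mysql
    drbdadm up cyrus
    drbdadm up web

    ## On the working node:
    crm resource unmanage ms-drbd-mysql
    crm resource unmanage ms-drbd-cyrus
    crm resource unmanage ms-drbd-web

    ## On the problem node:
    service corosync start

    ## On the working node, once the cluster settles:
    crm resource manage ms-drbd-mysql
    crm resource manage ms-drbd-cyrus
    crm resource manage ms-drbd-web

    ## On the problem node:
    chkconfig corosync on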

I've rebooted the problem node enough times now that I'm reasonably
confident that whatever caused the problem is no longer occurring,
and I've successfully failed over all services to the formerly
failing node.

Despite being back in a working state, I'd really prefer to know
*why* this was happening.  Under what circumstances would DRBD fail
to create the /dev/drbd/by-res/ symlinks?
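
My understanding is that those links are created by the udev rules
shipped with the drbd package rather than by the kernel module
itself, so as a stopgap a missing link can be recreated by hand to
match what udev would have made (web is minor 2 per the config
below):

    # Manual stopgap: recreate the symlink udev should have created.
    mkdir -p /dev/drbd/by-res
    ln -s ../../drbd2 /dev/drbd/by-res/web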

Below I've included the drbd.conf, somewhat sanitized to protect
the guilty:

========= /etc/drbd.conf ======================
global {
	usage-count no;
}

common {
	protocol	C;
	syncer {
		rate 20M;
		verify-alg sha1;
	}

	disk {
		fencing resource-only;
	}

	# See http://www.drbd.org/users-guide-emb/s-pacemaker-fencing.html
	handlers {
		fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
		after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
	}

	net {
		cram-hmac-alg "sha1";
		shared-secret "SomeSecret";
	}
}

resource mysql {
	device		/dev/drbd0;
	disk		/dev/vg0/mysql;
	meta-disk	internal;
	on hostA {
		address		192.168.5.50:7788;
	}
	on hostB {
		address		192.168.5.51:7788;
	}
}
resource cyrus {
	device		/dev/drbd1;
	disk		/dev/vg0/cyrus;
	meta-disk	internal;
	on hostA {
		address		192.168.5.50:7789;
	}
	on hostB {
		address		192.168.5.51:7789;
	}
}
resource web {
	device		/dev/drbd2;
	disk		/dev/vg0/web;
	meta-disk	internal;
	on hostA {
		address		192.168.5.50:7790;
	}
	on hostB {
		address		192.168.5.51:7790;
	}
}
========= /etc/drbd.conf ======================
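
As a sanity check on the config above, drbdadm can be asked to
parse it and echo back its view of each resource; any parse errors
in /etc/drbd.conf would show up here:

    # Have drbdadm re-emit its parsed view of /etc/drbd.conf.
    drbdadm dump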



