I have a pacemaker/corosync cluster on CentOS 5 that is running drbd 8.3.15. The drbd build is from the CentOS Extras repository. This two-node cluster has been in production for quite a while and it has been very stable.

I recently applied updates from CentOS 5.10 to CentOS 5.11. Upon reboot, the cluster came up and drbd-overview showed that the three DRBD devices were properly sync'd. However, while attempting to fail over the three resource groups related to those three DRBD devices, I experienced a failure in one of them with the error:

    Filesystem[9402]: ERROR: Couldn't find device [/dev/drbd/by-res/web]. Expected /dev/??? to exist

(that would be output from the Filesystem pacemaker resource agent)

Looking a bit further, I found that even though drbd-overview was reporting everything to be fine, I was not seeing all of the expected symlinks created in the /dev/drbd/by-res/ directory. Examination showed that across reboots I was getting either one or two of the expected three symlinks created in that directory. I also was not seeing anything identifiable in the other system logs regarding a problem, such as running out of resources.

Eventually, I was able to get things back into a sane state by:

1. shutting down corosync on the problem node
2. doing a 'chkconfig corosync off'
3. rebooting the problem node
4. *without* starting corosync, doing a 'drbdadm up DEVICE' on each entry in drbd.conf
5. on the working node, putting the drbd master/slave sets into the unmanaged state
6. starting corosync on the problem node
7. bringing the master/slave sets back into the managed state
8. doing a 'chkconfig corosync on'

I've rebooted the problem node enough times now that I'm reasonably confident that whatever the cause of the problem was, it is no longer occurring, and I've successfully failed over all services to the formerly failing node. Despite being in a working state, I'd really prefer to know *why* this was happening.
Under what circumstances could we expect DRBD to not create the /dev/drbd/by-res/ symlinks?

Below I've included the drbd.conf, somewhat sanitized to protect the guilty:

========= /etc/drbd.conf ======================
global { usage-count no; }

common {
  protocol C;

  syncer {
    rate 20M;
    verify-alg sha1;
  }

  disk { fencing resource-only; }

  # See http://www.drbd.org/users-guide-emb/s-pacemaker-fencing.html
  handlers {
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }

  net {
    cram-hmac-alg "sha1";
    shared-secret "SomeSecret";
  }
}

resource mysql {
  device /dev/drbd0;
  disk /dev/vg0/mysql;
  meta-disk internal;
  on hostA { address 192.168.5.50:7788; }
  on hostB { address 192.168.5.51:7788; }
}

resource cyrus {
  device /dev/drbd1;
  disk /dev/vg0/cyrus;
  meta-disk internal;
  on hostA { address 192.168.5.50:7789; }
  on hostB { address 192.168.5.51:7789; }
}

resource web {
  device /dev/drbd2;
  disk /dev/vg0/web;
  meta-disk internal;
  on hostA { address 192.168.5.50:7790; }
  on hostB { address 192.168.5.51:7790; }
}
========= /etc/drbd.conf ======================
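
For anyone wanting to check for the same symptom after a reboot, here is a minimal sketch of the comparison I was doing by hand: list the resources named in drbd.conf and see whether each one has an entry under /dev/drbd/by-res/. The function name check_by_res is made up for illustration. It assumes resource names can be read from lines of the form "resource NAME {", which holds for the config above but not necessarily for every layout (include files, for example, are not followed).

```shell
#!/bin/sh
# Hypothetical helper (not part of DRBD): report resources whose
# by-res entry is missing.
# Usage: check_by_res [CONF] [BYRES_DIR]
check_by_res() {
    conf="${1:-/etc/drbd.conf}"
    byres="${2:-/dev/drbd/by-res}"
    missing=0
    # Assumption: each resource is declared as "resource NAME {".
    for res in $(sed -n 's/^[[:space:]]*resource[[:space:]]\{1,\}\([^[:space:]{]\{1,\}\).*/\1/p' "$conf"); do
        # -e follows symlinks, so a dangling link also counts as missing.
        if [ -e "$byres/$res" ]; then
            echo "ok: $byres/$res"
        else
            echo "MISSING: $byres/$res"
            missing=1
        fi
    done
    # Exit status 0 only if every resource had an entry.
    return "$missing"
}
```

Calling check_by_res with no arguments uses the paths from this post. It only detects the missing symlinks; it says nothing about why they were not created.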