I am setting up SLES11 SP1 HA on 2 nodes and have configured a
master/slave drbd resource. I can start drbd, promote/demote hosts, and
mount/use the file system from the command line, but pacemaker fails to
properly start up the drbd service. The 2 nodes are named storm (master)
and storm-b (slave). Details of my setup are:

**********
* storm: *
**********
eth0: 172.16.0.1/16 (static)
eth1: 172.20.168.239 (dhcp)
ipmi: 172.16.1.1/16 (static)

************
* storm-b: *
************
eth0: 172.16.0.2/16 (static)
eth1: 172.20.168.114 (dhcp)
ipmi: 172.16.1.2/16 (static)

***********************
* drbd configuration: *
***********************
storm:~ # cat /etc/drbd.conf
#
# please have a a look at the example configuration file in
# /usr/share/doc/packages/drbd-utils/drbd.conf
#
# Note that you can use the YaST2 drbd module to configure this
# service!
#
include "drbd.d/global_common.conf";
include "drbd.d/*.res";

storm:~ # cat /etc/drbd.d/r0.res
resource r0 {
    device      /dev/drbd_r0 minor 0;
    meta-disk   internal;
    on storm {
        disk    /dev/sdc1;
        address 172.16.0.1:7811;
    }
    on storm-b {
        disk    /dev/sde1;
        address 172.16.0.2:7811;
    }
    syncer {
        rate 120M;
    }
}

***********************************
* Output of "crm configure show": *
***********************************
storm:~ # crm configure show
node storm
node storm-b
primitive backupExec-ip ocf:heartbeat:IPaddr \
    params ip="172.16.0.10" cidr_netmask="16" nic="eth0" \
    op monitor interval="30s"
primitive drbd-storage ocf:linbit:drbd \
    params drbd_resource="r0" \
    op monitor interval="60" role="Master" timeout="60" \
    op start interval="0" timeout="240" \
    op stop interval="0" timeout="100" \
    op monitor interval="61" role="Slave" timeout="60"
primitive drbd-storage-fs ocf:heartbeat:Filesystem \
    params device="/dev/drbd0" directory="/disk1" fstype="ext3"
primitive public-ip ocf:heartbeat:IPaddr \
    meta target-role="started" \
    operations $id="public-ip-operations" \
    op monitor interval="30s" \
    params ip="143.219.41.20" cidr_netmask="24" nic="eth1"
primitive storm-fencing stonith:external/ipmi \
    meta target-role="started" \
    operations $id="storm-fencing-operations" \
    op monitor interval="60" timeout="20" \
    op start interval="0" timeout="20" \
    params hostname="storm" ipaddr="172.16.1.1" userid="****" passwd="****" interface="lan"
ms drbd-storage-masterslave drbd-storage \
    meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" \
        notify="true" globally-unique="false" target-role="started"
location drbd-storage-master-location drbd-storage-masterslave +inf: storm
location storm-fencing-location storm-fencing +inf: storm-b
colocation drbd-storage-fs-together inf: drbd-storage-fs drbd-storage-masterslave:Master
order drbd-storage-fs-startup-order inf: drbd-storage-masterslave:promote drbd-storage-fs:start
property $id="cib-bootstrap-options" \
    dc-version="1.1.2-2e096a41a5f9e184a1c1537c82c6da1093698eb5" \
    cluster-infrastructure="openais" \
    expected-quorum-votes="2" \
    no-quorum-policy="ignore" \
    last-lrm-refresh="1277922623" \
    node-health-strategy="only-green" \
    stonith-enabled="true" \
    stonith-action="poweroff"
op_defaults $id="op_defaults-options" \
    record-pending="false"

************************************
* Output of "crm_mon -o" on storm: *
************************************
storm:~ # crm_mon -o
Attempting connection to the cluster...
============
Last updated: Wed Jun 30 15:25:15 2010
Stack: openais
Current DC: storm - partition with quorum
Version: 1.1.2-2e096a41a5f9e184a1c1537c82c6da1093698eb5
2 Nodes configured, 2 expected votes
5 Resources configured.
============

Online: [ storm storm-b ]

storm-fencing   (stonith:external/ipmi):        Started storm-b
backupExec-ip   (ocf::heartbeat:IPaddr):        Started storm
public-ip       (ocf::heartbeat:IPaddr):        Started storm

Operations:
* Node storm:
   public-ip: migration-threshold=1000000
    + (8) start: rc=0 (ok)
    + (11) monitor: interval=30000ms rc=0 (ok)
   backupExec-ip: migration-threshold=1000000
    + (7) start: rc=0 (ok)
    + (10) monitor: interval=30000ms rc=0 (ok)
   drbd-storage:0: migration-threshold=1000000 fail-count=1000000
    + (9) start: rc=-2 (unknown exec error)
    + (14) stop: rc=0 (ok)
* Node storm-b:
   storm-fencing: migration-threshold=1000000
    + (7) start: rc=0 (ok)
    + (9) monitor: interval=6)

**************************************
* Output of "crm_mon -o" on storm-b: *
**************************************
storm-b:~ # crm_mon -o
Attempting connection to the cluster...
============
Last updated: Wed Jun 30 15:25:25 2010
Stack: openais
Current DC: storm - partition with quorum
Version: 1.1.2-2e096a41a5f9e184a1c1537c82c6da1093698eb5
2 Nodes configured, 2 expected votes
5 Resources configured.
============

Online: [ storm storm-b ]

storm-fencing   (stonith:external/ipmi):        Started storm-b
backupExec-ip   (ocf::heartbeat:IPaddr):        Started storm
public-ip       (ocf::heartbeat:IPaddr):        Started storm

Operations:
* Node storm:
   public-ip: migration-threshold=1000000
    + (8) start: rc=0 (ok)
    + (11) monitor: interval=30000ms rc=0 (ok)
   backupExec-ip: migration-threshold=1000000
    + (7) start: rc=0 (ok)
    + (10) monitor: interval=30000ms rc=0 (ok)
   drbd-storage:0: migration-threshold=1000000 fail-count=1000000
    + (9) start: rc=-2 (unknown exec error)
    + (14) stop: rc=0 (ok)
* Node storm-b:
   storm-fencing: migration-threshold=1000000
    + (7) start: rc=0 (ok)
    + (9) monitor: interval=60000ms rc=0 (ok)
   drbd-storage:1: migration-threshold=1000000 fail-count=1000000
    + (8) start: rc=-2 (unknown exec error)
    + (12) stop: rc=0 (ok)

Failed actions:
    drbd-storage:0_start_0 (node=storm, call=9, rc=-2, status=Timed Out): unknown exec error
    drbd-storage:1_start_0 (node=storm-b, call=8, rc=-2, status=Timed Out): unknown exec error

********************************************************
* Output of "rcdrbd status" on both storm and storm-b: *
********************************************************
# rcdrbd status
drbd driver loaded OK; device status:
version: 8.3.7 (api:88/proto:86-91)
GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by phil at fat-tyre, 2010-01-13 17:17:27
m:res  cs          ro                 ds                 p  mounted  fstype
0:r0   StandAlone  Secondary/Unknown  UpToDate/DUnknown  r----

*********************************
* Part of the drbd log entries: *
*********************************
Jun 30 15:38:10 storm kernel: [ 3730.185457] drbd: initialized. Version: 8.3.7 (api:88/proto:86-91)
Jun 30 15:38:10 storm kernel: [ 3730.185459] drbd: GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by phil at fat-tyre, 2010-01-13 17:17:27
Jun 30 15:38:10 storm kernel: [ 3730.185460] drbd: registered as block device major 147
Jun 30 15:38:10 storm kernel: [ 3730.185462] drbd: minor_table @ 0xffff88035fc0ca80
Jun 30 15:38:10 storm kernel: [ 3730.188253] block drbd0: Starting worker thread (from cqueue [9510])
Jun 30 15:38:10 storm kernel: [ 3730.188312] block drbd0: disk( Diskless -> Attaching )
Jun 30 15:38:10 storm kernel: [ 3730.188866] block drbd0: Found 4 transactions (4 active extents) in activity log.
Jun 30 15:38:10 storm kernel: [ 3730.188868] block drbd0: Method to ensure write ordering: barrier
Jun 30 15:38:10 storm kernel: [ 3730.188870] block drbd0: max_segment_size ( = BIO size ) = 32768
Jun 30 15:38:10 storm kernel: [ 3730.188872] block drbd0: drbd_bm_resize called with capacity == 9765216
Jun 30 15:38:10 storm kernel: [ 3730.188907] block drbd0: resync bitmap: bits=1220652 words=19073
Jun 30 15:38:10 storm kernel: [ 3730.188910] block drbd0: size = 4768 MB (4882608 KB)
Jun 30 15:38:10 storm lrmd: [15233]: info: RA output: (drbd-storage:0:start:stdout)
Jun 30 15:38:10 storm kernel: [ 3730.189263] block drbd0: recounting of set bits took additional 0 jiffies
Jun 30 15:38:10 storm kernel: [ 3730.189265] block drbd0: 4 KB (1 bits) marked out-of-sync by on disk bit-map.
Jun 30 15:38:10 storm kernel: [ 3730.189269] block drbd0: disk( Attaching -> UpToDate )
Jun 30 15:38:10 storm kernel: [ 3730.191735] block drbd0: conn( StandAlone -> Unconnected )
Jun 30 15:38:10 storm kernel: [ 3730.191748] block drbd0: Starting receiver thread (from drbd0_worker [15487])
Jun 30 15:38:10 storm kernel: [ 3730.191780] block drbd0: receiver (re)started
Jun 30 15:38:10 storm kernel: [ 3730.191785] block drbd0: conn( Unconnected -> WFConnection )
Jun 30 15:38:10 storm lrmd: [15233]: info: RA output: (drbd-storage:0:start:stderr) 0: Failure: (124) Device is attached to a disk (use detach first)
Jun 30 15:38:10 storm lrmd: [15233]: info: RA output: (drbd-storage:0:start:stderr) Command 'drbdsetup 0 disk /dev/sdc1 /dev/sdc1 internal
Jun 30 15:38:10 storm lrmd: [15233]: info: RA output: (drbd-storage:0:start:stderr) --set-defaults --create-device' terminated with exit code 10
Jun 30 15:38:10 storm drbd[15341]: ERROR: r0: Called drbdadm -c /etc/drbd.conf --peer storm-b up r0
Jun 30 15:38:10 storm drbd[15341]: ERROR: r0: Exit code 1
Jun 30 15:38:10 storm drbd[15341]: ERROR: r0: Command output:

I made sure rcdrbd was stopped before starting rcopenais, so the failure
about the device already being attached arises during openais startup.

*************************
* Result of ocf-tester: *
*************************
storm:~ # ocf-tester -n drbd-storage -o drbd_resource="r0" /usr/lib/ocf/resource.d/linbit/drbd
Beginning tests for /usr/lib/ocf/resource.d/linbit/drbd...
* rc=6: Validation failed. Did you supply enough options with -o ?
Aborting tests

The only required parameter according to "crm ra info ocf:linbit:drbd"
is drbd_resource, so there shouldn't be any additional options required
to make ocf-tester work.

I posted this to the pacemaker mailing list, but thought I'd cross-post
because of the ocf-tester failure. Any suggestions for debugging and
solutions would be most appreciated.

Thanks,
Bart
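P.S. One way to narrow down whether the problem is in the agent itself
or in the cluster glue might be to run the RA by hand, outside both
ocf-tester and pacemaker, with the OCF environment variables that lrmd
would normally provide. A rough sketch (the paths and the resource name
r0 are from the setup above; OCF_RESOURCE_INSTANCE is just an arbitrary
label for the logs):

```shell
# Call the linbit drbd resource agent directly with a hand-built OCF
# environment, as lrmd would. Sketch only; assumes root on a cluster node.
export OCF_ROOT=/usr/lib/ocf                  # root of the OCF agent tree
export OCF_RESKEY_drbd_resource=r0            # same parameter pacemaker passes
export OCF_RESOURCE_INSTANCE=drbd-storage:0   # instance name used in log output
/usr/lib/ocf/resource.d/linbit/drbd validate-all
echo "validate-all exit code: $?"             # 0 would mean OCF_SUCCESS
```

If validate-all fails here too, the ocf-tester failure is reproduced
without ocf-tester in the picture; if it succeeds, the drbd_resource
parameter may not be reaching the agent the way ocf-tester passes it.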