Hello all,

First post here and first experience with DRBD. We are using DRBD v0.6.12 (the latest version marked as stable on Gentoo; the plan is to gain some experience with it, then try DRBD 0.7.x), running a Gentoo-hardened 2.4.28 kernel on i686. The hardware is identical on both nodes: three NICs each, one each on a class A, B and C network, plus a serial connection. N.B.: Gentoo uses devfs.

Problem: we cannot work out why the kernel panics when the init script is started and stopped. The init script and drbd.conf are copied in below.

Observations: DRBD appears to work satisfactorily when started with the sequence of drbdsetup steps described in the manual (http://www.slackworks.com/~dkrovich/DRBD/usingdrbdsetup.html). DRBD also starts with "/drbd start" on the command line after loading the drbd kernel module. However, the following detail may indicate that all is not working optimally:

* During a full sync, "cat /proc/drbd" on node 1 reports "0 - cs: Connected st: Primary/Secondary" but "1 - cs: Unconfigured". On node 2, however, "cat /proc/drbd" reports "0 - cs: Connected st: Secondary/Primary". Why "Unconfigured" on node 1 instead of one of the "WF" states we expect from reading the docs?

On the other hand:

* Sync speed reaches 12.5 MB/s, as the docs state it should, and "cat /proc/drbd" reports sync progress on both nodes.
* There have been no problems mounting and using the file system during two weeks of constant active use.
* Running md5sum on the fully synced disks produces identical results.
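For what it's worth, the state lines quoted above can be checked mechanically. The sketch below is plain POSIX shell; the sample text stands in for a real "cat /proc/drbd" (the device names and output format are taken from our nodes, so treat it as an illustration, not a general parser).

```shell
#!/bin/sh
# Sketch: flag any drbd device whose "cs:" state is not "Connected".
# The sample variable mimics the /proc/drbd lines quoted above; on a
# real node you would pipe in `cat /proc/drbd` instead.
proc_drbd='0 - cs: Connected st: Primary/Secondary
1 - cs: Unconfigured'

printf '%s\n' "$proc_drbd" | while read -r dev dash cs state rest; do
    [ "$state" != "Connected" ] && echo "device $dev: cs is $state"
done
# prints: device 1: cs is Unconfigured
```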
Here is our drbd.conf:

    resource drbd0 {
      protocol=C
      fsckcmd=/bin/true

      disk {
        do-panic
        disk-size=39078112
      }

      net {
        sync-max=8M        # bytes/sec
        timeout=60
        connect-int=10
        ping-int=10
      }

      on ns1 {
        device=/dev/nbd/0
        disk=/dev/hdd1
        address=10.0.0.1
        port=7789
      }

      on ns2 {
        device=/dev/nbd/0
        disk=/dev/hdd1
        address=10.0.0.2
        port=7789
      }
    }

And the Gentoo-installed init script:

    #!/sbin/runscript

    depend() {
      need net
      before heartbeat
      after sshd                     # In case there are sync problems
    }

    start() {
      ebegin "Starting drbd mirror driver"
      ${DRBD} ${DRBDDEV} start
      if [ "$?" == "1" ]; then       # In case you decide this node is
        eend 0                       # primary, the script returns 1
      fi
      eend $?
    }

    stop() {
      ebegin "Stopping drbd mirror driver"
      ${DRBD} ${DRBDDEV} stop
      eend $?
    }

Getting drbd to start via the init script is necessary for heartbeat. When heartbeat's init script is run after starting drbd as described above, the following log is produced:

    Feb 22 12:19:01 [heartbeat] info: **************************
    Feb 22 12:19:01 [heartbeat] info: Configuration validated. Starting heartbeat 1.2.3
    Feb 22 12:19:01 [heartbeat] info: heartbeat: version 1.2.3
    Feb 22 12:19:01 [heartbeat] info: Heartbeat generation: 3
    Feb 22 12:19:01 [heartbeat] info: Starting serial heartbeat on tty /dev/ttyS0 (19200 baud)
    Feb 22 12:19:01 [heartbeat] info: UDP Broadcast heartbeat started on port 694 (694) interface eth2
    Feb 22 12:19:01 [heartbeat] info: ping heartbeat started.
    Feb 22 12:19:01 [heartbeat] info: pid 15074 locked in memory.
    Feb 22 12:19:01 [heartbeat] info: pid 15075 locked in memory.
    Feb 22 12:19:01 [heartbeat] info: pid 15076 locked in memory.
    Feb 22 12:19:01 [heartbeat] info: pid 15078 locked in memory.
    Feb 22 12:19:01 [heartbeat] info: pid 15079 locked in memory.
    Feb 22 12:19:01 [heartbeat] info: pid 15036 locked in memory.
    Feb 22 12:19:01 [heartbeat] info: Local status now set to: 'up'
    Feb 22 12:19:02 [heartbeat] info: pid 15077 locked in memory.
    Feb 22 12:19:02 [heartbeat] info: Link ns1:eth2 up.
    Feb 22 12:19:02 [heartbeat] info: pid 15080 locked in memory.
    Feb 22 12:19:02 [heartbeat] info: Link 191.250.1.1:191.250.1.1 up.
    Feb 22 12:19:02 [heartbeat] info: Status update for node 191.250.1.1: status ping
    Feb 22 12:19:58 [heartbeat] WARN: TTY write timeout on [/dev/ttyS0] (no connection or bad cable? [see documentation])
    Feb 22 12:21:02 [heartbeat] WARN: node ns2: is dead
    Feb 22 12:21:02 [heartbeat] info: Local status now set to: 'active'
    Feb 22 12:21:02 [heartbeat] info: Starting child client "/usr/lib/heartbeat/ipfail" (65,65)
    Feb 22 12:21:02 [heartbeat] WARN: No STONITH device configured.
    Feb 22 12:21:02 [heartbeat] WARN: Shared disks are not protected.
    Feb 22 12:21:02 [heartbeat] info: Resources being acquired from ns2.
    Feb 22 12:21:02 [heartbeat] info: Starting "/usr/lib/heartbeat/ipfail" as uid 65 gid 65 (pid 15192)
    Feb 22 12:21:02 [heartbeat] debug: notify_world: setting SIGCHLD Handler to SIG_DFL
    Feb 22 12:21:02 [heartbeat] debug: StartNextRemoteRscReq(): child count 1
    - Last output repeated twice -
    Feb 22 12:21:03 [heartbeat] info: Local Resource acquisition completed.
    Feb 22 12:21:03 [heartbeat] info: Initial resource acquisition complete (T_RESOURCES(us))
    Feb 22 12:21:03 [heartbeat] debug: notify_world: setting SIGCHLD Handler to SIG_DFL
    Feb 22 12:21:14 [heartbeat] info: Local Resource acquisition completed. (none)
    Feb 22 12:21:14 [heartbeat] info: local resource transition completed.
    Feb 22 12:21:41 [heartbeat] WARN: Shutdown delayed until current resource activity finishes.

Many thanks for all observations received.

Barry Schatz