Hello all,
First post here and first experience with drbd. Using DRBD v.0.6.12
(because it's the latest version marked as stable on Gentoo -- plan is to
acquire some experience with it, then try drbd 0.7x).
Running a Gentoo-hardened kernel, version 2.4.28, on i686. The hardware is
identical on both nodes: three NICs each, one on each of a class A, B, and
C network, plus a serial connection.
N.B.: Gentoo uses devfs.
Problem: We cannot work out why the kernel panics when the init script is
started and stopped. The init script and drbd.conf are copied in below.
Observations: drbd appears to work satisfactorily when started with the
sequence of drbdsetup steps described in the manual
(http://www.slackworks.com/~dkrovich/DRBD/usingdrbdsetup.html). drbd also
starts with "/drbd start" on the command line after loading the drbd
kernel module. However, the following detail may indicate that all is not
working optimally:
* during a full sync, "cat /proc/drbd" on node 1 reports "0 - cs: Connected
st: Primary/Secondary" but "1 - cs: Unconfigured", while on node 2 "cat
/proc/drbd" reports "0 - cs: Connected st: Secondary/Primary". Why does
node 1 show "Unconfigured" instead of one of the "WF" states we expected
from reading the docs?
On the other hand:
* syncing speed is up to 12.5MB/s, as the docs state it should be, and
"cat /proc/drbd" reports sync progress on both nodes.
* there appear to have been no problems mounting and using the file system
during 2 weeks of constant active use.
* running md5sum on fully synced disks produces identical results.
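As a sanity check on that sync rate: 12.5MB/s is simply a saturated
100Mbit link expressed in bytes, which is presumably why the docs quote
that ceiling (our interpretation, not stated in the docs):

```python
# Sanity check (our assumption): 12.5 MB/s is the line rate of a
# saturated 100 Mbit/s Fast Ethernet link, converted from bits to bytes.
link_bits_per_sec = 100 * 1000000   # 100 Mbit/s
bytes_per_sec = link_bits_per_sec // 8
print(bytes_per_sec)                # 12500000 bytes/s, i.e. 12.5 MB/s
```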
Here is our drbd.conf:
resource drbd0 {
  protocol=C
  fsckcmd=/bin/true

  disk {
    do-panic
    disk-size=39078112
  }

  net {
    sync-max=8M     # bytes/sec
    timeout=60
    connect-int=10
    ping-int=10
  }

  on ns1 {
    device=/dev/nbd/0
    disk=/dev/hdd1
    address=10.0.0.1
    port=7789
  }

  on ns2 {
    device=/dev/nbd/0
    disk=/dev/hdd1
    address=10.0.0.2
    port=7789
  }
}
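For comparison, the drbdsetup sequence from the linked doc that this config
corresponds to looks roughly like the following -- quoted from memory, so
treat the exact option syntax as an assumption rather than gospel:

```
# Rough drbdsetup equivalent of the config above (drbd 0.6 syntax, from
# memory of the usingdrbdsetup doc -- option spelling may be off).
# On ns1:
drbdsetup /dev/nbd/0 disk /dev/hdd1
drbdsetup /dev/nbd/0 net 10.0.0.1 10.0.0.2 C
drbdsetup /dev/nbd/0 primary
# On ns2: the same, with the two addresses swapped and no "primary" step.
```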
And the Gentoo-installed init script:
#!/sbin/runscript

depend() {
        need net
        before heartbeat
        after sshd              # In case there are sync problems
}

start() {
        ebegin "Starting drbd mirror driver"
        ${DRBD} ${DRBDDEV} start
        if [ "$?" == "1" ]; then        # In case you decide this
                eend 0                  # node is primary the
        fi                              # script returns 1
        eend $?
}

stop() {
        ebegin "Stopping drbd mirror driver"
        ${DRBD} ${DRBDDEV} stop
        eend $?
}
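The exit-status mapping that start() relies on can be exercised on its
own. This is a sketch of our own (the map_drbd_status helper is not part
of the init script) of the behaviour the script's comments describe:
drbd's start action exits 1 when this node becomes primary, which should
be reported as success rather than failure:

```shell
#!/bin/sh
# Sketch (hypothetical helper): reproduce the exit-status mapping the
# init script's comments describe -- drbd's start exits 1 when this node
# becomes primary, which should count as success (0), while every other
# status passes through unchanged.
map_drbd_status() {
    rc=$1
    if [ "$rc" -eq 1 ]; then
        echo 0          # "we are primary" -> success
    else
        echo "$rc"      # anything else passes through unchanged
    fi
}

map_drbd_status 1   # prints 0
map_drbd_status 0   # prints 0
map_drbd_status 2   # prints 2
```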
Getting drbd to start via the init script is necessary for heartbeat.
When heartbeat's init script is run after starting drbd as described
above, the following log is produced:
Feb 22 12:19:01 [heartbeat] info: **************************
Feb 22 12:19:01 [heartbeat] info: Configuration validated. Starting
heartbeat 1.2.3
Feb 22 12:19:01 [heartbeat] info: heartbeat: version 1.2.3
Feb 22 12:19:01 [heartbeat] info: Heartbeat generation: 3
Feb 22 12:19:01 [heartbeat] info: Starting serial heartbeat on tty
/dev/ttyS0 (19200 baud)
Feb 22 12:19:01 [heartbeat] info: UDP Broadcast heartbeat started on port
694 (694) interface eth2
Feb 22 12:19:01 [heartbeat] info: ping heartbeat started.
Feb 22 12:19:01 [heartbeat] info: pid 15074 locked in memory.
Feb 22 12:19:01 [heartbeat] info: pid 15075 locked in memory.
Feb 22 12:19:01 [heartbeat] info: pid 15076 locked in memory.
Feb 22 12:19:01 [heartbeat] info: pid 15078 locked in memory.
Feb 22 12:19:01 [heartbeat] info: pid 15079 locked in memory.
Feb 22 12:19:01 [heartbeat] info: pid 15036 locked in memory.
Feb 22 12:19:01 [heartbeat] info: Local status now set to: 'up'
Feb 22 12:19:02 [heartbeat] info: pid 15077 locked in memory.
Feb 22 12:19:02 [heartbeat] info: Link ns1:eth2 up.
Feb 22 12:19:02 [heartbeat] info: pid 15080 locked in memory.
Feb 22 12:19:02 [heartbeat] info: Link 191.250.1.1:191.250.1.1 up.
Feb 22 12:19:02 [heartbeat] info: Status update for node 191.250.1.1:
status ping
Feb 22 12:19:58 [heartbeat] WARN: TTY write timeout on [/dev/ttyS0] (no
connection or bad cable? [see documentation])
Feb 22 12:21:02 [heartbeat] WARN: node ns2: is dead
Feb 22 12:21:02 [heartbeat] info: Local status now set to: 'active'
Feb 22 12:21:02 [heartbeat] info: Starting child client
"/usr/lib/heartbeat/ipfail" (65,65)
Feb 22 12:21:02 [heartbeat] WARN: No STONITH device configured.
Feb 22 12:21:02 [heartbeat] WARN: Shared disks are not protected.
Feb 22 12:21:02 [heartbeat] info: Resources being acquired from ns2.
Feb 22 12:21:02 [heartbeat] info: Starting "/usr/lib/heartbeat/ipfail" as
uid 65 gid 65 (pid 15192)
Feb 22 12:21:02 [heartbeat] debug: notify_world: setting SIGCHLD Handler
to SIG_DFL
Feb 22 12:21:02 [heartbeat] debug: StartNextRemoteRscReq(): child count 1
- Last output repeated twice -
Feb 22 12:21:03 [heartbeat] info: Local Resource acquisition completed.
Feb 22 12:21:03 [heartbeat] info: Initial resource acquisition complete
(T_RESOURCES(us))
Feb 22 12:21:03 [heartbeat] debug: notify_world: setting SIGCHLD Handler
to SIG_DFL
Feb 22 12:21:14 [heartbeat] info: Local Resource acquisition completed.
(none)
Feb 22 12:21:14 [heartbeat] info: local resource transition completed.
Feb 22 12:21:41 [heartbeat] WARN: Shutdown delayed until current resource
activity finishes.
Many thanks for all observations received.
Barry Schatz