Hello,
When I try to sync the two nodes, the sync seems to stall out
indefinitely. I must be missing something (something trivial I hope).
These are the commands I run on node1 to initiate the sync:
drbdadm -- --do-what-I-say primary all
drbdadm -- connect all
Here is my drbd.conf:
resource r0 {
  protocol C;
  incon-degr-cmd "halt -f";
  startup {
    degr-wfc-timeout 120; # 2 minutes
  }
  disk {
    on-io-error detach;
  }
  net {
  }
  syncer {
    rate 10M;
    group 1;
    al-extents 257;
  }
  on nfs1 {
    device /dev/drbd0;
    disk /dev/sda4;
    address 10.5.7.25:7788;
    meta-disk /dev/sda3[0];
  }
  on nfs2 {
    device /dev/drbd0;
    disk /dev/sda4;
    address 10.5.7.26:7788;
    meta-disk /dev/sda3[0];
  }
}
Here is the output from my syslog on the primary node:
Aug 30 14:10:52 nfs1 kernel: drbd: module not supported by Novell, setting U taint flag.
Aug 30 14:10:52 nfs1 kernel: drbd: initialised. Version: 0.7.18 (api:78/proto:74)
Aug 30 14:10:52 nfs1 kernel: drbd: SVN Revision: 2186 build by lmb at chip, 2006-05-04 17:08:27
Aug 30 14:10:52 nfs1 kernel: drbd: registered as block device major 147
Aug 30 14:10:52 nfs1 kernel: drbd0: resync bitmap: bits=59215590 words=1850488
Aug 30 14:10:52 nfs1 kernel: drbd0: size = 225 GB (236862360 KB)
Aug 30 14:10:52 nfs1 kernel: klogd 1.4.1, ---------- state change ----------
Aug 30 14:10:53 nfs1 kernel: drbd0: 225 GB marked out-of-sync by on disk bit-map.
Aug 30 14:10:53 nfs1 kernel: drbd0: No usable activity log found.
Aug 30 14:10:53 nfs1 kernel: drbd0: Marked additional 0 KB as out-of-sync based on AL.
Aug 30 14:10:53 nfs1 kernel: drbd0: drbdsetup [4816]: cstate Unconfigured --> StandAlone
Aug 30 14:10:53 nfs1 kernel: drbd0: drbdsetup [4829]: cstate StandAlone --> Unconnected
Aug 30 14:10:53 nfs1 kernel: drbd0: drbd0_receiver [4830]: cstate Unconnected --> WFConnection
Aug 30 14:10:53 nfs1 kernel: drbd0: using degr_wfc_timeout=120 seconds
Aug 30 14:10:56 nfs1 kernel: drbd0: drbd0_receiver [4830]: cstate WFConnection --> WFReportParams
Aug 30 14:10:56 nfs1 kernel: drbd0: Handshake successful: DRBD Network Protocol version 74
Aug 30 14:10:56 nfs1 kernel: drbd0: Connection established.
Aug 30 14:10:56 nfs1 kernel: drbd0: I am(S): 1:00000002:00000001:00000007:00000001:00
Aug 30 14:10:56 nfs1 kernel: drbd0: Peer(S): 0:00000002:00000001:00000005:00000001:00
Aug 30 14:10:56 nfs1 kernel: drbd0: drbd0_receiver [4830]: cstate WFReportParams --> WFBitMapS
Aug 30 14:10:56 nfs1 kernel: drbd0: Secondary/Unknown --> Secondary/Secondary
Aug 30 14:10:56 nfs1 kernel: drbd0: drbd0_receiver [4830]: cstate WFBitMapS --> SyncSource
Aug 30 14:10:56 nfs1 kernel: drbd0: Resync started as SyncSource (need to sync 236845976 KB [59211494 bits set]).
Aug 30 14:13:35 nfs1 kernel: drbd0: Secondary/Secondary --> Primary/Secondary
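As a sanity check on the figures in the log above, the bitmap numbers agree with the reported sizes if each bitmap bit covers 4 KB of the device. That granularity is an assumption inferred from the numbers themselves; the log does not state it. A minimal Python sketch of the arithmetic:

```python
# Assumption (inferred from the log figures, not stated in it):
# one resync-bitmap bit covers 4 KB of the device.
KB_PER_BIT = 4

def bits_to_kb(bits):
    """Convert a count of bitmap bits to kilobytes under the 4 KB/bit assumption."""
    return bits * KB_PER_BIT

total_kb = bits_to_kb(59215590)        # from "resync bitmap: bits=59215590"
out_of_sync_kb = bits_to_kb(59211494)  # from "59211494 bits set"

print(total_kb)        # compare with "size = 225 GB (236862360 KB)"
print(out_of_sync_kb)  # compare with "need to sync 236845976 KB"
```

Under that assumption both values match the log, so nearly the whole device is marked out-of-sync and a full resync is what the source node is attempting.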
Here is the output from cat /proc/drbd on node1 and node2 respectively:
version: 0.7.18 (api:78/proto:74)
SVN Revision: 2186 build by lmb at chip, 2006-05-04 17:08:27
0: cs:SyncSource st:Primary/Secondary ld:Consistent
ns:274688 nr:0 dw:0 dr:274688 al:0 bm:16 lo:0 pe:0 ua:0 ap:0
[>...................] sync'ed: 0.2% (231026/231294)M
stalled
version: 0.7.18 (api:78/proto:74)
SVN Revision: 2186 build by lmb at chip, 2006-05-04 17:08:27
0: cs:SyncTarget st:Secondary/Primary ld:Inconsistent
ns:0 nr:274688 dw:274688 dr:0 al:0 bm:16 lo:0 pe:256 ua:0 ap:0
[>...................] sync'ed: 0.2% (231026/231294)M
stalled
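To make the two status dumps above easier to compare, here is a small Python sketch that parses the counter line from each node and lists the fields that differ. The helper name parse_counters is hypothetical, and the reading of pe as "requests still awaiting an answer from the peer" is my understanding of the field, not something shown in the output itself.

```python
import re

def parse_counters(line):
    """Parse 'key:value' pairs from a /proc/drbd counter line into a dict of ints."""
    return {key: int(value) for key, value in re.findall(r"(\w+):(\d+)", line)}

# Counter lines copied from the /proc/drbd output of each node.
node1 = "ns:274688 nr:0 dw:0 dr:274688 al:0 bm:16 lo:0 pe:0 ua:0 ap:0"
node2 = "ns:0 nr:274688 dw:274688 dr:0 al:0 bm:16 lo:0 pe:256 ua:0 ap:0"

c1 = parse_counters(node1)
c2 = parse_counters(node2)

# Keep only the counters whose values differ between the two nodes.
diff = {key: (c1[key], c2[key]) for key in c1 if c1[key] != c2[key]}
for key, (v1, v2) in diff.items():
    print(f"{key}: node1={v1} node2={v2}")
```

The send and receive counters mirror each other (ns on node1 equals nr on node2), and the only asymmetric field is pe, which is 256 on the sync target and 0 on the source.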
One more thing to note: after rebooting a few times and running the
commands manually, I finally got the machines to sync. However, when I
test this manually again, the sync still stalls. I wonder if I'm
just not typing the right commands. This is what I'm doing: when the
nodes boot, I run the SLES 10 init script, which as far as I can tell
runs modprobe drbd, drbdadm -d adjust, and then drbdadm wait_con_int.
Am I then right to assume that I should run drbdadm primary all on the
primary node? And will that resync the nodes? If so, why would it
stall (just about every time)?
I apologize if these questions are extremely basic. I've scoured the
web and the mail archives, but I can't seem to find the answers I'm
looking for.
Any help would be appreciated greatly,
Matt