Hello,

When I try to sync the two nodes, the sync seems to stall out indefinitely. I must be missing something (something trivial I hope). This is the command I run on node1 to initiate the sync:

    drbdadm -- --do-what-I-say primary all
    drbdadm -- connect all

Here is the output of my drbd.conf:

    resource r0 {
      protocol C;
      incon-degr-cmd "halt -f";

      startup {
        degr-wfc-timeout 120;    # 2 minutes
      }
      disk {
        on-io-error detach;
      }
      net {
      }
      syncer {
        rate 10M;
        group 1;
        al-extents 257;
      }
      on nfs1 {
        device    /dev/drbd0;
        disk      /dev/sda4;
        address   10.5.7.25:7788;
        meta-disk /dev/sda3[0];
      }
      on nfs2 {
        device    /dev/drbd0;
        disk      /dev/sda4;
        address   10.5.7.26:7788;
        meta-disk /dev/sda3[0];
      }
    }

Here is the output from my syslog on the primary node:

    Aug 30 14:10:52 nfs1 kernel: drbd: module not supported by Novell, setting U taint flag.
    Aug 30 14:10:52 nfs1 kernel: drbd: initialised. Version: 0.7.18 (api:78/proto:74)
    Aug 30 14:10:52 nfs1 kernel: drbd: SVN Revision: 2186 build by lmb at chip, 2006-05-04 17:08:27
    Aug 30 14:10:52 nfs1 kernel: drbd: registered as block device major 147
    Aug 30 14:10:52 nfs1 kernel: drbd0: resync bitmap: bits=59215590 words=1850488
    Aug 30 14:10:52 nfs1 kernel: drbd0: size = 225 GB (236862360 KB)
    Aug 30 14:10:52 nfs1 kernel: klogd 1.4.1, ---------- state change ----------
    Aug 30 14:10:53 nfs1 kernel: drbd0: 225 GB marked out-of-sync by on disk bit-map.
    Aug 30 14:10:53 nfs1 kernel: drbd0: No usable activity log found.
    Aug 30 14:10:53 nfs1 kernel: drbd0: Marked additional 0 KB as out-of-sync based on AL.
    Aug 30 14:10:53 nfs1 kernel: drbd0: drbdsetup [4816]: cstate Unconfigured --> StandAlone
    Aug 30 14:10:53 nfs1 kernel: drbd0: drbdsetup [4829]: cstate StandAlone --> Unconnected
    Aug 30 14:10:53 nfs1 kernel: drbd0: drbd0_receiver [4830]: cstate Unconnected --> WFConnection
    Aug 30 14:10:53 nfs1 kernel: drbd0: using degr_wfc_timeout=120 seconds
    Aug 30 14:10:56 nfs1 kernel: drbd0: drbd0_receiver [4830]: cstate WFConnection --> WFReportParams
    Aug 30 14:10:56 nfs1 kernel: drbd0: Handshake successful: DRBD Network Protocol version 74
    Aug 30 14:10:56 nfs1 kernel: drbd0: Connection established.
    Aug 30 14:10:56 nfs1 kernel: drbd0: I am(S): 1:00000002:00000001:00000007:00000001:00
    Aug 30 14:10:56 nfs1 kernel: drbd0: Peer(S): 0:00000002:00000001:00000005:00000001:00
    Aug 30 14:10:56 nfs1 kernel: drbd0: drbd0_receiver [4830]: cstate WFReportParams --> WFBitMapS
    Aug 30 14:10:56 nfs1 kernel: drbd0: Secondary/Unknown --> Secondary/Secondary
    Aug 30 14:10:56 nfs1 kernel: drbd0: drbd0_receiver [4830]: cstate WFBitMapS --> SyncSource
    Aug 30 14:10:56 nfs1 kernel: drbd0: Resync started as SyncSource (need to sync 236845976 KB [59211494 bits set]).
    Aug 30 14:13:35 nfs1 kernel: drbd0: Secondary/Secondary --> Primary/Secondary

Here is the output from cat /proc/drbd on node1 and node2 respectively:

    version: 0.7.18 (api:78/proto:74)
    SVN Revision: 2186 build by lmb at chip, 2006-05-04 17:08:27
     0: cs:SyncSource st:Primary/Secondary ld:Consistent
        ns:274688 nr:0 dw:0 dr:274688 al:0 bm:16 lo:0 pe:0 ua:0 ap:0
        [>...................] sync'ed: 0.2% (231026/231294)M stalled

    version: 0.7.18 (api:78/proto:74)
    SVN Revision: 2186 build by lmb at chip, 2006-05-04 17:08:27
     0: cs:SyncTarget st:Secondary/Primary ld:Inconsistent
        ns:0 nr:274688 dw:274688 dr:0 al:0 bm:16 lo:0 pe:256 ua:0 ap:0
        [>...................] sync'ed: 0.2% (231026/231294)M stalled

One thing to note however. After rebooting a few times and trying the commands manually I finally got the machines to sync.
However, when I try to manually test these I still get a stalled sync. I wonder if I'm just not typing the right commands.

This is what I'm doing. When the nodes boot I'll run the SLES 10 init script, which as far as I can tell will modprobe drbd, drbdadm -d adjust, and then drbdadm wait_con_int. Then am I right to assume that I am to run drbdadm primary all on the primary node? And will that resync the nodes? If so, why would it stall out (just about every time)?

I apologize if these questions are extremely remedial. I've scoured the web and the mail archives but I can't seem to find the answers I'm looking for.

Any help would be appreciated greatly,
Matt
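
For readers comparing the two /proc/drbd dumps in this message, the following is a minimal Python sketch (not part of DRBD; the helper names parse_status and differing are hypothetical) that splits the two status lines into key/value pairs and lists the fields on which the nodes disagree. The status strings are copied verbatim from the output above. As far as the DRBD documentation describes it, pe counts requests sent to the peer that have not yet been answered, so the nonzero pe on the SyncTarget is the counter that stands out; treat that reading as an assumption rather than a diagnosis.

```python
import re

# Status lines copied verbatim from the two /proc/drbd dumps in this message.
NODE1 = ("0: cs:SyncSource st:Primary/Secondary ld:Consistent "
         "ns:274688 nr:0 dw:0 dr:274688 al:0 bm:16 lo:0 pe:0 ua:0 ap:0")
NODE2 = ("0: cs:SyncTarget st:Secondary/Primary ld:Inconsistent "
         "ns:0 nr:274688 dw:274688 dr:0 al:0 bm:16 lo:0 pe:256 ua:0 ap:0")


def parse_status(line):
    """Split a DRBD 0.7 style status line into a dict of key -> value.

    Numeric counters become ints; state names stay as strings.
    The leading device number ("0:") has no value attached and is skipped.
    """
    fields = {}
    for key, value in re.findall(r"(\w+):(\S+)", line):
        fields[key] = int(value) if value.isdigit() else value
    return fields


def differing(a, b):
    """Return {key: (a_value, b_value)} for keys present in both but unequal."""
    return {k: (a[k], b[k]) for k in a if k in b and a[k] != b[k]}


node1 = parse_status(NODE1)
node2 = parse_status(NODE2)

for key, (v1, v2) in differing(node1, node2).items():
    print(f"{key}: node1={v1} node2={v2}")
```

To check live systems instead, the two strings can be replaced with the device line and counter line read from /proc/drbd on each node, joined into one string.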