Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
I am currently setting up a new system with drbd 0.7.16. This is CentOS 4.2, Linux xen03.emerion.com 2.6.9-22.0.2.ELsmp #1 SMP i686 i686 i386 GNU/Linux. on a P4 w/ HT. The problem is that initial syncing has stalled for one of the devices (but not the other one). Otherwise I have basically done nothing more than modprobe -s drbd `drbdadm sh-mod-parms` drbdadm up all ( on both hosts, then on xen03/primary: ) drbdadm -- --do-what-I-say primary all I then tried to disconnect/connect drbd0 on the primary -- no improvement. Please see the logs below for some strange error message. Have I done something wrong? Have I triggered a (new) bug in DRBD? Tell me if I can provide further information. Thank you very much. Best Regards, Michael Paesold Current /proc/drbd on primary: ------------------------------------------------------- version: 0.7.16 (api:77/proto:74) SVN Revision: 2066 build by mip at xen04.emerion.com, 2006-02-20 12:07:08 0: cs:SyncSource st:Primary/Secondary ld:Consistent ns:0 nr:0 dw:0 dr:8192 al:0 bm:0 lo:0 pe:0 ua:512 ap:0 [>...................] sync'ed: 0.1% (185516/185516)M stalled 1: cs:SyncSource st:Primary/Secondary ld:Consistent ns:58016996 nr:0 dw:0 dr:58018136 al:0 bm:26730 lo:158 pe:245 ua:285 ap:0 [======>.............] sync'ed: 30.6% (128859/185516)M finish: 0:45:06 speed: 48,744 (51,844) K/sec A little bit later on secondary: ------------------------------------------------------- version: 0.7.16 (api:77/proto:74) SVN Revision: 2066 build by mip at xen04.emerion.com, 2006-02-20 12:07:08 0: cs:SyncTarget st:Secondary/Primary ld:Inconsistent ns:0 nr:0 dw:0 dr:0 al:0 bm:23190 lo:0 pe:1536 ua:0 ap:0 [>...................] sync'ed: 0.1% (185516/185516)M stalled 1: cs:SyncTarget st:Secondary/Primary ld:Inconsistent ns:0 nr:75120848 dw:75120848 dr:0 al:0 bm:27775 lo:0 pe:504 ua:0 ap:0 [=======>............] sync'ed: 39.6% (112156/185516)M finish: 0:28:35 speed: 66,912 (52,020) K/sec Logs from primary/sync source: (xen03) ------------------------------------------------------- 14:53:18 drbd0: resync bitmap: bits=47492149 words=1484130 14:53:18 drbd0: size = 181 GB (189968593 KB) 14:53:20 drbd0: 189968596 KB now marked out-of-sync by on disk bit-map. 14:53:20 drbd0: Assuming that all blocks are out of sync (aka FullSync) 14:53:22 drbd0: 189968596 KB now marked out-of-sync by on disk bit-map. 14:53:22 drbd0: Couldn't start thread (-4) 14:53:22 drbd0: drbdsetup [3461]: cstate Unconfigured --> StandAlone 14:54:02 drbd0: drbdsetup [3479]: cstate StandAlone --> Unconnected 14:54:02 drbd0: drbdsetup [3479]: cstate Unconnected --> StandAlone 14:54:02 drbd0: drbdsetup [3479]: cstate StandAlone --> Unconfigured 14:54:02 drbd0: _drbd_thread_stop: (thi->task == NULL) in /home/mip/rpm/BUILD/drbd-0.7.16/drbd/drbd_main.c:581 14:54:03 drbd0: resync bitmap: bits=47492149 words=1484130 14:54:03 drbd0: size = 181 GB (189968593 KB) 14:54:05 drbd0: 181 GB marked out-of-sync by on disk bit-map. 14:54:05 drbd0: No usable activity log found. 14:54:05 drbd0: drbdsetup [3494]: cstate Unconfigured --> StandAlone 14:54:05 drbd1: Creating state block 14:54:05 drbd1: resync bitmap: bits=47492149 words=1484130 14:54:05 drbd1: size = 181 GB (189968593 KB) 14:54:07 drbd1: 189968596 KB now marked out-of-sync by on disk bit-map. 14:54:07 drbd1: Assuming that all blocks are out of sync (aka FullSync) 14:54:09 drbd1: 189968596 KB now marked out-of-sync by on disk bit-map. 14:54:09 drbd1: drbdsetup [3495]: cstate Unconfigured --> StandAlone 14:54:09 drbd0: drbdsetup [3499]: cstate StandAlone --> Unconnected 14:54:09 drbd0: drbd0_receiver [3500]: cstate Unconnected --> WFConnection 14:54:09 drbd1: drbdsetup [3501]: cstate StandAlone --> Unconnected 14:54:09 drbd1: drbd1_receiver [3502]: cstate Unconnected --> WFConnection 14:54:34 drbd0: drbd0_receiver [3500]: cstate WFConnection --> WFReportParams 14:54:34 drbd0: Handshake successful: DRBD Network Protocol version 74 14:54:34 drbd0: Connection established. 14:54:34 drbd0: I am(S): 0:00000001:00000001:00000001:00000001:00 14:54:34 drbd0: Peer(S): 0:00000001:00000001:00000001:00000001:00 14:54:34 drbd0: drbd0_receiver [3500]: cstate WFReportParams --> Connected 14:54:34 drbd0: I am inconsistent, but there is no sync? BOTH nodes inconsistent! 14:54:34 drbd0: Secondary/Unknown --> Secondary/Secondary 14:54:34 drbd1: drbd1_receiver [3502]: cstate WFConnection --> WFReportParams 14:54:34 drbd1: Handshake successful: DRBD Network Protocol version 74 14:54:34 drbd1: Connection established. 14:54:34 drbd1: I am(S): 0:00000001:00000001:00000001:00000001:00 14:54:34 drbd1: Peer(S): 0:00000001:00000001:00000001:00000001:00 14:54:34 drbd1: drbd1_receiver [3502]: cstate WFReportParams --> Connected 14:54:34 drbd1: I am inconsistent, but there is no sync? BOTH nodes inconsistent! 14:54:34 drbd1: Secondary/Unknown --> Secondary/Secondary 14:55:05 drbd0: Secondary/Secondary --> Primary/Secondary 14:55:05 drbd0: I am(P): 1:00000002:00000001:00000001:00000001:11 14:55:05 drbd0: Peer(S): 0:00000001:00000001:00000001:00000001:01 14:55:05 drbd0: drbd0_receiver [3500]: cstate Connected --> WFBitMapS 14:55:05 drbd1: Secondary/Secondary --> Primary/Secondary 14:55:05 drbd1: I am(P): 1:00000002:00000001:00000001:00000001:11 14:55:05 drbd1: Peer(S): 0:00000001:00000001:00000001:00000001:01 14:55:05 drbd1: drbd1_receiver [3502]: cstate Connected --> WFBitMapS 14:55:05 drbd1: drbd1_receiver [3502]: cstate WFBitMapS --> SyncSource 14:55:05 drbd1: Resync started as SyncSource (need to sync 189968596 KB [47492149 bits set]). 14:55:06 drbd0: drbd0_receiver [3500]: cstate WFBitMapS --> SyncSource 14:55:06 drbd0: Resync started as SyncSource (need to sync 189968596 KB [47492149 bits set]). 14:59:24 drbd0: drbdsetup [3587]: cstate SyncSource --> Unconnected 14:59:24 drbd0: drbd0_receiver [3500]: cstate Unconnected --> BrokenPipe 14:59:24 drbd0: short read expecting header on sock: r=-512 14:59:24 drbd0: _drbd_thread_stop: (thi->task == NULL) in /home/mip/rpm/BUILD/drbd-0.7.16/drbd/drbd_main.c:581 14:59:24 drbd0: asender terminated 14:59:24 drbd0: _drbd_thread_stop: (thi->task == NULL) in /home/mip/rpm/BUILD/drbd-0.7.16/drbd/drbd_main.c:581 14:59:24 drbd0: ASSERT( mdev->ee_in_use == 0 ) in /home/mip/rpm/BUILD/drbd-0.7.16/drbd/drbd_receiver.c:1881 14:59:24 drbd0: unacked_cnt = 1536 14:59:24 drbd0: drbd0_receiver [3500]: cstate BrokenPipe --> StandAlone 14:59:24 drbd0: Connection lost. 14:59:24 drbd0: receiver terminated 14:59:24 drbd0: drbdsetup [3587]: cstate StandAlone --> StandAlone 14:59:35 drbd0: drbdsetup [3602]: cstate StandAlone --> Unconnected 14:59:35 drbd0: drbd0_receiver [3603]: cstate Unconnected --> WFConnection 14:59:35 drbd0: drbd0_receiver [3603]: cstate WFConnection --> WFReportParams 14:59:35 drbd0: Handshake successful: DRBD Network Protocol version 74 14:59:35 drbd0: Connection established. 14:59:35 drbd0: I am(P): 1:00000002:00000001:00000001:00000001:10 14:59:35 drbd0: Peer(S): 0:00000002:00000001:00000001:00000001:01 14:59:35 drbd0: drbd0_receiver [3603]: cstate WFReportParams --> WFBitMapS 14:59:35 drbd0: Primary/Unknown --> Primary/Secondary 14:59:35 drbd0: drbd0_receiver [3603]: cstate WFBitMapS --> SyncSource 14:59:35 drbd0: Resync started as SyncSource (need to sync 189968596 KB [47492149 bits set]). 14:59:35 drbd0: resync_work.cb == w_resume_next_sg, should be w_resync_inactive logs from secondary/sync target: (xen04) ------------------------------------------------------- 15:56:20 drbd0: resync bitmap: bits=47492149 words=1484130 15:56:20 drbd0: size = 181 GB (189968593 KB) 15:56:22 drbd0: 189968596 KB now marked out-of-sync by on disk bit-map. 15:56:22 drbd0: Assuming that all blocks are out of sync (aka FullSync) 15:56:24 drbd0: 189968596 KB now marked out-of-sync by on disk bit-map. 15:56:24 drbd0: drbdsetup [4368]: cstate Unconfigured --> StandAlone 15:56:24 drbd1: Creating state block 15:56:24 drbd1: resync bitmap: bits=47492149 words=1484130 15:56:24 drbd1: size = 181 GB (189968593 KB) 15:56:26 drbd1: 189968596 KB now marked out-of-sync by on disk bit-map. 15:56:26 drbd1: Assuming that all blocks are out of sync (aka FullSync) 15:56:28 drbd1: 189968596 KB now marked out-of-sync by on disk bit-map. 15:56:28 drbd1: drbdsetup [4370]: cstate Unconfigured --> StandAlone 15:56:28 drbd0: drbdsetup [4374]: cstate StandAlone --> Unconnected 15:56:28 drbd0: drbd0_receiver [4375]: cstate Unconnected --> WFConnection 15:56:28 drbd1: drbdsetup [4376]: cstate StandAlone --> Unconnected 15:56:28 drbd1: drbd1_receiver [4377]: cstate Unconnected --> WFConnection 15:56:28 drbd0: drbd0_receiver [4375]: cstate WFConnection --> WFReportParams 15:56:28 drbd0: Handshake successful: DRBD Network Protocol version 74 15:56:28 drbd0: Connection established. 15:56:28 drbd0: I am(S): 0:00000001:00000001:00000001:00000001:00 15:56:28 drbd0: Peer(S): 0:00000001:00000001:00000001:00000001:00 15:56:28 drbd0: drbd0_receiver [4375]: cstate WFReportParams --> Connected 15:56:28 drbd0: I am inconsistent, but there is no sync? BOTH nodes inconsistent! 15:56:28 drbd0: Secondary/Unknown --> Secondary/Secondary 15:56:28 drbd1: drbd1_receiver [4377]: cstate WFConnection --> WFReportParams 15:56:28 drbd1: Handshake successful: DRBD Network Protocol version 74 15:56:28 drbd1: Connection established. 15:56:28 drbd1: I am(S): 0:00000001:00000001:00000001:00000001:00 15:56:28 drbd1: Peer(S): 0:00000001:00000001:00000001:00000001:00 15:56:28 drbd1: drbd1_receiver [4377]: cstate WFReportParams --> Connected 15:56:28 drbd1: I am inconsistent, but there is no sync? BOTH nodes inconsistent! 15:56:28 drbd1: Secondary/Unknown --> Secondary/Secondary 15:57:00 drbd0: I am(S): 0:00000001:00000001:00000001:00000001:01 15:57:00 drbd0: Peer(P): 1:00000002:00000001:00000001:00000001:11 15:57:00 drbd0: drbd0_receiver [4375]: cstate Connected --> WFBitMapT 15:57:00 drbd0: Secondary/Secondary --> Secondary/Primary 15:57:00 drbd1: I am(S): 0:00000001:00000001:00000001:00000001:01 15:57:00 drbd1: Peer(P): 1:00000002:00000001:00000001:00000001:11 15:57:00 drbd1: drbd1_receiver [4377]: cstate Connected --> WFBitMapT 15:57:00 drbd1: Secondary/Secondary --> Secondary/Primary 15:57:00 drbd1: drbd1_receiver [4377]: cstate WFBitMapT --> SyncTarget 15:57:00 drbd1: Resync started as SyncTarget (need to sync 189968596 KB [47492149 bits set]). 15:57:00 drbd0: drbd0_receiver [4375]: cstate WFBitMapT --> SyncTarget 15:57:00 drbd0: Resync started as SyncTarget (need to sync 189968596 KB [47492149 bits set]). 16:01:19 drbd0: sock was shut down by peer 16:01:19 drbd0: drbd0_receiver [4375]: cstate SyncTarget --> BrokenPipe 16:01:19 drbd0: short read expecting header on sock: r=0 16:01:19 drbd0: worker terminated 16:01:19 drbd0: meta connection shut down by peer. 16:01:19 drbd0: asender terminated 16:01:19 drbd0: drbd0_receiver [4375]: cstate BrokenPipe --> Unconnected 16:01:19 drbd0: Connection lost. 16:01:19 drbd0: drbd0_receiver [4375]: cstate Unconnected --> WFConnection 16:01:29 drbd0: drbd0_receiver [4375]: cstate WFConnection --> WFReportParams 16:01:29 drbd0: Handshake successful: DRBD Network Protocol version 74 16:01:29 drbd0: Connection established. 16:01:29 drbd0: I am(S): 0:00000002:00000001:00000001:00000001:01 16:01:29 drbd0: Peer(P): 1:00000002:00000001:00000001:00000001:10 16:01:29 drbd0: drbd0_receiver [4375]: cstate WFReportParams --> WFBitMapT 16:01:29 drbd0: Secondary/Unknown --> Secondary/Primary 16:01:30 drbd0: drbd0_receiver [4375]: cstate WFBitMapT --> SyncTarget 16:01:30 drbd0: Resync started as SyncTarget (need to sync 189968596 KB [47492149 bits set]). drbdadm dump: ------------------------------------------------------- resource drbd0 { protocol C; incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f"; on xen03.emerion.com { device /dev/drbd0; disk /dev/sda6; address 10.1.0.3:7788; meta-disk /dev/sda5 [0]; } on xen04.emerion.com { device /dev/drbd0; disk /dev/sdb6; address 10.1.0.4:7788; meta-disk /dev/sdb5 [0]; } disk { on-io-error detach; } syncer { rate 60M; group 1; al-extents 257; } startup { wfc-timeout 60; degr-wfc-timeout 15; } } resource drbd1 { protocol C; incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f"; on xen03.emerion.com { device /dev/drbd1; disk /dev/sdb6; address 10.1.0.3:7789; meta-disk /dev/sdb5 [0]; } on xen04.emerion.com { device /dev/drbd1; disk /dev/sda6; address 10.1.0.4:7789; meta-disk /dev/sda5 [0]; } disk { on-io-error detach; } syncer { rate 60M; group 1; al-extents 257; } startup { wfc-timeout 60; degr-wfc-timeout 15; } }