Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
I am currently setting up a new system with drbd 0.7.16. This is CentOS 4.2,
Linux xen03.emerion.com 2.6.9-22.0.2.ELsmp #1 SMP i686 i686 i386 GNU/Linux.
on a P4 w/ HT.
The problem is that initial syncing has stalled for one of the devices (but
not the other one).
Otherwise I have basically done nothing more than
modprobe -s drbd `drbdadm sh-mod-parms`
drbdadm up all
( on both hosts, then on xen03/primary: )
drbdadm -- --do-what-I-say primary all
I then tried to disconnect/connect drbd0 on the primary -- no improvement.
Please see the logs below for some strange error message. Have I done
something wrong? Have I triggered a (new) bug in DRBD? Tell me if I can
provide further information. Thank you very much.
Best Regards,
Michael Paesold
Current /proc/drbd on primary:
-------------------------------------------------------
version: 0.7.16 (api:77/proto:74)
SVN Revision: 2066 build by mip at xen04.emerion.com, 2006-02-20 12:07:08
0: cs:SyncSource st:Primary/Secondary ld:Consistent
ns:0 nr:0 dw:0 dr:8192 al:0 bm:0 lo:0 pe:0 ua:512 ap:0
[>...................] sync'ed: 0.1% (185516/185516)M
stalled
1: cs:SyncSource st:Primary/Secondary ld:Consistent
ns:58016996 nr:0 dw:0 dr:58018136 al:0 bm:26730 lo:158 pe:245 ua:285
ap:0
[======>.............] sync'ed: 30.6% (128859/185516)M
finish: 0:45:06 speed: 48,744 (51,844) K/sec
A little bit later on secondary:
-------------------------------------------------------
version: 0.7.16 (api:77/proto:74)
SVN Revision: 2066 build by mip at xen04.emerion.com, 2006-02-20 12:07:08
0: cs:SyncTarget st:Secondary/Primary ld:Inconsistent
ns:0 nr:0 dw:0 dr:0 al:0 bm:23190 lo:0 pe:1536 ua:0 ap:0
[>...................] sync'ed: 0.1% (185516/185516)M
stalled
1: cs:SyncTarget st:Secondary/Primary ld:Inconsistent
ns:0 nr:75120848 dw:75120848 dr:0 al:0 bm:27775 lo:0 pe:504 ua:0 ap:0
[=======>............] sync'ed: 39.6% (112156/185516)M
finish: 0:28:35 speed: 66,912 (52,020) K/sec
Logs from primary/sync source: (xen03)
-------------------------------------------------------
14:53:18 drbd0: resync bitmap: bits=47492149 words=1484130
14:53:18 drbd0: size = 181 GB (189968593 KB)
14:53:20 drbd0: 189968596 KB now marked out-of-sync by on disk bit-map.
14:53:20 drbd0: Assuming that all blocks are out of sync (aka FullSync)
14:53:22 drbd0: 189968596 KB now marked out-of-sync by on disk bit-map.
14:53:22 drbd0: Couldn't start thread (-4)
14:53:22 drbd0: drbdsetup [3461]: cstate Unconfigured --> StandAlone
14:54:02 drbd0: drbdsetup [3479]: cstate StandAlone --> Unconnected
14:54:02 drbd0: drbdsetup [3479]: cstate Unconnected --> StandAlone
14:54:02 drbd0: drbdsetup [3479]: cstate StandAlone --> Unconfigured
14:54:02 drbd0: _drbd_thread_stop: (thi->task == NULL) in
/home/mip/rpm/BUILD/drbd-0.7.16/drbd/drbd_main.c:581
14:54:03 drbd0: resync bitmap: bits=47492149 words=1484130
14:54:03 drbd0: size = 181 GB (189968593 KB)
14:54:05 drbd0: 181 GB marked out-of-sync by on disk bit-map.
14:54:05 drbd0: No usable activity log found.
14:54:05 drbd0: drbdsetup [3494]: cstate Unconfigured --> StandAlone
14:54:05 drbd1: Creating state block
14:54:05 drbd1: resync bitmap: bits=47492149 words=1484130
14:54:05 drbd1: size = 181 GB (189968593 KB)
14:54:07 drbd1: 189968596 KB now marked out-of-sync by on disk bit-map.
14:54:07 drbd1: Assuming that all blocks are out of sync (aka FullSync)
14:54:09 drbd1: 189968596 KB now marked out-of-sync by on disk bit-map.
14:54:09 drbd1: drbdsetup [3495]: cstate Unconfigured --> StandAlone
14:54:09 drbd0: drbdsetup [3499]: cstate StandAlone --> Unconnected
14:54:09 drbd0: drbd0_receiver [3500]: cstate Unconnected --> WFConnection
14:54:09 drbd1: drbdsetup [3501]: cstate StandAlone --> Unconnected
14:54:09 drbd1: drbd1_receiver [3502]: cstate Unconnected --> WFConnection
14:54:34 drbd0: drbd0_receiver [3500]: cstate WFConnection -->
WFReportParams
14:54:34 drbd0: Handshake successful: DRBD Network Protocol version 74
14:54:34 drbd0: Connection established.
14:54:34 drbd0: I am(S): 0:00000001:00000001:00000001:00000001:00
14:54:34 drbd0: Peer(S): 0:00000001:00000001:00000001:00000001:00
14:54:34 drbd0: drbd0_receiver [3500]: cstate WFReportParams --> Connected
14:54:34 drbd0: I am inconsistent, but there is no sync? BOTH nodes
inconsistent!
14:54:34 drbd0: Secondary/Unknown --> Secondary/Secondary
14:54:34 drbd1: drbd1_receiver [3502]: cstate WFConnection -->
WFReportParams
14:54:34 drbd1: Handshake successful: DRBD Network Protocol version 74
14:54:34 drbd1: Connection established.
14:54:34 drbd1: I am(S): 0:00000001:00000001:00000001:00000001:00
14:54:34 drbd1: Peer(S): 0:00000001:00000001:00000001:00000001:00
14:54:34 drbd1: drbd1_receiver [3502]: cstate WFReportParams --> Connected
14:54:34 drbd1: I am inconsistent, but there is no sync? BOTH nodes
inconsistent!
14:54:34 drbd1: Secondary/Unknown --> Secondary/Secondary
14:55:05 drbd0: Secondary/Secondary --> Primary/Secondary
14:55:05 drbd0: I am(P): 1:00000002:00000001:00000001:00000001:11
14:55:05 drbd0: Peer(S): 0:00000001:00000001:00000001:00000001:01
14:55:05 drbd0: drbd0_receiver [3500]: cstate Connected --> WFBitMapS
14:55:05 drbd1: Secondary/Secondary --> Primary/Secondary
14:55:05 drbd1: I am(P): 1:00000002:00000001:00000001:00000001:11
14:55:05 drbd1: Peer(S): 0:00000001:00000001:00000001:00000001:01
14:55:05 drbd1: drbd1_receiver [3502]: cstate Connected --> WFBitMapS
14:55:05 drbd1: drbd1_receiver [3502]: cstate WFBitMapS --> SyncSource
14:55:05 drbd1: Resync started as SyncSource (need to sync 189968596 KB
[47492149 bits set]).
14:55:06 drbd0: drbd0_receiver [3500]: cstate WFBitMapS --> SyncSource
14:55:06 drbd0: Resync started as SyncSource (need to sync 189968596 KB
[47492149 bits set]).
14:59:24 drbd0: drbdsetup [3587]: cstate SyncSource --> Unconnected
14:59:24 drbd0: drbd0_receiver [3500]: cstate Unconnected --> BrokenPipe
14:59:24 drbd0: short read expecting header on sock: r=-512
14:59:24 drbd0: _drbd_thread_stop: (thi->task == NULL) in
/home/mip/rpm/BUILD/drbd-0.7.16/drbd/drbd_main.c:581
14:59:24 drbd0: asender terminated
14:59:24 drbd0: _drbd_thread_stop: (thi->task == NULL) in
/home/mip/rpm/BUILD/drbd-0.7.16/drbd/drbd_main.c:581
14:59:24 drbd0: ASSERT( mdev->ee_in_use == 0 ) in
/home/mip/rpm/BUILD/drbd-0.7.16/drbd/drbd_receiver.c:1881
14:59:24 drbd0: unacked_cnt = 1536
14:59:24 drbd0: drbd0_receiver [3500]: cstate BrokenPipe --> StandAlone
14:59:24 drbd0: Connection lost.
14:59:24 drbd0: receiver terminated
14:59:24 drbd0: drbdsetup [3587]: cstate StandAlone --> StandAlone
14:59:35 drbd0: drbdsetup [3602]: cstate StandAlone --> Unconnected
14:59:35 drbd0: drbd0_receiver [3603]: cstate Unconnected --> WFConnection
14:59:35 drbd0: drbd0_receiver [3603]: cstate WFConnection -->
WFReportParams
14:59:35 drbd0: Handshake successful: DRBD Network Protocol version 74
14:59:35 drbd0: Connection established.
14:59:35 drbd0: I am(P): 1:00000002:00000001:00000001:00000001:10
14:59:35 drbd0: Peer(S): 0:00000002:00000001:00000001:00000001:01
14:59:35 drbd0: drbd0_receiver [3603]: cstate WFReportParams --> WFBitMapS
14:59:35 drbd0: Primary/Unknown --> Primary/Secondary
14:59:35 drbd0: drbd0_receiver [3603]: cstate WFBitMapS --> SyncSource
14:59:35 drbd0: Resync started as SyncSource (need to sync 189968596 KB
[47492149 bits set]).
14:59:35 drbd0: resync_work.cb == w_resume_next_sg, should be
w_resync_inactive
logs from secondary/sync target: (xen04)
-------------------------------------------------------
15:56:20 drbd0: resync bitmap: bits=47492149 words=1484130
15:56:20 drbd0: size = 181 GB (189968593 KB)
15:56:22 drbd0: 189968596 KB now marked out-of-sync by on disk bit-map.
15:56:22 drbd0: Assuming that all blocks are out of sync (aka FullSync)
15:56:24 drbd0: 189968596 KB now marked out-of-sync by on disk bit-map.
15:56:24 drbd0: drbdsetup [4368]: cstate Unconfigured --> StandAlone
15:56:24 drbd1: Creating state block
15:56:24 drbd1: resync bitmap: bits=47492149 words=1484130
15:56:24 drbd1: size = 181 GB (189968593 KB)
15:56:26 drbd1: 189968596 KB now marked out-of-sync by on disk bit-map.
15:56:26 drbd1: Assuming that all blocks are out of sync (aka FullSync)
15:56:28 drbd1: 189968596 KB now marked out-of-sync by on disk bit-map.
15:56:28 drbd1: drbdsetup [4370]: cstate Unconfigured --> StandAlone
15:56:28 drbd0: drbdsetup [4374]: cstate StandAlone --> Unconnected
15:56:28 drbd0: drbd0_receiver [4375]: cstate Unconnected --> WFConnection
15:56:28 drbd1: drbdsetup [4376]: cstate StandAlone --> Unconnected
15:56:28 drbd1: drbd1_receiver [4377]: cstate Unconnected --> WFConnection
15:56:28 drbd0: drbd0_receiver [4375]: cstate WFConnection -->
WFReportParams
15:56:28 drbd0: Handshake successful: DRBD Network Protocol version 74
15:56:28 drbd0: Connection established.
15:56:28 drbd0: I am(S): 0:00000001:00000001:00000001:00000001:00
15:56:28 drbd0: Peer(S): 0:00000001:00000001:00000001:00000001:00
15:56:28 drbd0: drbd0_receiver [4375]: cstate WFReportParams --> Connected
15:56:28 drbd0: I am inconsistent, but there is no sync? BOTH nodes
inconsistent!
15:56:28 drbd0: Secondary/Unknown --> Secondary/Secondary
15:56:28 drbd1: drbd1_receiver [4377]: cstate WFConnection -->
WFReportParams
15:56:28 drbd1: Handshake successful: DRBD Network Protocol version 74
15:56:28 drbd1: Connection established.
15:56:28 drbd1: I am(S): 0:00000001:00000001:00000001:00000001:00
15:56:28 drbd1: Peer(S): 0:00000001:00000001:00000001:00000001:00
15:56:28 drbd1: drbd1_receiver [4377]: cstate WFReportParams --> Connected
15:56:28 drbd1: I am inconsistent, but there is no sync? BOTH nodes
inconsistent!
15:56:28 drbd1: Secondary/Unknown --> Secondary/Secondary
15:57:00 drbd0: I am(S): 0:00000001:00000001:00000001:00000001:01
15:57:00 drbd0: Peer(P): 1:00000002:00000001:00000001:00000001:11
15:57:00 drbd0: drbd0_receiver [4375]: cstate Connected --> WFBitMapT
15:57:00 drbd0: Secondary/Secondary --> Secondary/Primary
15:57:00 drbd1: I am(S): 0:00000001:00000001:00000001:00000001:01
15:57:00 drbd1: Peer(P): 1:00000002:00000001:00000001:00000001:11
15:57:00 drbd1: drbd1_receiver [4377]: cstate Connected --> WFBitMapT
15:57:00 drbd1: Secondary/Secondary --> Secondary/Primary
15:57:00 drbd1: drbd1_receiver [4377]: cstate WFBitMapT --> SyncTarget
15:57:00 drbd1: Resync started as SyncTarget (need to sync 189968596 KB
[47492149 bits set]).
15:57:00 drbd0: drbd0_receiver [4375]: cstate WFBitMapT --> SyncTarget
15:57:00 drbd0: Resync started as SyncTarget (need to sync 189968596 KB
[47492149 bits set]).
16:01:19 drbd0: sock was shut down by peer
16:01:19 drbd0: drbd0_receiver [4375]: cstate SyncTarget --> BrokenPipe
16:01:19 drbd0: short read expecting header on sock: r=0
16:01:19 drbd0: worker terminated
16:01:19 drbd0: meta connection shut down by peer.
16:01:19 drbd0: asender terminated
16:01:19 drbd0: drbd0_receiver [4375]: cstate BrokenPipe --> Unconnected
16:01:19 drbd0: Connection lost.
16:01:19 drbd0: drbd0_receiver [4375]: cstate Unconnected --> WFConnection
16:01:29 drbd0: drbd0_receiver [4375]: cstate WFConnection -->
WFReportParams
16:01:29 drbd0: Handshake successful: DRBD Network Protocol version 74
16:01:29 drbd0: Connection established.
16:01:29 drbd0: I am(S): 0:00000002:00000001:00000001:00000001:01
16:01:29 drbd0: Peer(P): 1:00000002:00000001:00000001:00000001:10
16:01:29 drbd0: drbd0_receiver [4375]: cstate WFReportParams --> WFBitMapT
16:01:29 drbd0: Secondary/Unknown --> Secondary/Primary
16:01:30 drbd0: drbd0_receiver [4375]: cstate WFBitMapT --> SyncTarget
16:01:30 drbd0: Resync started as SyncTarget (need to sync 189968596 KB
[47492149 bits set]).
drbdadm dump:
-------------------------------------------------------
resource drbd0 {
protocol C;
incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60
; halt -f";
on xen03.emerion.com {
device /dev/drbd0;
disk /dev/sda6;
address 10.1.0.3:7788;
meta-disk /dev/sda5 [0];
}
on xen04.emerion.com {
device /dev/drbd0;
disk /dev/sdb6;
address 10.1.0.4:7788;
meta-disk /dev/sdb5 [0];
}
disk {
on-io-error detach;
}
syncer {
rate 60M;
group 1;
al-extents 257;
}
startup {
wfc-timeout 60;
degr-wfc-timeout 15;
}
}
resource drbd1 {
protocol C;
incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60
; halt -f";
on xen03.emerion.com {
device /dev/drbd1;
disk /dev/sdb6;
address 10.1.0.3:7789;
meta-disk /dev/sdb5 [0];
}
on xen04.emerion.com {
device /dev/drbd1;
disk /dev/sda6;
address 10.1.0.4:7789;
meta-disk /dev/sda5 [0];
}
disk {
on-io-error detach;
}
syncer {
rate 60M;
group 1;
al-extents 257;
}
startup {
wfc-timeout 60;
degr-wfc-timeout 15;
}
}