[DRBD-user] Problem with new setup/0.7.16: stalled sync

Michael Paesold mpaesold at gmx.at
Mon Feb 20 15:24:27 CET 2006

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


I am currently setting up a new system with drbd 0.7.16. This is CentOS 4.2, 
Linux xen03.emerion.com 2.6.9-22.0.2.ELsmp #1 SMP i686 i686 i386 GNU/Linux.
on a P4 w/ HT.

The problem is that initial syncing has stalled for one of the devices (but 
not the other one).

Otherwise I have basically done nothing more than
modprobe -s drbd `drbdadm sh-mod-parms`
drbdadm up all
( on both hosts, then on xen03/primary: )
drbdadm -- --do-what-I-say primary all

I then tried to disconnect/connect drbd0 on the primary -- no improvement.

Please see the logs below for some strange error message. Have I done 
something wrong? Have I triggered a (new) bug in DRBD? Tell me if I can 
provide further information. Thank you very much.

Best Regards,
Michael Paesold


Current /proc/drbd on primary:
-------------------------------------------------------
version: 0.7.16 (api:77/proto:74)
SVN Revision: 2066 build by mip at xen04.emerion.com, 2006-02-20 12:07:08
 0: cs:SyncSource st:Primary/Secondary ld:Consistent
    ns:0 nr:0 dw:0 dr:8192 al:0 bm:0 lo:0 pe:0 ua:512 ap:0
        [>...................] sync'ed:  0.1% (185516/185516)M
        stalled
 1: cs:SyncSource st:Primary/Secondary ld:Consistent
    ns:58016996 nr:0 dw:0 dr:58018136 al:0 bm:26730 lo:158 pe:245 ua:285 
ap:0
        [======>.............] sync'ed: 30.6% (128859/185516)M
        finish: 0:45:06 speed: 48,744 (51,844) K/sec

A little bit later on secondary:
-------------------------------------------------------
version: 0.7.16 (api:77/proto:74)
SVN Revision: 2066 build by mip at xen04.emerion.com, 2006-02-20 12:07:08
 0: cs:SyncTarget st:Secondary/Primary ld:Inconsistent
    ns:0 nr:0 dw:0 dr:0 al:0 bm:23190 lo:0 pe:1536 ua:0 ap:0
        [>...................] sync'ed:  0.1% (185516/185516)M
        stalled
 1: cs:SyncTarget st:Secondary/Primary ld:Inconsistent
    ns:0 nr:75120848 dw:75120848 dr:0 al:0 bm:27775 lo:0 pe:504 ua:0 ap:0
        [=======>............] sync'ed: 39.6% (112156/185516)M
        finish: 0:28:35 speed: 66,912 (52,020) K/sec


Logs from primary/sync source: (xen03)
-------------------------------------------------------
14:53:18 drbd0: resync bitmap: bits=47492149 words=1484130
14:53:18 drbd0: size = 181 GB (189968593 KB)
14:53:20 drbd0: 189968596 KB now marked out-of-sync by on disk bit-map.
14:53:20 drbd0: Assuming that all blocks are out of sync (aka FullSync)
14:53:22 drbd0: 189968596 KB now marked out-of-sync by on disk bit-map.
14:53:22 drbd0: Couldn't start thread (-4)
14:53:22 drbd0: drbdsetup [3461]: cstate Unconfigured --> StandAlone
14:54:02 drbd0: drbdsetup [3479]: cstate StandAlone --> Unconnected
14:54:02 drbd0: drbdsetup [3479]: cstate Unconnected --> StandAlone
14:54:02 drbd0: drbdsetup [3479]: cstate StandAlone --> Unconfigured
14:54:02 drbd0: _drbd_thread_stop: (thi->task == NULL) in 
/home/mip/rpm/BUILD/drbd-0.7.16/drbd/drbd_main.c:581
14:54:03 drbd0: resync bitmap: bits=47492149 words=1484130
14:54:03 drbd0: size = 181 GB (189968593 KB)
14:54:05 drbd0: 181 GB marked out-of-sync by on disk bit-map.
14:54:05 drbd0: No usable activity log found.
14:54:05 drbd0: drbdsetup [3494]: cstate Unconfigured --> StandAlone
14:54:05 drbd1: Creating state block
14:54:05 drbd1: resync bitmap: bits=47492149 words=1484130
14:54:05 drbd1: size = 181 GB (189968593 KB)
14:54:07 drbd1: 189968596 KB now marked out-of-sync by on disk bit-map.
14:54:07 drbd1: Assuming that all blocks are out of sync (aka FullSync)
14:54:09 drbd1: 189968596 KB now marked out-of-sync by on disk bit-map.
14:54:09 drbd1: drbdsetup [3495]: cstate Unconfigured --> StandAlone
14:54:09 drbd0: drbdsetup [3499]: cstate StandAlone --> Unconnected
14:54:09 drbd0: drbd0_receiver [3500]: cstate Unconnected --> WFConnection
14:54:09 drbd1: drbdsetup [3501]: cstate StandAlone --> Unconnected
14:54:09 drbd1: drbd1_receiver [3502]: cstate Unconnected --> WFConnection
14:54:34 drbd0: drbd0_receiver [3500]: cstate WFConnection --> 
WFReportParams
14:54:34 drbd0: Handshake successful: DRBD Network Protocol version 74
14:54:34 drbd0: Connection established.
14:54:34 drbd0: I am(S): 0:00000001:00000001:00000001:00000001:00
14:54:34 drbd0: Peer(S): 0:00000001:00000001:00000001:00000001:00
14:54:34 drbd0: drbd0_receiver [3500]: cstate WFReportParams --> Connected
14:54:34 drbd0: I am inconsistent, but there is no sync? BOTH nodes 
inconsistent!
14:54:34 drbd0: Secondary/Unknown --> Secondary/Secondary
14:54:34 drbd1: drbd1_receiver [3502]: cstate WFConnection --> 
WFReportParams
14:54:34 drbd1: Handshake successful: DRBD Network Protocol version 74
14:54:34 drbd1: Connection established.
14:54:34 drbd1: I am(S): 0:00000001:00000001:00000001:00000001:00
14:54:34 drbd1: Peer(S): 0:00000001:00000001:00000001:00000001:00
14:54:34 drbd1: drbd1_receiver [3502]: cstate WFReportParams --> Connected
14:54:34 drbd1: I am inconsistent, but there is no sync? BOTH nodes 
inconsistent!
14:54:34 drbd1: Secondary/Unknown --> Secondary/Secondary
14:55:05 drbd0: Secondary/Secondary --> Primary/Secondary
14:55:05 drbd0: I am(P): 1:00000002:00000001:00000001:00000001:11
14:55:05 drbd0: Peer(S): 0:00000001:00000001:00000001:00000001:01
14:55:05 drbd0: drbd0_receiver [3500]: cstate Connected --> WFBitMapS
14:55:05 drbd1: Secondary/Secondary --> Primary/Secondary
14:55:05 drbd1: I am(P): 1:00000002:00000001:00000001:00000001:11
14:55:05 drbd1: Peer(S): 0:00000001:00000001:00000001:00000001:01
14:55:05 drbd1: drbd1_receiver [3502]: cstate Connected --> WFBitMapS
14:55:05 drbd1: drbd1_receiver [3502]: cstate WFBitMapS --> SyncSource
14:55:05 drbd1: Resync started as SyncSource (need to sync 189968596 KB 
[47492149 bits set]).
14:55:06 drbd0: drbd0_receiver [3500]: cstate WFBitMapS --> SyncSource
14:55:06 drbd0: Resync started as SyncSource (need to sync 189968596 KB 
[47492149 bits set]).
14:59:24 drbd0: drbdsetup [3587]: cstate SyncSource --> Unconnected
14:59:24 drbd0: drbd0_receiver [3500]: cstate Unconnected --> BrokenPipe
14:59:24 drbd0: short read expecting header on sock: r=-512
14:59:24 drbd0: _drbd_thread_stop: (thi->task == NULL) in 
/home/mip/rpm/BUILD/drbd-0.7.16/drbd/drbd_main.c:581
14:59:24 drbd0: asender terminated
14:59:24 drbd0: _drbd_thread_stop: (thi->task == NULL) in 
/home/mip/rpm/BUILD/drbd-0.7.16/drbd/drbd_main.c:581
14:59:24 drbd0: ASSERT( mdev->ee_in_use == 0 ) in 
/home/mip/rpm/BUILD/drbd-0.7.16/drbd/drbd_receiver.c:1881
14:59:24 drbd0: unacked_cnt = 1536
14:59:24 drbd0: drbd0_receiver [3500]: cstate BrokenPipe --> StandAlone
14:59:24 drbd0: Connection lost.
14:59:24 drbd0: receiver terminated
14:59:24 drbd0: drbdsetup [3587]: cstate StandAlone --> StandAlone
14:59:35 drbd0: drbdsetup [3602]: cstate StandAlone --> Unconnected
14:59:35 drbd0: drbd0_receiver [3603]: cstate Unconnected --> WFConnection
14:59:35 drbd0: drbd0_receiver [3603]: cstate WFConnection --> 
WFReportParams
14:59:35 drbd0: Handshake successful: DRBD Network Protocol version 74
14:59:35 drbd0: Connection established.
14:59:35 drbd0: I am(P): 1:00000002:00000001:00000001:00000001:10
14:59:35 drbd0: Peer(S): 0:00000002:00000001:00000001:00000001:01
14:59:35 drbd0: drbd0_receiver [3603]: cstate WFReportParams --> WFBitMapS
14:59:35 drbd0: Primary/Unknown --> Primary/Secondary
14:59:35 drbd0: drbd0_receiver [3603]: cstate WFBitMapS --> SyncSource
14:59:35 drbd0: Resync started as SyncSource (need to sync 189968596 KB 
[47492149 bits set]).
14:59:35 drbd0: resync_work.cb == w_resume_next_sg, should be 
w_resync_inactive

logs from secondary/sync target: (xen04)
-------------------------------------------------------
15:56:20 drbd0: resync bitmap: bits=47492149 words=1484130
15:56:20 drbd0: size = 181 GB (189968593 KB)
15:56:22 drbd0: 189968596 KB now marked out-of-sync by on disk bit-map.
15:56:22 drbd0: Assuming that all blocks are out of sync (aka FullSync)
15:56:24 drbd0: 189968596 KB now marked out-of-sync by on disk bit-map.
15:56:24 drbd0: drbdsetup [4368]: cstate Unconfigured --> StandAlone
15:56:24 drbd1: Creating state block
15:56:24 drbd1: resync bitmap: bits=47492149 words=1484130
15:56:24 drbd1: size = 181 GB (189968593 KB)
15:56:26 drbd1: 189968596 KB now marked out-of-sync by on disk bit-map.
15:56:26 drbd1: Assuming that all blocks are out of sync (aka FullSync)
15:56:28 drbd1: 189968596 KB now marked out-of-sync by on disk bit-map.
15:56:28 drbd1: drbdsetup [4370]: cstate Unconfigured --> StandAlone
15:56:28 drbd0: drbdsetup [4374]: cstate StandAlone --> Unconnected
15:56:28 drbd0: drbd0_receiver [4375]: cstate Unconnected --> WFConnection
15:56:28 drbd1: drbdsetup [4376]: cstate StandAlone --> Unconnected
15:56:28 drbd1: drbd1_receiver [4377]: cstate Unconnected --> WFConnection
15:56:28 drbd0: drbd0_receiver [4375]: cstate WFConnection --> 
WFReportParams
15:56:28 drbd0: Handshake successful: DRBD Network Protocol version 74
15:56:28 drbd0: Connection established.
15:56:28 drbd0: I am(S): 0:00000001:00000001:00000001:00000001:00
15:56:28 drbd0: Peer(S): 0:00000001:00000001:00000001:00000001:00
15:56:28 drbd0: drbd0_receiver [4375]: cstate WFReportParams --> Connected
15:56:28 drbd0: I am inconsistent, but there is no sync? BOTH nodes 
inconsistent!
15:56:28 drbd0: Secondary/Unknown --> Secondary/Secondary
15:56:28 drbd1: drbd1_receiver [4377]: cstate WFConnection --> 
WFReportParams
15:56:28 drbd1: Handshake successful: DRBD Network Protocol version 74
15:56:28 drbd1: Connection established.
15:56:28 drbd1: I am(S): 0:00000001:00000001:00000001:00000001:00
15:56:28 drbd1: Peer(S): 0:00000001:00000001:00000001:00000001:00
15:56:28 drbd1: drbd1_receiver [4377]: cstate WFReportParams --> Connected
15:56:28 drbd1: I am inconsistent, but there is no sync? BOTH nodes 
inconsistent!
15:56:28 drbd1: Secondary/Unknown --> Secondary/Secondary
15:57:00 drbd0: I am(S): 0:00000001:00000001:00000001:00000001:01
15:57:00 drbd0: Peer(P): 1:00000002:00000001:00000001:00000001:11
15:57:00 drbd0: drbd0_receiver [4375]: cstate Connected --> WFBitMapT
15:57:00 drbd0: Secondary/Secondary --> Secondary/Primary
15:57:00 drbd1: I am(S): 0:00000001:00000001:00000001:00000001:01
15:57:00 drbd1: Peer(P): 1:00000002:00000001:00000001:00000001:11
15:57:00 drbd1: drbd1_receiver [4377]: cstate Connected --> WFBitMapT
15:57:00 drbd1: Secondary/Secondary --> Secondary/Primary
15:57:00 drbd1: drbd1_receiver [4377]: cstate WFBitMapT --> SyncTarget
15:57:00 drbd1: Resync started as SyncTarget (need to sync 189968596 KB 
[47492149 bits set]).
15:57:00 drbd0: drbd0_receiver [4375]: cstate WFBitMapT --> SyncTarget
15:57:00 drbd0: Resync started as SyncTarget (need to sync 189968596 KB 
[47492149 bits set]).
16:01:19 drbd0: sock was shut down by peer
16:01:19 drbd0: drbd0_receiver [4375]: cstate SyncTarget --> BrokenPipe
16:01:19 drbd0: short read expecting header on sock: r=0
16:01:19 drbd0: worker terminated
16:01:19 drbd0: meta connection shut down by peer.
16:01:19 drbd0: asender terminated
16:01:19 drbd0: drbd0_receiver [4375]: cstate BrokenPipe --> Unconnected
16:01:19 drbd0: Connection lost.
16:01:19 drbd0: drbd0_receiver [4375]: cstate Unconnected --> WFConnection
16:01:29 drbd0: drbd0_receiver [4375]: cstate WFConnection --> 
WFReportParams
16:01:29 drbd0: Handshake successful: DRBD Network Protocol version 74
16:01:29 drbd0: Connection established.
16:01:29 drbd0: I am(S): 0:00000002:00000001:00000001:00000001:01
16:01:29 drbd0: Peer(P): 1:00000002:00000001:00000001:00000001:10
16:01:29 drbd0: drbd0_receiver [4375]: cstate WFReportParams --> WFBitMapT
16:01:29 drbd0: Secondary/Unknown --> Secondary/Primary
16:01:30 drbd0: drbd0_receiver [4375]: cstate WFBitMapT --> SyncTarget
16:01:30 drbd0: Resync started as SyncTarget (need to sync 189968596 KB 
[47492149 bits set]).



drbdadm dump:
-------------------------------------------------------
resource drbd0 {
    protocol               C;
    incon-degr-cmd       "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 
; halt -f";
    on xen03.emerion.com {
        device           /dev/drbd0;
        disk             /dev/sda6;
        address          10.1.0.3:7788;
        meta-disk        /dev/sda5 [0];
    }
    on xen04.emerion.com {
        device           /dev/drbd0;
        disk             /dev/sdb6;
        address          10.1.0.4:7788;
        meta-disk        /dev/sdb5 [0];
    }
    disk {
        on-io-error      detach;
    }
    syncer {
        rate             60M;
        group              1;
        al-extents       257;
    }
    startup {
        wfc-timeout       60;
        degr-wfc-timeout  15;
    }
}

resource drbd1 {
    protocol               C;
    incon-degr-cmd       "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 
; halt -f";
    on xen03.emerion.com {
        device           /dev/drbd1;
        disk             /dev/sdb6;
        address          10.1.0.3:7789;
        meta-disk        /dev/sdb5 [0];
    }
    on xen04.emerion.com {
        device           /dev/drbd1;
        disk             /dev/sda6;
        address          10.1.0.4:7789;
        meta-disk        /dev/sda5 [0];
    }
    disk {
        on-io-error      detach;
    }
    syncer {
        rate             60M;
        group              1;
        al-extents       257;
    }
    startup {
        wfc-timeout       60;
        degr-wfc-timeout  15;
    }
}





More information about the drbd-user mailing list