[DRBD-user] DRBD stalling during resync

Mark Deneen mdeneen at gmail.com
Fri Feb 25 16:49:32 CET 2011

I've experienced this issue with versions 8.3.7, 8.3.9 and 8.3.10 (I
skipped 8.3.8).

The secondary node reports that it is fully in sync, but the primary
shows that synchronization is not complete (note the non-zero oos
values, marked with <--, on devices 2 and 3):

Primary:


version: 8.3.10 (api:88/proto:86-96)
GIT-hash: 5c0b0469666682443d4785d90a2c603378f9017b build by root at seadragon, 2011-02-24 17:38:48
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:4871556 nr:0 dw:4576232 dr:12678729 al:9009 bm:152 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
 1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:25792 nr:0 dw:26464 dr:603952 al:0 bm:118 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
 2: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----
    ns:682468 nr:0 dw:1366296 dr:211741217 al:0 bm:13320 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:192 <--
	[===================>] sync'ed:100.0% (192/15192)K
	finish: 0:00:00 speed: 216 (216) K/sec
 3: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:5824 nr:0 dw:6560 dr:144321 al:0 bm:37 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:48 <--
 4: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:362304 nr:0 dw:362296 dr:947041 al:616 bm:3 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
 5: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:0 nr:0 dw:0 dr:732 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
 6: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:39557861 nr:0 dw:39557841 dr:415544 al:10179 bm:2 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

Secondary:


version: 8.3.10 (api:88/proto:86-96)
GIT-hash: 5c0b0469666682443d4785d90a2c603378f9017b build by root at scorpion, 2011-02-24 21:02:05
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:3252 dw:3252 dr:668 al:0 bm:10 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
 1: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:24752 dw:24752 dr:24804 al:0 bm:53 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
 2: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:15168 dw:15168 dr:15192 al:0 bm:61 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
 3: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:4100 dw:4100 dr:4104 al:0 bm:18 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
 4: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:276 dw:276 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
 5: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
 6: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:66297 dw:66297 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

Volume 2 is a 200 GB LVM logical volume on both the primary and the
secondary. I have also seen some entries in the primary's kernel log
that I would consider unusual:


[2986137.136457] block drbd2: cs:SyncSource rs_left=76734 > rs_total=29803 (rs_failed 0)
[2986432.356781] block drbd2: cs:SyncSource rs_left=76979 > rs_total=29803 (rs_failed 0)
[2986437.293397] block drbd2: cs:SyncSource rs_left=76985 > rs_total=29803 (rs_failed 0)
[2986438.780070] block drbd2: cs:SyncSource rs_left=76985 > rs_total=29803 (rs_failed 0)
[2986439.678945] block drbd2: cs:SyncSource rs_left=76985 > rs_total=29803 (rs_failed 0)
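
These messages say that the amount left to resync (rs_left) is larger than the total the resync started with (rs_total). In a healthy resync, rs_left would be expected to count down toward zero. The following Python sketch pulls those numbers out of dmesg text so you can check whether rs_left keeps growing. The helper name resync_overruns is hypothetical, and the regex assumes exactly the message format quoted above.

```python
import re

# Kernel log line as quoted above, e.g.
# "[2986137.136457] block drbd2: cs:SyncSource rs_left=76734 > rs_total=29803 (rs_failed 0)"
RESYNC_RE = re.compile(
    r"block (drbd\d+): cs:(\S+) rs_left=(\d+) > rs_total=(\d+) \(rs_failed (\d+)\)"
)


def resync_overruns(log_text):
    """Return (device, rs_left, rs_total, rs_failed) tuples, one per matching line."""
    rows = []
    for line in log_text.splitlines():
        m = RESYNC_RE.search(line)
        if m:
            dev, _cs, left, total, failed = m.groups()
            rows.append((dev, int(left), int(total), int(failed)))
    return rows


LOG = """\
[2986137.136457] block drbd2: cs:SyncSource rs_left=76734 > rs_total=29803 (rs_failed 0)
[2986432.356781] block drbd2: cs:SyncSource rs_left=76979 > rs_total=29803 (rs_failed 0)
[2986437.293397] block drbd2: cs:SyncSource rs_left=76985 > rs_total=29803 (rs_failed 0)
"""

rows = resync_overruns(LOG)
for dev, left, total, failed in rows:
    print(f"{dev}: rs_left={left} exceeds rs_total={total} by {left - total}")

# Assumption: a healthy resync only counts rs_left down, so growth is suspicious
lefts = [left for _, left, _, _ in rows]
print("rs_left growing:", lefts == sorted(lefts) and lefts[0] < lefts[-1])
```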


(Please note that these log entries were not captured at the same time
as the /proc/drbd output above.)

This may be anecdotal, but I've noticed that the volumes that fall out
of sync are the ones used as raw iSCSI targets. That is, there is no
file system on top of the DRBD device. Instead, another server uses it
as a raw SCSI disk exported through SCST 2.

This appears to work:
LVM -> DRBD -> file system -> files -> iSCSI (SCST 2)

This gets out of sync:
LVM -> DRBD -> iSCSI (SCST 2)

Disconnecting and reconnecting the secondary node brings the volumes
back in sync, but changes made after that are again not synchronized.

Am I doing something incorrectly? Is using a DRBD block device as an
iSCSI target a bad thing to do?

-M


