[DRBD-user] On-line verification problem

David.Livingstone at cn.ca David.Livingstone at cn.ca
Thu May 21 00:44:26 CEST 2009

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


I have a two-node cluster(wimpas1/2) running drbd 8.3.1 which
I just enabled for online verification(verify-alg crc32c).
When I ran it for the first time today I received a number of
"Out-of-sync" messages which I subsequently corrected by
disconnecting and connecting the resource. After some
successful failover tests(running latest heartbeat/pacemaker)
I then ran "drbdadm verify r0" again and to my surprise found
more "Out-of-sync" messages.

The drbd link is good with no reported errors.
Ideas ? While the verify is running changes are
being made to the disk - do I really have errors ?
>From the logs below does the "0 KB (0 bits) marked out-of-sync"
really mean I do not have any errors ?

Thanks


Background :
- 2 x Proliant DL380G5 with SAS drives and
Raid 6 /dev/mapper/VolGroup01-LogVol00 ext3 drbd partition.
- Drbd 8.3.1 built from source.
- Latest RH 5.3 with kernel 2.6.18-128.1.6.el5PAE.
- drbd link is over 10Gb nic : HP NC510C(NetXen) using
nx_nic-3.4.337-1 and nx_lsa-3.4.337-1. The offload
feature(nx_lsa) is not used.

Given below :
1. First test of verify.
2. Second test of verify.
3. drbd.conf


1. First test of verify :

- test verify :
- [root at wimpas2 etc]# drbd-overview
0:r0  Connected Primary/Secondary UpToDate/UpToDate C r---- /drbd ext3 
270G 106G 151G 42%
- Run the verify on resource r0 :
- [root at wimpas2 etc]# drbdadm verify r0
- idle goes from 88% to 79%
- drbd-overview shows verify is ongoing however it does not show the
progress of the verify :
- [root at wimpas2 etc]# drbd-overview
0:r0  VerifyS Primary/Secondary UpToDate/UpToDate C r---- /drbd ext3 270G 
106G 151G 42%
- To show the progress we need to "cat /proc/drbd" :
[root at wimpas2 etc]# cat /proc/drbd
version: 8.3.1 (api:88/proto:86-89)
GIT-hash: fd40f4a8f9104941537d1afc8521e584a6d3003c build by root at wimpas2, 
2009-04-16 11:32:59
0: cs:VerifyS ro:Primary/Secondary ds:UpToDate/UpToDate C r----
ns:1602753144 nr:408740 dw:1603218784 dr:204343509 al:14977667 bm:1160 
lo:139 pe:126 ua:638 ap:25 ep:1 wo:d oos:0
42%      30793593/71661420
[root at wimpas2 etc]# cat /proc/drbd
version: 8.3.1 (api:88/proto:86-89)
GIT-hash: fd40f4a8f9104941537d1afc8521e584a6d3003c build by root at wimpas2, 
2009-04-16 11:32:59
0: cs:VerifyS ro:Primary/Secondary ds:UpToDate/UpToDate C r----
ns:1605444232 nr:408740 dw:1605909876 dr:263949733 al:15006416 bm:1160 
lo:215 pe:499 ua:213 ap:2 ep:1 wo:d oos:0
63%      45596210/71661420
[root at wimpas2 etc]# cat /proc/drbd
version: 8.3.1 (api:88/proto:86-89)
GIT-hash: fd40f4a8f9104941537d1afc8521e584a6d3003c build by root at wimpas2, 
2009-04-16 11:32:59
0: cs:VerifyS ro:Primary/Secondary ds:UpToDate/UpToDate C r----
ns:1609901224 nr:408740 dw:1610366868 dr:364980653 al:15058780 bm:1160 
lo:151 pe:64 ua:146 ap:5 ep:1 wo:d oos:0
98%      70637034/71661420
[root at wimpas2 etc]# cat /proc/drbd
version: 8.3.1 (api:88/proto:86-89)
GIT-hash: fd40f4a8f9104941537d1afc8521e584a6d3003c build by root at wimpas2, 
2009-04-16 11:32:59
0: cs:VerifyS ro:Primary/Secondary ds:UpToDate/UpToDate C r----
ns:1610031364 nr:408740 dw:1610497008 dr:368031373 al:15060068 bm:1160 
lo:160 pe:112 ua:153 ap:7 ep:1 wo:d oos:0
99%      71392390/71661420
[root at wimpas2 etc]# cat /proc/drbd
version: 8.3.1 (api:88/proto:86-89)
GIT-hash: fd40f4a8f9104941537d1afc8521e584a6d3003c build by root at wimpas2, 
2009-04-16 11:32:59
0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
ns:1610156492 nr:408740 dw:1610622136 dr:369146845 al:15061320 bm:1160 
lo:4 pe:0 ua:0 ap:4 ep:1 wo:d oos:0
[root at wimpas2 etc]# drbd-overview
0:r0  Connected Primary/Secondary UpToDate/UpToDate C r---- /drbd ext3 
270G 108G 148G 43%
[root at wimpas2 etc]#

- from messages log :

May 20 08:36:20 wimpas1 kernel: drbd0: conn( Connected -> VerifyT )
May 20 08:51:42 wimpas1 kernel: drbd0: Out of sync: start=53862808, size=8 
(sectors)
May 20 08:51:47 wimpas1 kernel: drbd0: Out of sync: start=54175416, size=8 
(sectors)
May 20 08:51:50 wimpas1 kernel: drbd0: Out of sync: start=54300672, size=8 
(sectors)
May 20 08:52:05 wimpas1 kernel: drbd0: Out of sync: start=55233848, size=8 
(sectors)
May 20 09:16:22 wimpas1 kernel: drbd0: Out of sync: start=140359968, 
size=8 (sectors)
May 20 11:22:41 wimpas1 kernel: drbd0: Online verify  done (total 9981 
sec; paused 0 sec; 28716 K/sec)
May 20 11:22:41 wimpas1 kernel: drbd0: conn( VerifyT -> Connected )
May 20 11:22:41 wimpas1 kernel: drbd0: Writing the whole bitmap, due to 
failed kmalloc
May 20 11:22:41 wimpas1 kernel: drbd0: 0 KB (0 bits) marked out-of-sync by 
on disk bit-map.

- now on wimpas2 do a :
- drbdadm disconnect r0
- drbdadm connect r0
This should correct out-of-sync blocks :

- on wimpas2 :

[root at wimpas2 etc]# drbdadm disconnect r0
[root at wimpas2 etc]# drbdadm connect r0

- From messages log on wimpas1 :

May 20 11:25:49 wimpas1 kernel: drbd0: peer( Primary -> Unknown ) conn( 
Connected -> TearDown ) pdsk( UpToDate -> DUnknown )
May 20 11:25:49 wimpas1 kernel: drbd0: asender terminated
May 20 11:25:49 wimpas1 kernel: drbd0: Terminating asender thread
May 20 11:25:49 wimpas1 kernel: drbd0: Connection closed
May 20 11:25:49 wimpas1 kernel: drbd0: conn( TearDown -> Unconnected )
May 20 11:25:49 wimpas1 kernel: drbd0: receiver terminated
May 20 11:25:49 wimpas1 kernel: drbd0: Restarting receiver thread
May 20 11:25:49 wimpas1 kernel: drbd0: receiver (re)started
May 20 11:25:49 wimpas1 kernel: drbd0: conn( Unconnected -> WFConnection )
May 20 11:25:56 wimpas1 kernel: drbd0: Handshake successful: Agreed 
network protocol version 89
May 20 11:25:56 wimpas1 kernel: drbd0: conn( WFConnection -> 
WFReportParams )
May 20 11:25:56 wimpas1 kernel: drbd0: Starting asender thread (from 
drbd0_receiver [4132])
May 20 11:25:56 wimpas1 kernel: drbd0: data-integrity-alg: <not-used>
May 20 11:25:56 wimpas1 kernel: drbd0: drbd_sync_handshake:
May 20 11:25:56 wimpas1 kernel: drbd0: self 
A33677DBB2985460:0000000000000000:6D2B4B9CDDBDEFD6:37F8B9BC0B605BA7 bits:0 
flags:0
May 20 11:25:56 wimpas1 kernel: drbd0: peer 
243B1079C9753DEB:A33677DBB2985461:6D2B4B9CDDBDEFD6:37F8B9BC0B605BA7 
bits:1319 flags:0
May 20 11:25:56 wimpas1 kernel: drbd0: uuid_compare()=-1 by rule 5
May 20 11:25:56 wimpas1 kernel: drbd0: peer( Unknown -> Primary ) conn( 
WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
May 20 11:25:56 wimpas1 kernel: drbd0: conn( WFBitMapT -> WFSyncUUID )
May 20 11:25:56 wimpas1 kernel: drbd0: helper command: /sbin/drbdadm 
before-resync-target minor-0
May 20 11:25:56 wimpas1 kernel: drbd0: helper command: /sbin/drbdadm 
before-resync-target minor-0 exit code 0 (0x0)
May 20 11:25:56 wimpas1 kernel: drbd0: conn( WFSyncUUID -> SyncTarget ) 
disk( UpToDate -> Inconsistent )
May 20 11:25:56 wimpas1 kernel: drbd0: Began resync as SyncTarget (will 
sync 5276 KB [1319 bits set]).
May 20 11:25:56 wimpas1 kernel: drbd0: Resync done (total 1 sec; paused 0 
sec; 5276 K/sec)
May 20 11:25:56 wimpas1 kernel: drbd0: conn( SyncTarget -> Connected ) 
disk( Inconsistent -> UpToDate )
May 20 11:25:56 wimpas1 kernel: drbd0: helper command: /sbin/drbdadm 
after-resync-target minor-0
May 20 11:25:56 wimpas1 kernel: drbd0: helper command: /sbin/drbdadm 
after-resync-target minor-0 exit code 0 (0x0)

- From messages log on wimpas2 :

May 20 11:25:56 wimpas2 kernel: drbd0: conn( StandAlone -> Unconnected )
May 20 11:25:56 wimpas2 kernel: drbd0: Starting receiver thread (from 
drbd0_worker [4143])
May 20 11:25:56 wimpas2 kernel: drbd0: receiver (re)started
May 20 11:25:56 wimpas2 kernel: drbd0: conn( Unconnected -> WFConnection )
May 20 11:25:56 wimpas2 kernel: drbd0: Handshake successful: Agreed 
network protocol version 89
May 20 11:25:56 wimpas2 kernel: drbd0: conn( WFConnection -> 
WFReportParams )
May 20 11:25:56 wimpas2 kernel: drbd0: Starting asender thread (from 
drbd0_receiver [384])
May 20 11:25:56 wimpas2 kernel: drbd0: data-integrity-alg: <not-used>
May 20 11:25:56 wimpas2 kernel: drbd0: drbd_sync_handshake:
May 20 11:25:56 wimpas2 kernel: drbd0: self 
243B1079C9753DEB:A33677DBB2985461:6D2B4B9CDDBDEFD6:37F8B9BC0B605BA7 
bits:1319 flags:0
May 20 11:25:56 wimpas2 kernel: drbd0: peer 
A33677DBB2985460:0000000000000000:6D2B4B9CDDBDEFD6:37F8B9BC0B605BA7 bits:0 
flags:0
May 20 11:25:56 wimpas2 kernel: drbd0: uuid_compare()=1 by rule 7
May 20 11:25:56 wimpas2 kernel: drbd0: peer( Unknown -> Secondary ) conn( 
WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
May 20 11:25:56 wimpas2 kernel: drbd0: conn( WFBitMapS -> SyncSource ) 
pdsk( UpToDate -> Inconsistent )
May 20 11:25:56 wimpas2 kernel: drbd0: Began resync as SyncSource (will 
sync 5276 KB [1319 bits set]).
May 20 11:25:56 wimpas2 kernel: drbd0: Resync done (total 1 sec; paused 0 
sec; 5276 K/sec)
May 20 11:25:56 wimpas2 kernel: drbd0: conn( SyncSource -> Connected ) 
pdsk( Inconsistent -> UpToDate )



2. Second test of verify.

- [root at wimpas2]# drbdadm verify r0

- on wimpas we are still getting Out of sync :
[root at wimpas2 ~]# cat /proc/drbd
version: 8.3.1 (api:88/proto:86-89)
GIT-hash: fd40f4a8f9104941537d1afc8521e584a6d3003c build by root at wimpas2, 
2009-04-16 11:32:59
0: cs:VerifyS ro:Primary/Secondary ds:UpToDate/UpToDate C r---b
ns:8823180 nr:520920 dw:9344100 dr:213062745 al:100758 bm:591 lo:136 
pe:102 ua:136 ap:0 ep:1 wo:d oos:0
74%      53161685/71661420
[root at wimpas2 ~]# cat /proc/drbd
version: 8.3.1 (api:88/proto:86-89)
GIT-hash: fd40f4a8f9104941537d1afc8521e584a6d3003c build by root at wimpas2, 
2009-04-16 11:32:59
0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
ns:11781808 nr:520920 dw:12302728 dr:287072981 al:135343 bm:591 lo:0 pe:0 
ua:0 ap:0 ep:1 wo:d oos:0


- from messages log :

May 20 13:50:53 wimpas1 kernel: drbd0: conn( Connected -> VerifyT )
May 20 14:30:19 wimpas1 kernel: drbd0: Out of sync: start=137739248, 
size=8 (sectors)
May 20 14:35:37 wimpas1 kernel: drbd0: Out of sync: start=156313064, 
size=8 (sectors)
May 20 14:39:00 wimpas1 kernel: drbd0: Out of sync: start=168175112, 
size=8 (sectors)
May 20 14:39:32 wimpas1 kernel: drbd0: Out of sync: start=170062360, 
size=8 (sectors)
May 20 14:45:31 wimpas1 kernel: drbd0: Out of sync: start=190982904, 
size=8 (sectors)
May 20 14:45:45 wimpas1 kernel: drbd0: Out of sync: start=191782000, 
size=8 (sectors)
May 20 14:45:50 wimpas1 kernel: drbd0: Out of sync: start=192029480, 
size=8 (sectors)
May 20 14:47:07 wimpas1 kernel: drbd0: Out of sync: start=196547400, 
size=8 (sectors)
May 20 14:56:18 wimpas1 kernel: drbd0: Out of sync: start=228776888, 
size=8 (sectors)
May 20 14:56:22 wimpas1 kernel: drbd0: Out of sync: start=229022968, 
size=8 (sectors)
May 20 14:56:22 wimpas1 kernel: drbd0: Out of sync: start=229047480, 
size=8 (sectors)
May 20 14:56:24 wimpas1 kernel: drbd0: Out of sync: start=229145752, 
size=8 (sectors)
May 20 14:56:25 wimpas1 kernel: drbd0: Out of sync: start=229198968, 
size=8 (sectors)
May 20 14:56:26 wimpas1 kernel: drbd0: Out of sync: start=229248104, 
size=8 (sectors)
May 20 14:56:27 wimpas1 kernel: drbd0: Out of sync: start=229289008, 
size=8 (sectors)
May 20 16:37:54 wimpas1 kernel: drbd0: Online verify  done (total 10021 
sec; paused 0 sec; 28604 K/sec)
May 20 16:37:54 wimpas1 kernel: drbd0: conn( VerifyT -> Connected )
May 20 16:37:54 wimpas1 kernel: drbd0: Writing the whole bitmap, due to 
failed kmalloc
May 20 16:37:54 wimpas1 kernel: drbd0: 0 KB (0 bits) marked out-of-sync by 
on disk bit-map.


3. drbd.conf

global {
minor-count 1;
}

resource r0 {
protocol C;

on wimpas1 {
device /dev/drbd0; # The name of our drbd device.
disk /dev/mapper/VolGroup01-LogVol00;    # Partition we wish drbd to use.
address 192.168.36.129:7788; # node0 IP address and port number.
meta-disk internal; # Stores meta-data in lower portion of hda5.
}

on wimpas2 {
device /dev/drbd0; # Our drbd device, must match node0.
disk /dev/mapper/VolGroup01-LogVol00;    # Partition we wish drbd to use.
address 192.168.36.130:7788; # node0 IP address and port number.
meta-disk internal; # Stores meta-data in lower portion of hda5.
}

disk {
on-io-error detach; # What to do when the lower level device errors.
}

net {
max-buffers 2048; #datablock buffers used before writing to disk.
ko-count 4; # Peer is dead if this count is exceeded.
#on-disconnect reconnect; # Peer disconnected, try to reconnect.
}

syncer {
rate 29M;
#rate 143M; # Used for first sync
#group 1;  # Used for grouping resources, parallel sync.
al-extents 257; # Must be prime, number of active sets.
verify-alg crc32c;
}

startup {
wfc-timeout 120; # drbd init script will wait 2 minutes - 0 is indefinite.
degr-wfc-timeout 120; # 2 minutes.
}
} # End of resource



-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20090520/81251415/attachment.htm>


More information about the drbd-user mailing list