<br>
<br><font size=2><tt>I have a two-node cluster (wimpas1/2) running DRBD
8.3.1, on which<br>
I just enabled online verification (verify-alg crc32c).<br>
When I ran it for the first time today I received a number of<br>
"Out of sync" messages, which I subsequently corrected by<br>
disconnecting and reconnecting the resource. After some<br>
successful failover tests (running the latest heartbeat/pacemaker),<br>
I ran "drbdadm verify r0" again and, to my surprise, found<br>
more "Out of sync" messages.</tt></font>
<br>
<br><font size=2><tt>The DRBD link is good, with no reported errors.<br>
Any ideas? Changes are being made to the disk while the<br>
verify is running. Do I really have errors?<br>
In the logs below, does "0 KB (0 bits) marked out-of-sync"<br>
really mean that I do not have any errors?</tt></font>
<br>
<br><font size=2><tt>Thanks<br>
</tt></font>
<br>
<br><font size=2><tt>Background :<br>
- 2 x Proliant DL380G5 with SAS drives and</tt></font>
<br><font size=2><tt>Raid 6 /dev/mapper/VolGroup01-LogVol00 ext3 drbd partition.<br>
- Drbd 8.3.1 built from source.<br>
- Latest RH 5.3 with kernel 2.6.18-128.1.6.el5PAE.<br>
- drbd link is over 10Gb nic : HP NC510C(NetXen) using</tt></font>
<br><font size=2><tt>nx_nic-3.4.337-1 and nx_lsa-3.4.337-1. The offload<br>
feature(nx_lsa) is not used.</tt></font>
<br>
<br><font size=2><tt>Given below :<br>
1. First test of verify.<br>
2. Second test of verify.<br>
3. drbd.conf</tt></font>
<br>
<br>
<br><font size=2><tt>1. First test of verify :<br>
</tt></font>
<br><font size=2><tt>- test verify :<br>
- [root@wimpas2 etc]# drbd-overview</tt></font>
<br><font size=2><tt>0:r0 Connected Primary/Secondary UpToDate/UpToDate
C r---- /drbd ext3 270G 106G 151G 42%<br>
- Run the verify on resource r0 :</tt></font>
<br><font size=2><tt>- [root@wimpas2 etc]# drbdadm verify r0<br>
- CPU idle drops from 88% to 79%<br>
- drbd-overview shows that the verify is running, but it does not show the</tt></font>
<br><font size=2><tt>progress of the verify :<br>
- [root@wimpas2 etc]# drbd-overview</tt></font>
<br><font size=2><tt>0:r0 VerifyS Primary/Secondary UpToDate/UpToDate
C r---- /drbd ext3 270G 106G 151G 42%<br>
- To show the progress we need to "cat /proc/drbd" :</tt></font>
<br><font size=2><tt>[root@wimpas2 etc]# cat /proc/drbd<br>
version: 8.3.1 (api:88/proto:86-89)<br>
GIT-hash: fd40f4a8f9104941537d1afc8521e584a6d3003c build by root@wimpas2,
2009-04-16 11:32:59</tt></font>
<br><font size=2><tt>0: cs:VerifyS ro:Primary/Secondary ds:UpToDate/UpToDate
C r----<br>
ns:1602753144 nr:408740 dw:1603218784 dr:204343509 al:14977667 bm:1160
lo:139 pe:126 ua:638 ap:25 ep:1 wo:d oos:0</tt></font>
<br><font size=2><tt>42% 30793593/71661420<br>
[root@wimpas2 etc]# cat /proc/drbd<br>
version: 8.3.1 (api:88/proto:86-89)<br>
GIT-hash: fd40f4a8f9104941537d1afc8521e584a6d3003c build by root@wimpas2,
2009-04-16 11:32:59</tt></font>
<br><font size=2><tt>0: cs:VerifyS ro:Primary/Secondary ds:UpToDate/UpToDate
C r----<br>
ns:1605444232 nr:408740 dw:1605909876 dr:263949733 al:15006416 bm:1160
lo:215 pe:499 ua:213 ap:2 ep:1 wo:d oos:0</tt></font>
<br><font size=2><tt>63% 45596210/71661420<br>
[root@wimpas2 etc]# cat /proc/drbd<br>
version: 8.3.1 (api:88/proto:86-89)<br>
GIT-hash: fd40f4a8f9104941537d1afc8521e584a6d3003c build by root@wimpas2,
2009-04-16 11:32:59</tt></font>
<br><font size=2><tt>0: cs:VerifyS ro:Primary/Secondary ds:UpToDate/UpToDate
C r----<br>
ns:1609901224 nr:408740 dw:1610366868 dr:364980653 al:15058780 bm:1160
lo:151 pe:64 ua:146 ap:5 ep:1 wo:d oos:0</tt></font>
<br><font size=2><tt>98% 70637034/71661420<br>
[root@wimpas2 etc]# cat /proc/drbd<br>
version: 8.3.1 (api:88/proto:86-89)<br>
GIT-hash: fd40f4a8f9104941537d1afc8521e584a6d3003c build by root@wimpas2,
2009-04-16 11:32:59</tt></font>
<br><font size=2><tt>0: cs:VerifyS ro:Primary/Secondary ds:UpToDate/UpToDate
C r----<br>
ns:1610031364 nr:408740 dw:1610497008 dr:368031373 al:15060068 bm:1160
lo:160 pe:112 ua:153 ap:7 ep:1 wo:d oos:0</tt></font>
<br><font size=2><tt>99% 71392390/71661420<br>
[root@wimpas2 etc]# cat /proc/drbd<br>
version: 8.3.1 (api:88/proto:86-89)<br>
GIT-hash: fd40f4a8f9104941537d1afc8521e584a6d3003c build by root@wimpas2,
2009-04-16 11:32:59</tt></font>
<br><font size=2><tt>0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate
C r----<br>
ns:1610156492 nr:408740 dw:1610622136 dr:369146845 al:15061320 bm:1160
lo:4 pe:0 ua:0 ap:4 ep:1 wo:d oos:0</tt></font>
<br><font size=2><tt>[root@wimpas2 etc]# drbd-overview<br>
0:r0 Connected Primary/Secondary UpToDate/UpToDate C r---- /drbd
ext3 270G 108G 148G 43%</tt></font>
<br><font size=2><tt>[root@wimpas2 etc]#<br>
</tt></font>
<br><font size=2><tt>- from messages log :<br>
</tt></font>
<br><font size=2><tt>May 20 08:36:20 wimpas1 kernel: drbd0: conn( Connected
-> VerifyT )<br>
May 20 08:51:42 wimpas1 kernel: drbd0: Out of sync: start=53862808, size=8
(sectors)<br>
May 20 08:51:47 wimpas1 kernel: drbd0: Out of sync: start=54175416, size=8
(sectors)<br>
May 20 08:51:50 wimpas1 kernel: drbd0: Out of sync: start=54300672, size=8
(sectors)<br>
May 20 08:52:05 wimpas1 kernel: drbd0: Out of sync: start=55233848, size=8
(sectors)<br>
May 20 09:16:22 wimpas1 kernel: drbd0: Out of sync: start=140359968, size=8
(sectors)<br>
May 20 11:22:41 wimpas1 kernel: drbd0: Online verify done (total
9981 sec; paused 0 sec; 28716 K/sec)<br>
May 20 11:22:41 wimpas1 kernel: drbd0: conn( VerifyT -> Connected )<br>
May 20 11:22:41 wimpas1 kernel: drbd0: Writing the whole bitmap, due to
failed kmalloc<br>
May 20 11:22:41 wimpas1 kernel: drbd0: 0 KB (0 bits) marked out-of-sync
by on disk bit-map.<br>
</tt></font>
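<br><font size=2><tt>To total up what the verify reported, here is a rough Python sketch<br>
(my own helper, not part of DRBD; the name summarize_out_of_sync is made up<br>
for illustration). It scans log text for "Out of sync" lines and sums the<br>
sizes, assuming a sector is 512 bytes. For the five lines above it gives<br>
5 ranges totalling 20 KB.</tt></font>

```python
import re

SECTOR_BYTES = 512  # assumption: sizes in the log are 512-byte sectors

# Matches kernel lines such as:
#   drbd0: Out of sync: start=53862808, size=8 (sectors)
OOS_RE = re.compile(r"Out of sync: start=(\d+), size=(\d+)")


def summarize_out_of_sync(log_text):
    """Return (number of reported ranges, total KiB) found in log_text."""
    sizes = [int(m.group(2)) for m in OOS_RE.finditer(log_text)]
    total_kib = sum(sizes) * SECTOR_BYTES // 1024
    return len(sizes), total_kib


if __name__ == "__main__":
    sample = "\n".join([
        "May 20 08:51:42 wimpas1 kernel: drbd0: Out of sync: start=53862808, size=8 (sectors)",
        "May 20 08:51:47 wimpas1 kernel: drbd0: Out of sync: start=54175416, size=8 (sectors)",
        "May 20 08:51:50 wimpas1 kernel: drbd0: Out of sync: start=54300672, size=8 (sectors)",
        "May 20 08:52:05 wimpas1 kernel: drbd0: Out of sync: start=55233848, size=8 (sectors)",
        "May 20 09:16:22 wimpas1 kernel: drbd0: Out of sync: start=140359968, size=8 (sectors)",
    ])
    count, kib = summarize_out_of_sync(sample)
    print(f"{count} ranges, {kib} KiB reported out of sync")
```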
<br><font size=2><tt>- now on wimpas2 do a :<br>
- drbdadm disconnect r0<br>
- drbdadm connect r0<br>
This should correct out-of-sync blocks :</tt></font>
<br>
<br><font size=2><tt>- on wimpas2 :<br>
</tt></font>
<br><font size=2><tt>[root@wimpas2 etc]# drbdadm disconnect r0<br>
[root@wimpas2 etc]# drbdadm connect r0<br>
</tt></font>
<br><font size=2><tt>- From messages log on wimpas1 :<br>
</tt></font>
<br><font size=2><tt>May 20 11:25:49 wimpas1 kernel: drbd0: peer( Primary
-> Unknown ) conn( Connected -> TearDown ) pdsk( UpToDate -> DUnknown
)<br>
May 20 11:25:49 wimpas1 kernel: drbd0: asender terminated<br>
May 20 11:25:49 wimpas1 kernel: drbd0: Terminating asender thread<br>
May 20 11:25:49 wimpas1 kernel: drbd0: Connection closed<br>
May 20 11:25:49 wimpas1 kernel: drbd0: conn( TearDown -> Unconnected
)<br>
May 20 11:25:49 wimpas1 kernel: drbd0: receiver terminated<br>
May 20 11:25:49 wimpas1 kernel: drbd0: Restarting receiver thread<br>
May 20 11:25:49 wimpas1 kernel: drbd0: receiver (re)started<br>
May 20 11:25:49 wimpas1 kernel: drbd0: conn( Unconnected -> WFConnection
)<br>
May 20 11:25:56 wimpas1 kernel: drbd0: Handshake successful: Agreed network
protocol version 89<br>
May 20 11:25:56 wimpas1 kernel: drbd0: conn( WFConnection -> WFReportParams
)<br>
May 20 11:25:56 wimpas1 kernel: drbd0: Starting asender thread (from drbd0_receiver
[4132])<br>
May 20 11:25:56 wimpas1 kernel: drbd0: data-integrity-alg: <not-used><br>
May 20 11:25:56 wimpas1 kernel: drbd0: drbd_sync_handshake:<br>
May 20 11:25:56 wimpas1 kernel: drbd0: self A33677DBB2985460:0000000000000000:6D2B4B9CDDBDEFD6:37F8B9BC0B605BA7
bits:0 flags:0<br>
May 20 11:25:56 wimpas1 kernel: drbd0: peer 243B1079C9753DEB:A33677DBB2985461:6D2B4B9CDDBDEFD6:37F8B9BC0B605BA7
bits:1319 flags:0<br>
May 20 11:25:56 wimpas1 kernel: drbd0: uuid_compare()=-1 by rule 5<br>
May 20 11:25:56 wimpas1 kernel: drbd0: peer( Unknown -> Primary ) conn(
WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )<br>
May 20 11:25:56 wimpas1 kernel: drbd0: conn( WFBitMapT -> WFSyncUUID
)<br>
May 20 11:25:56 wimpas1 kernel: drbd0: helper command: /sbin/drbdadm before-resync-target
minor-0<br>
May 20 11:25:56 wimpas1 kernel: drbd0: helper command: /sbin/drbdadm before-resync-target
minor-0 exit code 0 (0x0)<br>
May 20 11:25:56 wimpas1 kernel: drbd0: conn( WFSyncUUID -> SyncTarget
) disk( UpToDate -> Inconsistent )<br>
May 20 11:25:56 wimpas1 kernel: drbd0: Began resync as SyncTarget (will
sync 5276 KB [1319 bits set]).<br>
May 20 11:25:56 wimpas1 kernel: drbd0: Resync done (total 1 sec; paused
0 sec; 5276 K/sec)<br>
May 20 11:25:56 wimpas1 kernel: drbd0: conn( SyncTarget -> Connected
) disk( Inconsistent -> UpToDate )<br>
May 20 11:25:56 wimpas1 kernel: drbd0: helper command: /sbin/drbdadm after-resync-target
minor-0<br>
May 20 11:25:56 wimpas1 kernel: drbd0: helper command: /sbin/drbdadm after-resync-target
minor-0 exit code 0 (0x0)<br>
</tt></font>
<br><font size=2><tt>- From messages log on wimpas2 :<br>
</tt></font>
<br><font size=2><tt>May 20 11:25:56 wimpas2 kernel: drbd0: conn( StandAlone
-> Unconnected )<br>
May 20 11:25:56 wimpas2 kernel: drbd0: Starting receiver thread (from drbd0_worker
[4143])<br>
May 20 11:25:56 wimpas2 kernel: drbd0: receiver (re)started<br>
May 20 11:25:56 wimpas2 kernel: drbd0: conn( Unconnected -> WFConnection
)<br>
May 20 11:25:56 wimpas2 kernel: drbd0: Handshake successful: Agreed network
protocol version 89<br>
May 20 11:25:56 wimpas2 kernel: drbd0: conn( WFConnection -> WFReportParams
)<br>
May 20 11:25:56 wimpas2 kernel: drbd0: Starting asender thread (from drbd0_receiver
[384])<br>
May 20 11:25:56 wimpas2 kernel: drbd0: data-integrity-alg: <not-used><br>
May 20 11:25:56 wimpas2 kernel: drbd0: drbd_sync_handshake:<br>
May 20 11:25:56 wimpas2 kernel: drbd0: self 243B1079C9753DEB:A33677DBB2985461:6D2B4B9CDDBDEFD6:37F8B9BC0B605BA7
bits:1319 flags:0<br>
May 20 11:25:56 wimpas2 kernel: drbd0: peer A33677DBB2985460:0000000000000000:6D2B4B9CDDBDEFD6:37F8B9BC0B605BA7
bits:0 flags:0<br>
May 20 11:25:56 wimpas2 kernel: drbd0: uuid_compare()=1 by rule 7<br>
May 20 11:25:56 wimpas2 kernel: drbd0: peer( Unknown -> Secondary )
conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )<br>
May 20 11:25:56 wimpas2 kernel: drbd0: conn( WFBitMapS -> SyncSource
) pdsk( UpToDate -> Inconsistent )<br>
May 20 11:25:56 wimpas2 kernel: drbd0: Began resync as SyncSource (will
sync 5276 KB [1319 bits set]).<br>
May 20 11:25:56 wimpas2 kernel: drbd0: Resync done (total 1 sec; paused
0 sec; 5276 K/sec)<br>
May 20 11:25:56 wimpas2 kernel: drbd0: conn( SyncSource -> Connected
) pdsk( Inconsistent -> UpToDate )<br>
</tt></font>
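<br><font size=2><tt>The resync size logged above is consistent with each bitmap bit<br>
covering 4 KB: 1319 bits x 4 KB = 5276 KB. The 4 KB-per-bit figure is my<br>
inference from those two numbers, not something I have confirmed elsewhere.<br>
On the same assumption, an 8-sector "Out of sync" range (512-byte sectors)<br>
corresponds to one bit. A small sketch of the arithmetic, with made-up<br>
helper names:</tt></font>

```python
BYTES_PER_BIT = 4096  # assumption: inferred from "5276 KB [1319 bits set]"
SECTOR_BYTES = 512    # assumption: log sizes are 512-byte sectors


def bits_to_kib(bits):
    """Amount of data covered by the given number of bitmap bits, in KiB."""
    return bits * BYTES_PER_BIT // 1024


def sectors_to_bits(sectors):
    """Bitmap bits needed to cover a range of the given length (rounded up)."""
    total_bytes = sectors * SECTOR_BYTES
    return (total_bytes + BYTES_PER_BIT - 1) // BYTES_PER_BIT


if __name__ == "__main__":
    print(bits_to_kib(1319), "KiB for 1319 bits")
    print(sectors_to_bits(8), "bit(s) for an 8-sector range")
```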
<br>
<br>
<br><font size=2><tt>2. Second test of verify.<br>
</tt></font>
<br><font size=2><tt>- [root@wimpas2]# drbdadm verify r0<br>
</tt></font>
<br><font size=2><tt>- on wimpas we are still getting Out of sync :<br>
[root@wimpas2 ~]# cat /proc/drbd<br>
version: 8.3.1 (api:88/proto:86-89)<br>
GIT-hash: fd40f4a8f9104941537d1afc8521e584a6d3003c build by root@wimpas2,
2009-04-16 11:32:59</tt></font>
<br><font size=2><tt>0: cs:VerifyS ro:Primary/Secondary ds:UpToDate/UpToDate
C r---b<br>
ns:8823180 nr:520920 dw:9344100 dr:213062745 al:100758 bm:591 lo:136 pe:102
ua:136 ap:0 ep:1 wo:d oos:0</tt></font>
<br><font size=2><tt>74% 53161685/71661420<br>
[root@wimpas2 ~]# cat /proc/drbd<br>
version: 8.3.1 (api:88/proto:86-89)<br>
GIT-hash: fd40f4a8f9104941537d1afc8521e584a6d3003c build by root@wimpas2,
2009-04-16 11:32:59</tt></font>
<br><font size=2><tt>0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate
C r----<br>
ns:11781808 nr:520920 dw:12302728 dr:287072981 al:135343 bm:591 lo:0 pe:0
ua:0 ap:0 ep:1 wo:d oos:0</tt></font>
<br>
<br>
<br><font size=2><tt>- from messages log :<br>
</tt></font>
<br><font size=2><tt>May 20 13:50:53 wimpas1 kernel: drbd0: conn( Connected
-> VerifyT )<br>
May 20 14:30:19 wimpas1 kernel: drbd0: Out of sync: start=137739248, size=8
(sectors)<br>
May 20 14:35:37 wimpas1 kernel: drbd0: Out of sync: start=156313064, size=8
(sectors)<br>
May 20 14:39:00 wimpas1 kernel: drbd0: Out of sync: start=168175112, size=8
(sectors)<br>
May 20 14:39:32 wimpas1 kernel: drbd0: Out of sync: start=170062360, size=8
(sectors)<br>
May 20 14:45:31 wimpas1 kernel: drbd0: Out of sync: start=190982904, size=8
(sectors)<br>
May 20 14:45:45 wimpas1 kernel: drbd0: Out of sync: start=191782000, size=8
(sectors)<br>
May 20 14:45:50 wimpas1 kernel: drbd0: Out of sync: start=192029480, size=8
(sectors)<br>
May 20 14:47:07 wimpas1 kernel: drbd0: Out of sync: start=196547400, size=8
(sectors)<br>
May 20 14:56:18 wimpas1 kernel: drbd0: Out of sync: start=228776888, size=8
(sectors)<br>
May 20 14:56:22 wimpas1 kernel: drbd0: Out of sync: start=229022968, size=8
(sectors)<br>
May 20 14:56:22 wimpas1 kernel: drbd0: Out of sync: start=229047480, size=8
(sectors)<br>
May 20 14:56:24 wimpas1 kernel: drbd0: Out of sync: start=229145752, size=8
(sectors)<br>
May 20 14:56:25 wimpas1 kernel: drbd0: Out of sync: start=229198968, size=8
(sectors)<br>
May 20 14:56:26 wimpas1 kernel: drbd0: Out of sync: start=229248104, size=8
(sectors)<br>
May 20 14:56:27 wimpas1 kernel: drbd0: Out of sync: start=229289008, size=8
(sectors)<br>
May 20 16:37:54 wimpas1 kernel: drbd0: Online verify done (total
10021 sec; paused 0 sec; 28604 K/sec)<br>
May 20 16:37:54 wimpas1 kernel: drbd0: conn( VerifyT -> Connected )<br>
May 20 16:37:54 wimpas1 kernel: drbd0: Writing the whole bitmap, due to
failed kmalloc<br>
May 20 16:37:54 wimpas1 kernel: drbd0: 0 KB (0 bits) marked out-of-sync
by on disk bit-map.<br>
</tt></font>
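<br><font size=2><tt>As a sanity check on the two verify runs, the "total ... sec" values<br>
can be compared with the log timestamps. The sketch below is my own helper,<br>
not DRBD tooling. The year 2009 is an assumption taken from the build date<br>
in /proc/drbd, since syslog lines carry no year. It gives 9981 and 10021<br>
seconds, matching what DRBD logged.</tt></font>

```python
from datetime import datetime

ASSUMED_YEAR = 2009  # assumption: syslog timestamps have no year


def elapsed_seconds(start, end, year=ASSUMED_YEAR):
    """Seconds between two syslog-style timestamps such as 'May 20 13:50:53'."""
    fmt = "%Y %b %d %H:%M:%S"
    t0 = datetime.strptime(f"{year} {start}", fmt)
    t1 = datetime.strptime(f"{year} {end}", fmt)
    return int((t1 - t0).total_seconds())


def verified_kib(seconds, rate_kib_per_sec):
    """Approximate amount verified, from the duration and average rate logged."""
    return seconds * rate_kib_per_sec


if __name__ == "__main__":
    # (VerifyT start, "Online verify done" time, logged K/sec)
    runs = [
        ("May 20 08:36:20", "May 20 11:22:41", 28716),
        ("May 20 13:50:53", "May 20 16:37:54", 28604),
    ]
    for start, end, rate in runs:
        secs = elapsed_seconds(start, end)
        print(secs, "sec, roughly", verified_kib(secs, rate), "KiB verified")
```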
<br>
<br><font size=2><tt>3. drbd.conf<br>
</tt></font>
<br><font size=2><tt>global {<br>
minor-count 1;<br>
}<br>
</tt></font>
<br><font size=2><tt>resource r0 {<br>
protocol C;<br>
</tt></font>
<br><font size=2><tt>on wimpas1 {<br>
device /dev/drbd0; # The name of our drbd device.<br>
disk /dev/mapper/VolGroup01-LogVol00; # Partition we wish
drbd to use.<br>
address 192.168.36.129:7788; # node0 IP address and port number.<br>
meta-disk internal; # Stores meta-data at the end of the backing device.<br>
}<br>
</tt></font>
<br><font size=2><tt>on wimpas2 {<br>
device /dev/drbd0; # Our drbd device, must match node0.<br>
disk /dev/mapper/VolGroup01-LogVol00; # Partition we wish
drbd to use.<br>
address 192.168.36.130:7788; # node1 IP address and port number.<br>
meta-disk internal; # Stores meta-data at the end of the backing device.<br>
}<br>
</tt></font>
<br><font size=2><tt>disk {<br>
on-io-error detach; # What to do when the lower level device errors.<br>
}<br>
</tt></font>
<br><font size=2><tt>net {<br>
max-buffers 2048; #datablock buffers used before writing to disk.<br>
ko-count 4; # Peer is dead if this count is exceeded.<br>
#on-disconnect reconnect; # Peer disconnected, try to reconnect.<br>
}<br>
</tt></font>
<br><font size=2><tt>syncer {<br>
rate 29M;<br>
#rate 143M; # Used for first sync<br>
#group 1; # Used for grouping resources, parallel sync.<br>
al-extents 257; # Must be prime, number of active sets.<br>
verify-alg crc32c;<br>
}<br>
</tt></font>
<br><font size=2><tt>startup {<br>
wfc-timeout 120; # drbd init script will wait 2 minutes - 0 is indefinite.<br>
degr-wfc-timeout 120; # 2 minutes.<br>
}<br>
} # End of resource</tt></font>
<br>