Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
I have a two-node cluster(wimpas1/2) running drbd 8.3.1 which
I just enabled for online verification(verify-alg crc32c).
When I ran it for the first time today I received a number of
"Out-of-sync" messages which I subsequently corrected by
disconnecting and connecting the resource. After some
successful failover tests(running latest heartbeat/pacemaker)
I then ran "drbdadm verify r0" again and to my surprise found
more "Out-of-sync" messages.
The drbd link is good with no reported errors.
Ideas ? While the verify is running changes are
being made to the disk - do I really have errors ?
>From the logs below does the "0 KB (0 bits) marked out-of-sync"
really mean I do not have any errors ?
Thanks
Background :
- 2 x Proliant DL380G5 with SAS drives and
Raid 6 /dev/mapper/VolGroup01-LogVol00 ext3 drbd partition.
- Drbd 8.3.1 built from source.
- Latest RH 5.3 with kernel 2.6.18-128.1.6.el5PAE.
- drbd link is over 10Gb nic : HP NC510C(NetXen) using
nx_nic-3.4.337-1 and nx_lsa-3.4.337-1. The offload
feature(nx_lsa) is not used.
Given below :
1. First test of verify.
2. Second test of verify.
3. drbd.conf
1. First test of verify :
- test verify :
- [root at wimpas2 etc]# drbd-overview
0:r0 Connected Primary/Secondary UpToDate/UpToDate C r---- /drbd ext3
270G 106G 151G 42%
- Run the verify on resource r0 :
- [root at wimpas2 etc]# drbdadm verify r0
- idle goes from 88% to 79%
- drbd-overview shows verify is ongoing however it does not show the
progress of the verify :
- [root at wimpas2 etc]# drbd-overview
0:r0 VerifyS Primary/Secondary UpToDate/UpToDate C r---- /drbd ext3 270G
106G 151G 42%
- To show the progress we need to "cat /proc/drbd" :
[root at wimpas2 etc]# cat /proc/drbd
version: 8.3.1 (api:88/proto:86-89)
GIT-hash: fd40f4a8f9104941537d1afc8521e584a6d3003c build by root at wimpas2,
2009-04-16 11:32:59
0: cs:VerifyS ro:Primary/Secondary ds:UpToDate/UpToDate C r----
ns:1602753144 nr:408740 dw:1603218784 dr:204343509 al:14977667 bm:1160
lo:139 pe:126 ua:638 ap:25 ep:1 wo:d oos:0
42% 30793593/71661420
[root at wimpas2 etc]# cat /proc/drbd
version: 8.3.1 (api:88/proto:86-89)
GIT-hash: fd40f4a8f9104941537d1afc8521e584a6d3003c build by root at wimpas2,
2009-04-16 11:32:59
0: cs:VerifyS ro:Primary/Secondary ds:UpToDate/UpToDate C r----
ns:1605444232 nr:408740 dw:1605909876 dr:263949733 al:15006416 bm:1160
lo:215 pe:499 ua:213 ap:2 ep:1 wo:d oos:0
63% 45596210/71661420
[root at wimpas2 etc]# cat /proc/drbd
version: 8.3.1 (api:88/proto:86-89)
GIT-hash: fd40f4a8f9104941537d1afc8521e584a6d3003c build by root at wimpas2,
2009-04-16 11:32:59
0: cs:VerifyS ro:Primary/Secondary ds:UpToDate/UpToDate C r----
ns:1609901224 nr:408740 dw:1610366868 dr:364980653 al:15058780 bm:1160
lo:151 pe:64 ua:146 ap:5 ep:1 wo:d oos:0
98% 70637034/71661420
[root at wimpas2 etc]# cat /proc/drbd
version: 8.3.1 (api:88/proto:86-89)
GIT-hash: fd40f4a8f9104941537d1afc8521e584a6d3003c build by root at wimpas2,
2009-04-16 11:32:59
0: cs:VerifyS ro:Primary/Secondary ds:UpToDate/UpToDate C r----
ns:1610031364 nr:408740 dw:1610497008 dr:368031373 al:15060068 bm:1160
lo:160 pe:112 ua:153 ap:7 ep:1 wo:d oos:0
99% 71392390/71661420
[root at wimpas2 etc]# cat /proc/drbd
version: 8.3.1 (api:88/proto:86-89)
GIT-hash: fd40f4a8f9104941537d1afc8521e584a6d3003c build by root at wimpas2,
2009-04-16 11:32:59
0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
ns:1610156492 nr:408740 dw:1610622136 dr:369146845 al:15061320 bm:1160
lo:4 pe:0 ua:0 ap:4 ep:1 wo:d oos:0
[root at wimpas2 etc]# drbd-overview
0:r0 Connected Primary/Secondary UpToDate/UpToDate C r---- /drbd ext3
270G 108G 148G 43%
[root at wimpas2 etc]#
- from messages log :
May 20 08:36:20 wimpas1 kernel: drbd0: conn( Connected -> VerifyT )
May 20 08:51:42 wimpas1 kernel: drbd0: Out of sync: start=53862808, size=8
(sectors)
May 20 08:51:47 wimpas1 kernel: drbd0: Out of sync: start=54175416, size=8
(sectors)
May 20 08:51:50 wimpas1 kernel: drbd0: Out of sync: start=54300672, size=8
(sectors)
May 20 08:52:05 wimpas1 kernel: drbd0: Out of sync: start=55233848, size=8
(sectors)
May 20 09:16:22 wimpas1 kernel: drbd0: Out of sync: start=140359968,
size=8 (sectors)
May 20 11:22:41 wimpas1 kernel: drbd0: Online verify done (total 9981
sec; paused 0 sec; 28716 K/sec)
May 20 11:22:41 wimpas1 kernel: drbd0: conn( VerifyT -> Connected )
May 20 11:22:41 wimpas1 kernel: drbd0: Writing the whole bitmap, due to
failed kmalloc
May 20 11:22:41 wimpas1 kernel: drbd0: 0 KB (0 bits) marked out-of-sync by
on disk bit-map.
- now on wimpas2 do a :
- drbdadm disconnect r0
- drbdadm connect r0
This should correct out-of-sync blocks :
- on wimpas2 :
[root at wimpas2 etc]# drbdadm disconnect r0
[root at wimpas2 etc]# drbdadm connect r0
- From messages log on wimpas1 :
May 20 11:25:49 wimpas1 kernel: drbd0: peer( Primary -> Unknown ) conn(
Connected -> TearDown ) pdsk( UpToDate -> DUnknown )
May 20 11:25:49 wimpas1 kernel: drbd0: asender terminated
May 20 11:25:49 wimpas1 kernel: drbd0: Terminating asender thread
May 20 11:25:49 wimpas1 kernel: drbd0: Connection closed
May 20 11:25:49 wimpas1 kernel: drbd0: conn( TearDown -> Unconnected )
May 20 11:25:49 wimpas1 kernel: drbd0: receiver terminated
May 20 11:25:49 wimpas1 kernel: drbd0: Restarting receiver thread
May 20 11:25:49 wimpas1 kernel: drbd0: receiver (re)started
May 20 11:25:49 wimpas1 kernel: drbd0: conn( Unconnected -> WFConnection )
May 20 11:25:56 wimpas1 kernel: drbd0: Handshake successful: Agreed
network protocol version 89
May 20 11:25:56 wimpas1 kernel: drbd0: conn( WFConnection ->
WFReportParams )
May 20 11:25:56 wimpas1 kernel: drbd0: Starting asender thread (from
drbd0_receiver [4132])
May 20 11:25:56 wimpas1 kernel: drbd0: data-integrity-alg: <not-used>
May 20 11:25:56 wimpas1 kernel: drbd0: drbd_sync_handshake:
May 20 11:25:56 wimpas1 kernel: drbd0: self
A33677DBB2985460:0000000000000000:6D2B4B9CDDBDEFD6:37F8B9BC0B605BA7 bits:0
flags:0
May 20 11:25:56 wimpas1 kernel: drbd0: peer
243B1079C9753DEB:A33677DBB2985461:6D2B4B9CDDBDEFD6:37F8B9BC0B605BA7
bits:1319 flags:0
May 20 11:25:56 wimpas1 kernel: drbd0: uuid_compare()=-1 by rule 5
May 20 11:25:56 wimpas1 kernel: drbd0: peer( Unknown -> Primary ) conn(
WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
May 20 11:25:56 wimpas1 kernel: drbd0: conn( WFBitMapT -> WFSyncUUID )
May 20 11:25:56 wimpas1 kernel: drbd0: helper command: /sbin/drbdadm
before-resync-target minor-0
May 20 11:25:56 wimpas1 kernel: drbd0: helper command: /sbin/drbdadm
before-resync-target minor-0 exit code 0 (0x0)
May 20 11:25:56 wimpas1 kernel: drbd0: conn( WFSyncUUID -> SyncTarget )
disk( UpToDate -> Inconsistent )
May 20 11:25:56 wimpas1 kernel: drbd0: Began resync as SyncTarget (will
sync 5276 KB [1319 bits set]).
May 20 11:25:56 wimpas1 kernel: drbd0: Resync done (total 1 sec; paused 0
sec; 5276 K/sec)
May 20 11:25:56 wimpas1 kernel: drbd0: conn( SyncTarget -> Connected )
disk( Inconsistent -> UpToDate )
May 20 11:25:56 wimpas1 kernel: drbd0: helper command: /sbin/drbdadm
after-resync-target minor-0
May 20 11:25:56 wimpas1 kernel: drbd0: helper command: /sbin/drbdadm
after-resync-target minor-0 exit code 0 (0x0)
- From messages log on wimpas2 :
May 20 11:25:56 wimpas2 kernel: drbd0: conn( StandAlone -> Unconnected )
May 20 11:25:56 wimpas2 kernel: drbd0: Starting receiver thread (from
drbd0_worker [4143])
May 20 11:25:56 wimpas2 kernel: drbd0: receiver (re)started
May 20 11:25:56 wimpas2 kernel: drbd0: conn( Unconnected -> WFConnection )
May 20 11:25:56 wimpas2 kernel: drbd0: Handshake successful: Agreed
network protocol version 89
May 20 11:25:56 wimpas2 kernel: drbd0: conn( WFConnection ->
WFReportParams )
May 20 11:25:56 wimpas2 kernel: drbd0: Starting asender thread (from
drbd0_receiver [384])
May 20 11:25:56 wimpas2 kernel: drbd0: data-integrity-alg: <not-used>
May 20 11:25:56 wimpas2 kernel: drbd0: drbd_sync_handshake:
May 20 11:25:56 wimpas2 kernel: drbd0: self
243B1079C9753DEB:A33677DBB2985461:6D2B4B9CDDBDEFD6:37F8B9BC0B605BA7
bits:1319 flags:0
May 20 11:25:56 wimpas2 kernel: drbd0: peer
A33677DBB2985460:0000000000000000:6D2B4B9CDDBDEFD6:37F8B9BC0B605BA7 bits:0
flags:0
May 20 11:25:56 wimpas2 kernel: drbd0: uuid_compare()=1 by rule 7
May 20 11:25:56 wimpas2 kernel: drbd0: peer( Unknown -> Secondary ) conn(
WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
May 20 11:25:56 wimpas2 kernel: drbd0: conn( WFBitMapS -> SyncSource )
pdsk( UpToDate -> Inconsistent )
May 20 11:25:56 wimpas2 kernel: drbd0: Began resync as SyncSource (will
sync 5276 KB [1319 bits set]).
May 20 11:25:56 wimpas2 kernel: drbd0: Resync done (total 1 sec; paused 0
sec; 5276 K/sec)
May 20 11:25:56 wimpas2 kernel: drbd0: conn( SyncSource -> Connected )
pdsk( Inconsistent -> UpToDate )
2. Second test of verify.
- [root at wimpas2]# drbdadm verify r0
- on wimpas we are still getting Out of sync :
[root at wimpas2 ~]# cat /proc/drbd
version: 8.3.1 (api:88/proto:86-89)
GIT-hash: fd40f4a8f9104941537d1afc8521e584a6d3003c build by root at wimpas2,
2009-04-16 11:32:59
0: cs:VerifyS ro:Primary/Secondary ds:UpToDate/UpToDate C r---b
ns:8823180 nr:520920 dw:9344100 dr:213062745 al:100758 bm:591 lo:136
pe:102 ua:136 ap:0 ep:1 wo:d oos:0
74% 53161685/71661420
[root at wimpas2 ~]# cat /proc/drbd
version: 8.3.1 (api:88/proto:86-89)
GIT-hash: fd40f4a8f9104941537d1afc8521e584a6d3003c build by root at wimpas2,
2009-04-16 11:32:59
0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
ns:11781808 nr:520920 dw:12302728 dr:287072981 al:135343 bm:591 lo:0 pe:0
ua:0 ap:0 ep:1 wo:d oos:0
- from messages log :
May 20 13:50:53 wimpas1 kernel: drbd0: conn( Connected -> VerifyT )
May 20 14:30:19 wimpas1 kernel: drbd0: Out of sync: start=137739248,
size=8 (sectors)
May 20 14:35:37 wimpas1 kernel: drbd0: Out of sync: start=156313064,
size=8 (sectors)
May 20 14:39:00 wimpas1 kernel: drbd0: Out of sync: start=168175112,
size=8 (sectors)
May 20 14:39:32 wimpas1 kernel: drbd0: Out of sync: start=170062360,
size=8 (sectors)
May 20 14:45:31 wimpas1 kernel: drbd0: Out of sync: start=190982904,
size=8 (sectors)
May 20 14:45:45 wimpas1 kernel: drbd0: Out of sync: start=191782000,
size=8 (sectors)
May 20 14:45:50 wimpas1 kernel: drbd0: Out of sync: start=192029480,
size=8 (sectors)
May 20 14:47:07 wimpas1 kernel: drbd0: Out of sync: start=196547400,
size=8 (sectors)
May 20 14:56:18 wimpas1 kernel: drbd0: Out of sync: start=228776888,
size=8 (sectors)
May 20 14:56:22 wimpas1 kernel: drbd0: Out of sync: start=229022968,
size=8 (sectors)
May 20 14:56:22 wimpas1 kernel: drbd0: Out of sync: start=229047480,
size=8 (sectors)
May 20 14:56:24 wimpas1 kernel: drbd0: Out of sync: start=229145752,
size=8 (sectors)
May 20 14:56:25 wimpas1 kernel: drbd0: Out of sync: start=229198968,
size=8 (sectors)
May 20 14:56:26 wimpas1 kernel: drbd0: Out of sync: start=229248104,
size=8 (sectors)
May 20 14:56:27 wimpas1 kernel: drbd0: Out of sync: start=229289008,
size=8 (sectors)
May 20 16:37:54 wimpas1 kernel: drbd0: Online verify done (total 10021
sec; paused 0 sec; 28604 K/sec)
May 20 16:37:54 wimpas1 kernel: drbd0: conn( VerifyT -> Connected )
May 20 16:37:54 wimpas1 kernel: drbd0: Writing the whole bitmap, due to
failed kmalloc
May 20 16:37:54 wimpas1 kernel: drbd0: 0 KB (0 bits) marked out-of-sync by
on disk bit-map.
3. drbd.conf
global {
minor-count 1;
}
resource r0 {
protocol C;
on wimpas1 {
device /dev/drbd0; # The name of our drbd device.
disk /dev/mapper/VolGroup01-LogVol00; # Partition we wish drbd to use.
address 192.168.36.129:7788; # node0 IP address and port number.
meta-disk internal; # Stores meta-data in lower portion of hda5.
}
on wimpas2 {
device /dev/drbd0; # Our drbd device, must match node0.
disk /dev/mapper/VolGroup01-LogVol00; # Partition we wish drbd to use.
address 192.168.36.130:7788; # node0 IP address and port number.
meta-disk internal; # Stores meta-data in lower portion of hda5.
}
disk {
on-io-error detach; # What to do when the lower level device errors.
}
net {
max-buffers 2048; #datablock buffers used before writing to disk.
ko-count 4; # Peer is dead if this count is exceeded.
#on-disconnect reconnect; # Peer disconnected, try to reconnect.
}
syncer {
rate 29M;
#rate 143M; # Used for first sync
#group 1; # Used for grouping resources, parallel sync.
al-extents 257; # Must be prime, number of active sets.
verify-alg crc32c;
}
startup {
wfc-timeout 120; # drbd init script will wait 2 minutes - 0 is indefinite.
degr-wfc-timeout 120; # 2 minutes.
}
} # End of resource
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20090520/81251415/attachment.htm>