I have a two-node cluster (wimpas1/2) running DRBD 8.3.1, on which I have just enabled online verification (verify-alg crc32c). When I ran it for the first time today I received a number of "Out-of-sync" messages, which I subsequently corrected by disconnecting and reconnecting the resource. After some successful failover tests (running the latest heartbeat/pacemaker) I ran "drbdadm verify r0" again and, to my surprise, found more "Out-of-sync" messages. The DRBD link is good, with no reported errors.

Ideas?

While the verify is running, changes are being made to the disk. Do I really have errors? From the logs below, does "0 KB (0 bits) marked out-of-sync" really mean that I do not have any errors?

Thanks

Background:
- 2 x ProLiant DL380 G5 with SAS drives and RAID 6; /dev/mapper/VolGroup01-LogVol00 is the ext3 DRBD partition.
- DRBD 8.3.1 built from source.
- Latest RH 5.3 with kernel 2.6.18-128.1.6.el5PAE.
- The DRBD link is over a 10Gb NIC: HP NC510C (NetXen) using nx_nic-3.4.337-1 and nx_lsa-3.4.337-1. The offload feature (nx_lsa) is not used.

Given below:
1. First test of verify.
2. Second test of verify.
3. drbd.conf

1.
First test of verify:

- Test verify:

[root@wimpas2 etc]# drbd-overview
  0:r0  Connected Primary/Secondary UpToDate/UpToDate C r---- /drbd ext3 270G 106G 151G 42%

- Run the verify on resource r0:

[root@wimpas2 etc]# drbdadm verify r0

- idle goes from 88% to 79%
- drbd-overview shows verify is ongoing, however it does not show the progress of the verify:

[root@wimpas2 etc]# drbd-overview
  0:r0  VerifyS Primary/Secondary UpToDate/UpToDate C r---- /drbd ext3 270G 106G 151G 42%

- To show the progress we need to "cat /proc/drbd":

[root@wimpas2 etc]# cat /proc/drbd
version: 8.3.1 (api:88/proto:86-89)
GIT-hash: fd40f4a8f9104941537d1afc8521e584a6d3003c build by root@wimpas2, 2009-04-16 11:32:59
 0: cs:VerifyS ro:Primary/Secondary ds:UpToDate/UpToDate C r----
    ns:1602753144 nr:408740 dw:1603218784 dr:204343509 al:14977667 bm:1160 lo:139 pe:126 ua:638 ap:25 ep:1 wo:d oos:0
        42% 30793593/71661420

[root@wimpas2 etc]# cat /proc/drbd
version: 8.3.1 (api:88/proto:86-89)
GIT-hash: fd40f4a8f9104941537d1afc8521e584a6d3003c build by root@wimpas2, 2009-04-16 11:32:59
 0: cs:VerifyS ro:Primary/Secondary ds:UpToDate/UpToDate C r----
    ns:1605444232 nr:408740 dw:1605909876 dr:263949733 al:15006416 bm:1160 lo:215 pe:499 ua:213 ap:2 ep:1 wo:d oos:0
        63% 45596210/71661420

[root@wimpas2 etc]# cat /proc/drbd
version: 8.3.1 (api:88/proto:86-89)
GIT-hash: fd40f4a8f9104941537d1afc8521e584a6d3003c build by root@wimpas2, 2009-04-16 11:32:59
 0: cs:VerifyS ro:Primary/Secondary ds:UpToDate/UpToDate C r----
    ns:1609901224 nr:408740 dw:1610366868 dr:364980653 al:15058780 bm:1160 lo:151 pe:64 ua:146 ap:5 ep:1 wo:d oos:0
        98% 70637034/71661420

[root@wimpas2 etc]# cat /proc/drbd
version: 8.3.1 (api:88/proto:86-89)
GIT-hash: fd40f4a8f9104941537d1afc8521e584a6d3003c build by root@wimpas2, 2009-04-16 11:32:59
 0: cs:VerifyS ro:Primary/Secondary ds:UpToDate/UpToDate C r----
    ns:1610031364 nr:408740 dw:1610497008 dr:368031373 al:15060068 bm:1160 lo:160 pe:112 ua:153 ap:7 ep:1 wo:d oos:0
        99% 71392390/71661420

[root@wimpas2 etc]# cat /proc/drbd
version: 8.3.1 (api:88/proto:86-89)
GIT-hash: fd40f4a8f9104941537d1afc8521e584a6d3003c build by root@wimpas2, 2009-04-16 11:32:59
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
    ns:1610156492 nr:408740 dw:1610622136 dr:369146845 al:15061320 bm:1160 lo:4 pe:0 ua:0 ap:4 ep:1 wo:d oos:0

[root@wimpas2 etc]# drbd-overview
  0:r0  Connected Primary/Secondary UpToDate/UpToDate C r---- /drbd ext3 270G 108G 148G 43%
[root@wimpas2 etc]#

- From the messages log:

May 20 08:36:20 wimpas1 kernel: drbd0: conn( Connected -> VerifyT )
May 20 08:51:42 wimpas1 kernel: drbd0: Out of sync: start=53862808, size=8 (sectors)
May 20 08:51:47 wimpas1 kernel: drbd0: Out of sync: start=54175416, size=8 (sectors)
May 20 08:51:50 wimpas1 kernel: drbd0: Out of sync: start=54300672, size=8 (sectors)
May 20 08:52:05 wimpas1 kernel: drbd0: Out of sync: start=55233848, size=8 (sectors)
May 20 09:16:22 wimpas1 kernel: drbd0: Out of sync: start=140359968, size=8 (sectors)
May 20 11:22:41 wimpas1 kernel: drbd0: Online verify done (total 9981 sec; paused 0 sec; 28716 K/sec)
May 20 11:22:41 wimpas1 kernel: drbd0: conn( VerifyT -> Connected )
May 20 11:22:41 wimpas1 kernel: drbd0: Writing the whole bitmap, due to failed kmalloc
May 20 11:22:41 wimpas1 kernel: drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
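
To put a number on what the first verify run flagged, the "Out of sync" kernel messages above can be tallied mechanically. This is only a minimal Python sketch and not part of the original post: the function name `summarize_out_of_sync` is hypothetical, and it assumes the logged sectors are 512 bytes each.

```python
import re

SECTOR_BYTES = 512  # assumption: the kernel reports 512-byte sectors

# Matches the "Out of sync" lines exactly as they appear in the messages log.
OOS_RE = re.compile(r"Out of sync: start=(\d+), size=(\d+) \(sectors\)")


def summarize_out_of_sync(log_text):
    """Return (number of entries, total KiB) for all 'Out of sync' lines."""
    matches = OOS_RE.findall(log_text)
    total_sectors = sum(int(size) for _start, size in matches)
    return len(matches), total_sectors * SECTOR_BYTES // 1024


# The five entries logged on wimpas1 during the first verify run.
FIRST_RUN_LOG = """\
May 20 08:51:42 wimpas1 kernel: drbd0: Out of sync: start=53862808, size=8 (sectors)
May 20 08:51:47 wimpas1 kernel: drbd0: Out of sync: start=54175416, size=8 (sectors)
May 20 08:51:50 wimpas1 kernel: drbd0: Out of sync: start=54300672, size=8 (sectors)
May 20 08:52:05 wimpas1 kernel: drbd0: Out of sync: start=55233848, size=8 (sectors)
May 20 09:16:22 wimpas1 kernel: drbd0: Out of sync: start=140359968, size=8 (sectors)
"""

count, kib = summarize_out_of_sync(FIRST_RUN_LOG)
print(f"{count} entries, {kib} KiB flagged")  # 5 entries, 20 KiB flagged
```

Under the 512-byte assumption, each 8-sector entry corresponds to one 4 KiB block, so the first run flagged five such blocks.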
- Now on wimpas2 do a:
  - drbdadm disconnect r0
  - drbdadm connect r0
  This should correct the out-of-sync blocks.

- On wimpas2:

[root@wimpas2 etc]# drbdadm disconnect r0
[root@wimpas2 etc]# drbdadm connect r0

- From the messages log on wimpas1:

May 20 11:25:49 wimpas1 kernel: drbd0: peer( Primary -> Unknown ) conn( Connected -> TearDown ) pdsk( UpToDate -> DUnknown )
May 20 11:25:49 wimpas1 kernel: drbd0: asender terminated
May 20 11:25:49 wimpas1 kernel: drbd0: Terminating asender thread
May 20 11:25:49 wimpas1 kernel: drbd0: Connection closed
May 20 11:25:49 wimpas1 kernel: drbd0: conn( TearDown -> Unconnected )
May 20 11:25:49 wimpas1 kernel: drbd0: receiver terminated
May 20 11:25:49 wimpas1 kernel: drbd0: Restarting receiver thread
May 20 11:25:49 wimpas1 kernel: drbd0: receiver (re)started
May 20 11:25:49 wimpas1 kernel: drbd0: conn( Unconnected -> WFConnection )
May 20 11:25:56 wimpas1 kernel: drbd0: Handshake successful: Agreed network protocol version 89
May 20 11:25:56 wimpas1 kernel: drbd0: conn( WFConnection -> WFReportParams )
May 20 11:25:56 wimpas1 kernel: drbd0: Starting asender thread (from drbd0_receiver [4132])
May 20 11:25:56 wimpas1 kernel: drbd0: data-integrity-alg: <not-used>
May 20 11:25:56 wimpas1 kernel: drbd0: drbd_sync_handshake:
May 20 11:25:56 wimpas1 kernel: drbd0: self A33677DBB2985460:0000000000000000:6D2B4B9CDDBDEFD6:37F8B9BC0B605BA7 bits:0 flags:0
May 20 11:25:56 wimpas1 kernel: drbd0: peer 243B1079C9753DEB:A33677DBB2985461:6D2B4B9CDDBDEFD6:37F8B9BC0B605BA7 bits:1319 flags:0
May 20 11:25:56 wimpas1 kernel: drbd0: uuid_compare()=-1 by rule 5
May 20 11:25:56 wimpas1 kernel: drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
May 20 11:25:56 wimpas1 kernel: drbd0: conn( WFBitMapT -> WFSyncUUID )
May 20 11:25:56 wimpas1 kernel: drbd0: helper command: /sbin/drbdadm before-resync-target minor-0
May 20 11:25:56 wimpas1 kernel: drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
May 20 11:25:56 wimpas1 kernel: drbd0: conn( WFSyncUUID -> SyncTarget ) disk( UpToDate -> Inconsistent )
May 20 11:25:56 wimpas1 kernel: drbd0: Began resync as SyncTarget (will sync 5276 KB [1319 bits set]).
May 20 11:25:56 wimpas1 kernel: drbd0: Resync done (total 1 sec; paused 0 sec; 5276 K/sec)
May 20 11:25:56 wimpas1 kernel: drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
May 20 11:25:56 wimpas1 kernel: drbd0: helper command: /sbin/drbdadm after-resync-target minor-0
May 20 11:25:56 wimpas1 kernel: drbd0: helper command: /sbin/drbdadm after-resync-target minor-0 exit code 0 (0x0)

- From the messages log on wimpas2:

May 20 11:25:56 wimpas2 kernel: drbd0: conn( StandAlone -> Unconnected )
May 20 11:25:56 wimpas2 kernel: drbd0: Starting receiver thread (from drbd0_worker [4143])
May 20 11:25:56 wimpas2 kernel: drbd0: receiver (re)started
May 20 11:25:56 wimpas2 kernel: drbd0: conn( Unconnected -> WFConnection )
May 20 11:25:56 wimpas2 kernel: drbd0: Handshake successful: Agreed network protocol version 89
May 20 11:25:56 wimpas2 kernel: drbd0: conn( WFConnection -> WFReportParams )
May 20 11:25:56 wimpas2 kernel: drbd0: Starting asender thread (from drbd0_receiver [384])
May 20 11:25:56 wimpas2 kernel: drbd0: data-integrity-alg: <not-used>
May 20 11:25:56 wimpas2 kernel: drbd0: drbd_sync_handshake:
May 20 11:25:56 wimpas2 kernel: drbd0: self 243B1079C9753DEB:A33677DBB2985461:6D2B4B9CDDBDEFD6:37F8B9BC0B605BA7 bits:1319 flags:0
May 20 11:25:56 wimpas2 kernel: drbd0: peer A33677DBB2985460:0000000000000000:6D2B4B9CDDBDEFD6:37F8B9BC0B605BA7 bits:0 flags:0
May 20 11:25:56 wimpas2 kernel: drbd0: uuid_compare()=1 by rule 7
May 20 11:25:56 wimpas2 kernel: drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
May 20 11:25:56 wimpas2 kernel: drbd0: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )
May 20 11:25:56 wimpas2 kernel: drbd0: Began resync as SyncSource (will sync 5276 KB [1319 bits set]).
May 20 11:25:56 wimpas2 kernel: drbd0: Resync done (total 1 sec; paused 0 sec; 5276 K/sec)
May 20 11:25:56 wimpas2 kernel: drbd0: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )

2. Second test of verify.

[root@wimpas2]# drbdadm verify r0

- On wimpas we are still getting Out of sync:

[root@wimpas2 ~]# cat /proc/drbd
version: 8.3.1 (api:88/proto:86-89)
GIT-hash: fd40f4a8f9104941537d1afc8521e584a6d3003c build by root@wimpas2, 2009-04-16 11:32:59
 0: cs:VerifyS ro:Primary/Secondary ds:UpToDate/UpToDate C r---b
    ns:8823180 nr:520920 dw:9344100 dr:213062745 al:100758 bm:591 lo:136 pe:102 ua:136 ap:0 ep:1 wo:d oos:0
        74% 53161685/71661420

[root@wimpas2 ~]# cat /proc/drbd
version: 8.3.1 (api:88/proto:86-89)
GIT-hash: fd40f4a8f9104941537d1afc8521e584a6d3003c build by root@wimpas2, 2009-04-16 11:32:59
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
    ns:11781808 nr:520920 dw:12302728 dr:287072981 al:135343 bm:591 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0

- From the messages log:

May 20 13:50:53 wimpas1 kernel: drbd0: conn( Connected -> VerifyT )
May 20 14:30:19 wimpas1 kernel: drbd0: Out of sync: start=137739248, size=8 (sectors)
May 20 14:35:37 wimpas1 kernel: drbd0: Out of sync: start=156313064, size=8 (sectors)
May 20 14:39:00 wimpas1 kernel: drbd0: Out of sync: start=168175112, size=8 (sectors)
May 20 14:39:32 wimpas1 kernel: drbd0: Out of sync: start=170062360, size=8 (sectors)
May 20 14:45:31 wimpas1 kernel: drbd0: Out of sync: start=190982904, size=8 (sectors)
May 20 14:45:45 wimpas1 kernel: drbd0: Out of sync: start=191782000, size=8 (sectors)
May 20 14:45:50 wimpas1 kernel: drbd0: Out of sync: start=192029480, size=8 (sectors)
May 20 14:47:07 wimpas1 kernel: drbd0: Out of sync: start=196547400, size=8 (sectors)
May 20 14:56:18 wimpas1 kernel: drbd0: Out of sync: start=228776888, size=8 (sectors)
May 20 14:56:22 wimpas1 kernel: drbd0: Out of sync: start=229022968, size=8 (sectors)
May 20 14:56:22 wimpas1 kernel: drbd0: Out of sync: start=229047480, size=8 (sectors)
May 20 14:56:24 wimpas1 kernel: drbd0: Out of sync: start=229145752, size=8 (sectors)
May 20 14:56:25 wimpas1 kernel: drbd0: Out of sync: start=229198968, size=8 (sectors)
May 20 14:56:26 wimpas1 kernel: drbd0: Out of sync: start=229248104, size=8 (sectors)
May 20 14:56:27 wimpas1 kernel: drbd0: Out of sync: start=229289008, size=8 (sectors)
May 20 16:37:54 wimpas1 kernel: drbd0: Online verify done (total 10021 sec; paused 0 sec; 28604 K/sec)
May 20 16:37:54 wimpas1 kernel: drbd0: conn( VerifyT -> Connected )
May 20 16:37:54 wimpas1 kernel: drbd0: Writing the whole bitmap, due to failed kmalloc
May 20 16:37:54 wimpas1 kernel: drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.

3. drbd.conf

global {
  minor-count 1;
}

resource r0 {
  protocol C;

  on wimpas1 {
    device /dev/drbd0; # The name of our drbd device.
    disk /dev/mapper/VolGroup01-LogVol00; # Partition we wish drbd to use.
    address 192.168.36.129:7788; # node0 IP address and port number.
    meta-disk internal; # Stores meta-data in lower portion of hda5.
  }

  on wimpas2 {
    device /dev/drbd0; # Our drbd device, must match node0.
    disk /dev/mapper/VolGroup01-LogVol00; # Partition we wish drbd to use.
    address 192.168.36.130:7788; # node0 IP address and port number.
    meta-disk internal; # Stores meta-data in lower portion of hda5.
  }

  disk {
    on-io-error detach; # What to do when the lower level device errors.
  }

  net {
    max-buffers 2048; #datablock buffers used before writing to disk.
    ko-count 4; # Peer is dead if this count is exceeded.
    #on-disconnect reconnect; # Peer disconnected, try to reconnect.
  }

  syncer {
    rate 29M;
    #rate 143M; # Used for first sync
    #group 1; # Used for grouping resources, parallel sync.
    al-extents 257; # Must be prime, number of active sets.
    verify-alg crc32c;
  }

  startup {
    wfc-timeout 120; # drbd init script will wait 2 minutes - 0 is indefinite.
    degr-wfc-timeout 120; # 2 minutes.
  }
} # End of resource
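
The sizes and rates in the logs above can be cross-checked with a little arithmetic. The Python sketch below is an illustration only, not something from the original post. The helper names are hypothetical, and it rests on two assumptions: that each bitmap bit covers 4 KiB, and that the total of 71661420 shown in /proc/drbd counts 4 KiB blocks. It also assumes that "rate 29M" means 29 MiB/s.

```python
KIB_PER_BIT = 4  # assumption: one bitmap bit tracks one 4 KiB block


def bits_to_kib(bits):
    """Convert a count of bitmap bits (4 KiB blocks) to KiB."""
    return bits * KIB_PER_BIT


def verify_rate_kib_per_sec(total_blocks, elapsed_sec):
    """Average throughput in KiB/s for scanning total_blocks in elapsed_sec."""
    return bits_to_kib(total_blocks) // elapsed_sec


TOTAL_BLOCKS = 71661420            # total from /proc/drbd (assumed 4 KiB blocks)
CONFIGURED_RATE_KIB = 29 * 1024    # assumption: "rate 29M" is 29 MiB/s

# Resync after disconnect/connect: the log says 1319 bits -> 5276 KB.
print(bits_to_kib(1319))                             # 5276

# Verify runs: the log says 28716 K/sec (9981 sec) and 28604 K/sec (10021 sec).
print(verify_rate_kib_per_sec(TOTAL_BLOCKS, 9981))   # 28719
print(verify_rate_kib_per_sec(TOTAL_BLOCKS, 10021))  # 28604
print(CONFIGURED_RATE_KIB)                           # 29696
```

Under these assumptions the resync size and the second run's rate match the logged values exactly, and the first run comes out at 28719 against a logged 28716, so the block-size assumption looks plausible but is not confirmed by the post. Both runs sit just under the 29696 KiB/s that "rate 29M" would give.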