Dear DRBD,
I am still trying to work out online verify on 8.2.5, following the manual:
http://www.drbd.org/users-guide/s-use-online-verify.html
The verify works and finds bad blocks. However I can't seem to clear
them using the suggested method of disconnect and connect. Further
verify commands seem to add bad blocks (on my set-up), which I could
only clear with a full invalidation.
I also checked all three suggested hashing algorithms and found that
they all showed the same load and time to complete, which was also
exactly the same load and time as a full invalidate.
So I am not sure what the advantage of the verify is. With the current
version, 8.2.5, it seems to promote the creation of bad blocks, does
not allow them to be cleared, and has the same time and load as a full
resync.
I'm hoping you can tell me that I am doing something completely wrong.
I've included my setup details at the end of the email.
--------------
For instance, I did a verify:
_Secondary Log_
08:42:52 drbd0: conn( Connected -> VerifyT )
09:35:59 drbd0: Out of sync: start=628149776, size=8 (sectors)
10:59:59 drbd0: Online verify done (total 8226 sec; paused 0 sec; 98024
K/sec)
10:59:59 drbd0: Online verify found 1 4k block out of sync!
10:59:59 drbd0: helper command: /sbin/drbdadm out-of-sync
10:59:59 drbd0: Writing the whole bitmap.
10:59:59 drbd0: writing of bitmap took 13 jiffies
10:59:59 drbd0: 4 KB (1 bits) marked out-of-sync by on disk bit-map.
10:59:59 drbd0: conn( VerifyT -> Connected )
_Primary Log_
08:42:52 drbd0: conn( Connected -> VerifyS )
09:35:59 drbd0: Out of sync: start=628149776, size=8 (sectors)
10:59:59 drbd0: Online verify done (total 8226 sec; paused 0 sec; 98024
K/sec)
10:59:59 drbd0: Online verify found 1 4k block out of sync!
10:59:59 drbd0: helper command: /sbin/drbdadm out-of-sync
10:59:59 drbd0: Writing the whole bitmap.
10:59:59 drbd0: writing of bitmap took 21 jiffies
10:59:59 drbd0: 4 KB (1 bits) marked out-of-sync by on disk bit-map.
10:59:59 drbd0: conn( VerifyS -> Connected )
I can see it completed in 2h 17m, which is in line with my syncer rate
(100M) in /etc/drbd.conf.
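As a sanity check on those figures, here is a minimal Python sketch that converts the logged "total 8226 sec" into hours and minutes and compares the logged "98024 K/sec" with the configured rate. It assumes that "K" in the log means KiB and that "rate 100M" means 100 MiB/sec; the function names are only illustrative.

```python
def format_duration(seconds: int) -> str:
    """Render a duration in seconds as 'Hh MMm SSs'."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


def kib_to_mib(rate_kib_per_sec: int) -> float:
    """Convert KiB/sec to MiB/sec (assumes K = 1024 bytes)."""
    return rate_kib_per_sec / 1024


verify_seconds = 8226        # from "Online verify done (total 8226 sec; ...)"
verify_rate_kib = 98024      # from "... 98024 K/sec)"
configured_rate_mib = 100    # from "rate 100M;" (assumed to be MiB/sec)

duration = format_duration(verify_seconds)
achieved_mib = kib_to_mib(verify_rate_kib)
fraction_of_limit = achieved_mib / configured_rate_mib

print(duration)                               # 2h 17m 06s
print(f"{achieved_mib:.1f} MiB/sec")          # 95.7 MiB/sec
print(f"{fraction_of_limit:.1%} of the configured rate")
```

Under those assumptions the verify ran at a little under 96% of the configured limit.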
This has found a bad block. I cleared this using the suggested method
and did another verify:
Primary:
# drbdadm disconnect all
# drbdadm connect all
_Secondary Log_
11:01:58 drbd0: peer( Primary -> Unknown ) conn( Connected -> TearDown )
pdsk( UpToDate -> DUnknown )
11:01:58 drbd0: Writing meta data super block now.
11:01:58 drbd0: meta connection shut down by peer.
11:01:58 drbd0: asender terminated
11:01:58 drbd0: Terminating asender thread
11:01:58 drbd0: tl_clear()
11:01:58 drbd0: Connection closed
11:01:58 drbd0: conn( TearDown -> Unconnected )
11:01:58 drbd0: receiver terminated
11:01:58 drbd0: receiver (re)started
11:01:58 drbd0: conn( Unconnected -> WFConnection )
11:02:09 drbd0: Handshake successful: Agreed network protocol version 88
11:02:09 drbd0: conn( WFConnection -> WFReportParams )
11:02:09 drbd0: Starting asender thread (from drbd0_receiver [25884])
11:02:09 drbd0: data-integrity-alg: <not-used>
11:02:09 drbd0: peer( Unknown -> Primary ) conn( WFReportParams ->
WFBitMapT ) pdsk( DUnknown -> UpToDate )
11:02:09 drbd0: Writing meta data super block now.
11:02:10 drbd0: conn( WFBitMapT -> WFSyncUUID )
11:02:10 drbd0: conn( WFSyncUUID -> SyncTarget ) disk( UpToDate ->
Inconsistent )
11:02:10 drbd0: Began resync as SyncTarget (will sync 4848 KB [1212 bits
set]).
11:02:10 drbd0: Writing meta data super block now.
11:02:10 drbd0: Resync done (total 1 sec; paused 0 sec; 4848 K/sec)
11:02:10 drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent ->
UpToDate )
11:02:10 drbd0: Writing meta data super block now.
_Primary Log_
11:01:58 drbd0: peer( Secondary -> Unknown ) conn( Connected ->
Disconnecting ) pdsk( UpToDate -> DUnknown )
11:01:58 drbd0: short read expecting header on sock: r=-512
11:01:58 drbd0: asender terminated
11:01:58 drbd0: Terminating asender thread
11:01:58 drbd0: Creating new current UUID
11:01:58 drbd0: Writing meta data super block now.
11:01:58 drbd0: tl_clear()
11:01:58 drbd0: Connection closed
11:01:58 drbd0: conn( Disconnecting -> StandAlone )
11:01:58 drbd0: receiver terminated
11:01:58 drbd0: Terminating receiver thread
11:02:09 drbd0: conn( StandAlone -> Unconnected )
11:02:09 drbd0: Starting receiver thread (from drbd0_worker [11685])
11:02:09 drbd0: receiver (re)started
11:02:09 drbd0: conn( Unconnected -> WFConnection )
11:02:09 drbd0: Handshake successful: Agreed network protocol version 88
11:02:09 drbd0: conn( WFConnection -> WFReportParams )
11:02:09 drbd0: Starting asender thread (from drbd0_receiver [31085])
11:02:09 drbd0: data-integrity-alg: <not-used>
11:02:09 drbd0: peer( Unknown -> Secondary ) conn( WFReportParams ->
WFBitMapS ) pdsk( DUnknown -> UpToDate )
11:02:09 drbd0: Writing meta data super block now.
11:02:10 drbd0: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate ->
Inconsistent )
11:02:10 drbd0: Began resync as SyncSource (will sync 4848 KB [1212 bits
set]).
11:02:10 drbd0: Writing meta data super block now.
11:02:10 drbd0: Resync done (total 1 sec; paused 0 sec; 4848 K/sec)
11:02:10 drbd0: conn( SyncSource -> Connected ) pdsk( Inconsistent ->
UpToDate )
11:02:10 drbd0: Writing meta data super block now.
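The resync size in these logs can be checked against the bitmap bit count. The sketch below assumes one bitmap bit covers one 4 KB block, as the earlier log line "4 KB (1 bits) marked out-of-sync" suggests. It also shows that the reconnect resynced more bits than the verify had marked; the logs do not say where the additional bits came from.

```python
BLOCK_KB = 4  # assumption: one bitmap bit == one 4 KB block, per "4 KB (1 bits)"


def bits_to_kb(bits: int) -> int:
    """Size in KB covered by the given number of set bitmap bits."""
    return bits * BLOCK_KB


bits_marked_by_verify = 1   # "4 KB (1 bits) marked out-of-sync"
bits_resynced = 1212        # "will sync 4848 KB [1212 bits set]"

resync_kb = bits_to_kb(bits_resynced)
extra_bits = bits_resynced - bits_marked_by_verify

print(resync_kb)    # 4848, matching the log
print(extra_bits)   # 1211 bits beyond the one the verify flagged
```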
Then verified again:
# drbdadm verify all
_Secondary Log_
11:02:19 drbd0: conn( Connected -> VerifyT )
11:55:29 drbd0: Out of sync: start=628149776, size=8 (sectors)
12:09:22 drbd0: Out of sync: start=791838200, size=8 (sectors)
13:19:31 drbd0: Online verify done (total 8231 sec; paused 0 sec; 97964
K/sec)
13:19:31 drbd0: Online verify found 2 4k block out of sync!
13:19:31 drbd0: helper command: /sbin/drbdadm out-of-sync
13:19:31 drbd0: Writing the whole bitmap.
13:19:31 drbd0: writing of bitmap took 13 jiffies
13:19:31 drbd0: 8 KB (2 bits) marked out-of-sync by on disk bit-map.
13:19:31 drbd0: conn( VerifyT -> Connected )
_Primary Log_
11:02:19 drbd0: conn( Connected -> VerifyS )
11:55:29 drbd0: Out of sync: start=628149776, size=8 (sectors)
12:09:22 drbd0: Out of sync: start=791838200, size=8 (sectors)
13:19:31 drbd0: Online verify done (total 8231 sec; paused 0 sec; 97964
K/sec)
13:19:31 drbd0: Online verify found 2 4k block out of sync!
13:19:31 drbd0: helper command: /sbin/drbdadm out-of-sync
13:19:31 drbd0: Writing the whole bitmap.
13:19:31 drbd0: writing of bitmap took 20 jiffies
13:19:31 drbd0: 8 KB (2 bits) marked out-of-sync by on disk bit-map.
13:19:31 drbd0: conn( VerifyS -> Connected )
I now have two bad blocks where previously I had one.
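To compare verify runs without reading the logs by eye, a short Python sketch like the following can pull the "Out of sync" start sectors out of each run and show which ones recur. The regular expression is written against the log lines quoted above, and the 512-byte sector size is an assumption.

```python
import re

# Matches lines such as:
#   09:35:59 drbd0: Out of sync: start=628149776, size=8 (sectors)
OOS_RE = re.compile(r"Out of sync: start=(\d+), size=(\d+) \(sectors\)")
SECTOR_BYTES = 512  # assumption: the log counts 512-byte sectors


def out_of_sync(log_text: str) -> dict:
    """Map each out-of-sync start sector to its size in sectors."""
    return {int(m.group(1)): int(m.group(2)) for m in OOS_RE.finditer(log_text)}


first_run = """
09:35:59 drbd0: Out of sync: start=628149776, size=8 (sectors)
"""
second_run = """
11:55:29 drbd0: Out of sync: start=628149776, size=8 (sectors)
12:09:22 drbd0: Out of sync: start=791838200, size=8 (sectors)
"""

first = out_of_sync(first_run)
second = out_of_sync(second_run)

recurring = set(first) & set(second)   # reported in both runs
new = set(second) - set(first)         # reported only in the second run

for start in sorted(recurring):
    print(f"recurring: sector {start}, byte offset {start * SECTOR_BYTES}")
for start in sorted(new):
    print(f"new:       sector {start}, byte offset {start * SECTOR_BYTES}")
```

With the two runs above, sector 628149776 shows up in both, and 791838200 appears only in the second. Each entry is 8 sectors, which at 512 bytes per sector is the 4k block the log reports.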
This was cleared by an invalidate:
Secondary:
# drbdadm invalidate all
After it completed, I ran a verification from my primary:
# drbdadm verify all
_Secondary Log_
07:58:10 drbd0: conn( Connected -> VerifyT )
10:15:17 drbd0: Online verify done (total 8226 sec; paused 0 sec; 98024
K/sec)
10:15:17 drbd0: conn( VerifyT -> Connected )
_Primary Log_
07:58:10 drbd0: conn( Connected -> VerifyS )
10:15:17 drbd0: Online verify done (total 8226 sec; paused 0 sec; 98024
K/sec)
10:15:17 drbd0: conn( VerifyS -> Connected )
However, after a few more verifications using different hashing
algorithms, I am back where I was:
_Secondary Log_
14:28:59 drbd0: conn( Connected -> VerifyT )
15:22:06 drbd0: Out of sync: start=628149776, size=8 (sectors)
15:35:47 drbd0: Out of sync: start=790351000, size=8 (sectors)
16:46:08 drbd0: Online verify done (total 8228 sec; paused 0 sec; 98000
K/sec)
16:46:08 drbd0: Online verify found 2 4k block out of sync!
16:46:08 drbd0: helper command: /sbin/drbdadm out-of-sync
16:46:08 drbd0: Writing the whole bitmap.
16:46:08 drbd0: drbd_bm_total_weight: (!b) in
/usr/src/drbd-8.2.5/drbd/drbd_bitmap.c:429
16:46:08 drbd0: writing of bitmap took 13 jiffies
16:46:08 drbd0: 8 KB (2 bits) marked out-of-sync by on disk bit-map.
16:46:08 drbd0: conn( VerifyT -> Connected )
_Primary Log_
14:28:59 drbd0: conn( Connected -> VerifyS )
15:22:06 drbd0: Out of sync: start=628149776, size=8 (sectors)
15:35:47 drbd0: Out of sync: start=790351000, size=8 (sectors)
16:46:08 drbd0: Online verify done (total 8228 sec; paused 0 sec; 98000
K/sec)
16:46:08 drbd0: Online verify found 2 4k block out of sync!
16:46:08 drbd0: helper command: /sbin/drbdadm out-of-sync
16:46:08 drbd0: Writing the whole bitmap.
16:46:08 drbd0: drbd_bm_total_weight: (!b) in
/usr/src/drbd-8.2.5/drbd/drbd_bitmap.c:429
16:46:08 drbd0: writing of bitmap took 20 jiffies
16:46:08 drbd0: 8 KB (2 bits) marked out-of-sync by on disk bit-map.
16:46:08 drbd0: conn( VerifyS -> Connected )
Not only have I got my bad blocks back, I have also acquired a
'drbd_bm_total_weight' warning...
I am not sure this is expected behaviour, but I am also not sure that
it's not something I have done. If you have any advice, please let me know.
Also, this is my comparison of the various algorithms:
Check of 787,456 MB over 1000baseT limited to sync rate of 100 MB/sec.
Raw disk benchmark by Bonnie++ at ~ 317MB/sec (input per block).
+------------+-------+-------------+-----------+-----------+
| Algorithm  | Time  | Rate MB/sec | Pri. Load | Sec. Load |
+------------+-------+-------------+-----------+-----------+
| md5        | 02:17 | 95.8        | 2.46      | 1.37      |
| sha1       | 02:17 | 95.8        | 2.25      | 1.29      |
| crc32c     | 02:17 | 95.8        | 2.37      | 1.17      |
| INVALIDATE | 02:16 | 96.5        | 2.64      | 1.24      |
+------------+-------+-------------+-----------+-----------+
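The Rate column follows from the device size and the elapsed time. A minimal Python sketch of that arithmetic, assuming the Time column is h:mm rounded to whole minutes and that 787,456 MB is the "size 769G" from drbd.conf expressed as 769 * 1024 MB:

```python
DEVICE_MB = 769 * 1024  # 787,456 MB, matching "size 769G" in drbd.conf


def hmm_to_seconds(hmm: str) -> int:
    """Convert an 'hh:mm' string to seconds."""
    hours, minutes = hmm.split(":")
    return int(hours) * 3600 + int(minutes) * 60


# Times as given in the table above.
runs = {
    "md5": "02:17",
    "sha1": "02:17",
    "crc32c": "02:17",
    "INVALIDATE": "02:16",
}

rates = {
    name: round(DEVICE_MB / hmm_to_seconds(elapsed), 1)
    for name, elapsed in runs.items()
}

for name, rate in rates.items():
    print(f"{name:<10} {rate} MB/sec")
```

All four runs land within about 1 MB/sec of each other. That fits a transfer held at the 100M syncer rate rather than limited by the choice of hash algorithm, though these numbers alone do not prove it.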
Regards,
Ben
common {
  net {
    max-buffers 40000;
    unplug-watermark 40000;
    max-epoch-size 16384;
    after-sb-0pri disconnect;
    after-sb-1pri disconnect;
    after-sb-2pri disconnect;
    rr-conflict disconnect;
  }
  syncer {
    rate 100M;
    al-extents 257;
    verify-alg md5;
  }
  startup {
    degr-wfc-timeout 120;
  }
  handlers {
    pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
    pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
    local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
    outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
    pri-lost "echo dbms-04 pri-lost. Have a look at the log files. | mail -s 'DRBD Alert dbms-04' sysalerts at ...";
    split-brain "echo dbms-04 split-brain. drbdadm -- --discard-my-data connect $DRBD_RESOURCE ? | mail -s 'DRBD Alert' sysalerts at ...";
  }
}
resource dbms-04 {
  protocol B;
  on {
    device /dev/drbd0;
    disk /dev/cciss/c0d0p4;
    address 192.168.95.18:7789;
    meta-disk /dev/cciss/c0d0p3 [0];
  }
  on {
    device /dev/drbd0;
    disk /dev/cciss/c0d0p4;
    address 192.168.95.17:7788;
    meta-disk /dev/cciss/c0d0p3 [0];
  }
  disk {
    on-io-error detach;
    size 769G;
  }
}
version: 8.2.5 (api:88/proto:86-88)
GIT-hash: 9faf052fdae5ef0c61b4d03890e2d2eab550610c build by
root at hp-tm-06, 2008-03-03 11:09:15
0: cs:Connected st:Secondary/Primary ds:UpToDate/UpToDate B r---
ns:0 nr:6404820 dw:6404820 dr:806354944 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
resync: used:0/31 hits:201539520 misses:49216 starving:0
dirty:0 changed:49216
act_log: used:0/257 hits:0 misses:0 starving:0 dirty:0 changed:0