[Drbd-dev] volume I/O hang on Ahead mode
Kim, SungEun
sekim at mantech.co.kr
Tue Nov 14 05:21:59 CET 2017
Hi,
We tested the asynchronous replication Ahead mode of drbd9 (9.0.9) on a
low-bandwidth network (1 to 10 Mbps) and found that the volume's I/O response
time degraded during replication, eventually ending in a complete I/O hang on
the volume.
The issue reproduces reliably with the following configuration and procedure.
<drbd.conf>
global {
    disable-ip-verification;
    usage-count no;
}

common {
    startup {
        wfc-timeout 1;
    }
    disk {
        resync-rate 100M;
    }
    net {
        on-congestion pull-ahead;
        congestion-fill 480M;
        verify-alg md5;
    }
    proxy {
        memlimit 500M;
    }
}

resource r0 {
    protocol A;
    disk {
        on-io-error detach;
    }
    device /dev/drbd0;
    floating 200.200.2.10:7788 {
        disk /dev/sdd1;
        meta-disk internal;
        proxy on pm1 {
            inside 127.0.0.1:7789;
            outside 200.200.2.10:7790;
        }
    }
    floating 200.200.2.11:7788 {
        disk /dev/sdc1;
        meta-disk internal;
        proxy on pm2 {
            inside 127.0.0.1:7789;
            outside 200.200.2.11:7790;
        }
    }
}
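The two buffer sizes above are chosen deliberately; an annotated excerpt (our
reading of the settings):

proxy {
    memlimit 500M;          # total buffer available to the proxy
}
net {
    congestion-fill 480M;   # pull-ahead fires before that buffer is exhausted
}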
<Test procedure>
1. Set the proxy buffer size to 500M and congestion-fill to 480M (as in the
configuration above).
2. Limit the replication network bandwidth to 10 Mbps (using VMware's network
bandwidth limiting feature).
3. Mount /data on the pm1 node.
4. dd if=/dev/zero of=/data/test.out bs=100M count=40 (the rough arithmetic
behind these numbers follows this list).
5. pm1 enters Ahead mode.
6. Running ls -l in the /data directory on pm1 produces no output; the volume
is effectively hung.
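For reference, the rough arithmetic behind these numbers (assuming the 10 Mbps
cap is fully utilized and ignoring protocol overhead):

    data written by dd:      100 MB x 40    = 4000 MB
    replication drain rate:  10 Mbps / 8    = 1.25 MB/s
    time to drain 480 MB:    480 / 1.25     = 384 s (~6.4 min)

The local disk reaches the 480M congestion-fill threshold far faster than the
link can drain it, so pull-ahead triggers early in the dd run and the backlog
takes minutes to clear.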
The purpose of this test was to verify that, in drbd9's asynchronous Ahead
mode, the I/O response rate of the volume remains unaffected by the (limited)
network bandwidth.
In my opinion, when the number of drbd_req objects becomes considerably large
(in this test, several tens of drbd_req or more accumulated in the
transfer_log), the execution time of drbd_sender increases. In particular,
traversing the transfer_log in the following code takes a long time, and in
the end the request completion time grows.
/* The execution time of this function alone was measured at more than 5 ms. */
static struct drbd_request *__next_request_for_connection(
        struct drbd_connection *connection, struct drbd_request *r)
{
    r = list_prepare_entry(r, &connection->resource->transfer_log, tl_requests);
    list_for_each_entry_continue(r, &connection->resource->transfer_log,
                                 tl_requests) {
        int vnr = r->device->vnr;
        struct drbd_peer_device *peer_device = conn_peer_device(connection, vnr);
        unsigned s = drbd_req_state_by_peer_device(r, peer_device);

        if (!(s & RQ_NET_QUEUED))
            continue;
        return r;
    }
    return NULL;
}
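To make the suspected cost model concrete, below is a minimal userspace sketch
(not drbd code; the struct and function names are invented for illustration).
It models only the list walk: one lookup is linear in the number of retained
entries that fail the queued test. A full 480M backlog of 4K writes would
correspond to on the order of 120,000 requests (our estimate, assuming 4K
request size), so a single walk in the multi-millisecond range, as measured
above, is plausible, and the sender pays that cost over and over.

/*
 * Userspace sketch (not drbd code): models the cost of one
 * "find the next queued request" call when the transfer log
 * retains many entries that fail the RQ_NET_QUEUED test.
 * All names below are invented for illustration.
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

struct req {
    struct req *next;
    int queued;                 /* stand-in for RQ_NET_QUEUED */
};

/* One lookup: linear in the number of entries skipped. */
static struct req *next_queued(struct req *pos)
{
    for (; pos; pos = pos->next)
        if (pos->queued)
            return pos;
    return NULL;
}

int main(void)
{
    for (long n = 10000; n <= 1000000; n *= 10) {
        struct req *head = NULL, **tail = &head;

        /* Build a log of n requests; only the last is still queued. */
        for (long i = 0; i < n; i++) {
            struct req *r = calloc(1, sizeof(*r));
            r->queued = (i == n - 1);
            *tail = r;
            tail = &r->next;
        }

        clock_t t0 = clock();
        struct req *found = next_queued(head);
        clock_t t1 = clock();

        printf("log length %7ld: one lookup took %.3f ms%s\n",
               n, 1000.0 * (t1 - t0) / CLOCKS_PER_SEC,
               found ? "" : " (not found)");

        while (head) {          /* free the list */
            struct req *r = head;
            head = head->next;
            free(r);
        }
    }
    return 0;
}

If, in the real code path, the walk effectively restarts behind a long run of
non-queued entries on each call, that alone would account for the latency we
measured; a per-connection cursor or pruning completed requests from the walk
would keep the per-call cost bounded.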
Please look into this issue; we hope that the asynchronous replication
performance of drbd9 will improve.
Thanks.
Best regards,
SungEun Kim
Technical Research Center / Dev3 Team
Principal Research Engineer
SungEun Kim <sekim at mantech.co.kr>
12F, Seoulforest Kolon digital Tower, 308-4, Seongsudong 2ga, Seongdong-gu,
Seoul, Korea
Tel : 02-2136-6913 / Fax : 02-575-4858 / Call Center : 1833-7790
http://www.mantech.co.kr