Hello,
I've been using DRBD for about 5 years now, and it has been working great.
Recently we modified the setup and moved a couple of servers around, so DRBD
now has to replicate over a 20 Mbit/s WAN line.
I've changed from Protocol C to A and enabled Ahead/Behind mode. It seems
to work, but after some time some of the resources get stuck in Ahead/Behind
mode and never resync unless I disconnect and reconnect the resource.
It looks like this on the Primary:
cat /proc/drbd
version: 8.4.1 (api:1/proto:86-100)
GIT-hash: bb796da897912034a90003910f69ae0a2c10cf44 build by root@node1, 2012-06-04 13:02:39
[..]
13: cs:Ahead ro:Primary/Secondary ds:UpToDate/Inconsistent A r-----
    ns:9428820 nr:0 dw:446339296 dr:931364896 al:280851 bm:66801 lo:0 pe:0 ua:0 ap:0 ep:1 wo:n oos:1389708
A minute later:
13: cs:Ahead ro:Primary/Secondary ds:UpToDate/Inconsistent A r-----
    ns:9428820 nr:0 dw:446340428 dr:931364948 al:280851 bm:66801 lo:0 pe:0 ua:0 ap:0 ep:1 wo:n oos:1389728
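
One way to compare two such snapshots is to parse the counters and look at the differences. The following is a minimal Python sketch, not part of DRBD; the function names (parse_counters, counter_deltas, looks_stuck) and the "stuck" heuristic are assumptions made for illustration. It treats a resource as suspect when ns (data sent to the peer) does not move while oos (out-of-sync data) stays above zero and does not shrink.

```python
import re

# Matches numeric "key:value" pairs such as "ns:9428820" or "oos:1389708".
# Non-numeric fields (cs:Ahead, wo:n, ...) are deliberately skipped.
COUNTER_RE = re.compile(r"\b([a-z]+):(\d+)\b")


def parse_counters(status_text):
    """Return the numeric counters of one /proc/drbd device entry as a dict."""
    return {key: int(value) for key, value in COUNTER_RE.findall(status_text)}


def counter_deltas(before, after):
    """Return after - before for every counter present in both snapshots."""
    return {key: after[key] - before[key] for key in before if key in after}


def looks_stuck(before, after):
    """Heuristic (an assumption, not a DRBD rule): nothing was sent to the
    peer between the snapshots, yet out-of-sync data remains and did not shrink."""
    return (
        after["ns"] == before["ns"]
        and after["oos"] > 0
        and after["oos"] >= before["oos"]
    )


# The two snapshots quoted in this message.
FIRST_SAMPLE = """13: cs:Ahead ro:Primary/Secondary ds:UpToDate/Inconsistent A r-----
ns:9428820 nr:0 dw:446339296 dr:931364896 al:280851 bm:66801 lo:0 pe:0
ua:0 ap:0 ep:1 wo:n oos:1389708"""

SECOND_SAMPLE = """13: cs:Ahead ro:Primary/Secondary ds:UpToDate/Inconsistent A r-----
ns:9428820 nr:0 dw:446340428 dr:931364948 al:280851 bm:66801 lo:0 pe:0
ua:0 ap:0 ep:1 wo:n oos:1389728"""

if __name__ == "__main__":
    first = parse_counters(FIRST_SAMPLE)
    second = parse_counters(SECOND_SAMPLE)
    changed = {k: v for k, v in counter_deltas(first, second).items() if v != 0}
    print("changed counters:", changed)
    print("looks stuck:", looks_stuck(first, second))
```

For the two samples above, ns does not change while dw and oos both grow, which matches the observation that new writes are marked out of sync but nothing is sent to the peer.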
This seems like a bug to me, and it has already been reported by someone
else in August:
http://lists.linbit.com/pipermail/drbd-user/2012-August/018934.html
I've also created a virtualised test setup with two nodes running 8.4.2, and I
could reach the same state there, so it is fairly reproducible. The problem seems to
occur when the node switches from SyncSource back to Ahead mode before
synchronization has finished, i.e. I finish writing to the DRBD device,
wait a few seconds so the node starts to sync, then start writing
again.
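
The write, pause, write pattern described above could be scripted roughly as follows. This is only a sketch under assumptions: the target path, the burst size and the pause length are hypothetical placeholders (the message does not state them), and the script only generates the I/O pattern; it does not talk to DRBD itself. On a test node the path would point to a file on a filesystem that sits on the DRBD device.

```python
import os
import time


def write_burst(path, mebibytes):
    """Append `mebibytes` MiB of random data to `path` and flush it to disk."""
    with open(path, "ab") as handle:
        for _ in range(mebibytes):
            handle.write(os.urandom(1024 * 1024))
        handle.flush()
        os.fsync(handle.fileno())


def write_pause_write(path, mebibytes, pause_seconds):
    """First burst, then a pause so a resync can start, then a second burst."""
    write_burst(path, mebibytes)
    time.sleep(pause_seconds)
    write_burst(path, mebibytes)


if __name__ == "__main__":
    # Hypothetical values: adjust the path, size and pause for the test setup.
    write_pause_write("/mnt/drbd-test/burst.bin", mebibytes=64, pause_seconds=5)
```

The burst has to be large enough to trigger the congestion policy, and the pause has to be short enough that the resync has not finished before the second burst begins.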
On the production system it happens on the resources that have the most writes.
Any help is appreciated.
Bye.
The configuration:
cat /usr/local/etc/drbd.d/global_common.conf
global {
    usage-count no;
}
common {
    net {
        protocol A;
        max-buffers 2048;
        max-epoch-size 2048;
        verify-alg sha1;
        csums-alg sha1;
    }
    disk {
        disk-barrier no;
        disk-flushes no;
        md-flushes no;
        disk-drain no;
        al-extents 1801;
    }
    startup {
        wfc-timeout 180;
        degr-wfc-timeout 120;
    }
}
cat /usr/local/etc/drbd.d/r13.res
resource r13 {
    net {
        protocol A;
        on-congestion pull-ahead;
        congestion-fill 200k;
        congestion-extents 1620;
    }
    disk {
        c-max-rate 1500k;
    }
    on node1 {
        device /dev/drbd13 minor 13;
        disk /dev/sda5;
        meta-disk internal;
        address ipv4 10.129.164.130:7801;
    }
    on node2 {
        device /dev/drbd13 minor 13;
        disk /dev/sdb7;
        meta-disk internal;
        address ipv4 10.129.166.125:7801;
    }
}