Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hello
I found the following knowledge base article from 2011 which described my
exact problem: http://www.novell.com/support/kb/doc.php?id=7009306
The solution given there was to switch from a static to an adaptive sync
rate, especially when using very fast (gigabit) network interface cards.
For me, it seemed to work when switching from "rate 40M" to:
syncer {
    c-plan-ahead  20;
    c-min-rate    1M;
    c-max-rate    300M;
    c-fill-target 2M;
    verify-alg    md5;
}
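
(For what it's worth, such a change to the syncer section can apparently be
applied to the running resource without restarting DRBD, roughly like this;
"drbd0" is the resource name from the config quoted below:)

  drbdadm adjust drbd0    # re-read the config and apply the changed settings
  cat /proc/drbd          # check connection and sync state afterwards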
Before that, I also tried using a plain cross-over interface instead of the
bonded one, but that had no effect. Adjusting the other values recommended
in the KB article worked for me as well, but I changed them back to their
defaults to isolate the settings above as the real fix.
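
(To double-check which values are actually active on the device, as opposed
to what the config file says, something like this should do on our 8.3:)

  drbdsetup /dev/drbd0 show    # dump the settings of the running device
  drbdadm dump drbd0           # dump the parsed configuration file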
Any comments on this one? Is this a bug in DRBD?
Best regards
-christian-
On Mon, 6 Jan 2014 12:49:31 +0100
Christian Hammers <chammers at netcologne.de> wrote:
> Hello
>
> DRBD stalls reproducibly whenever I do a "drbdadm verify". It runs for
> a couple of minutes and then suddenly seems to lose its connection:
>
> Jan 6 10:52:54 vm-office03 kernel: block drbd0: conn( Connected -> VerifyS )
> Jan 6 10:52:54 vm-office03 kernel: block drbd0: Starting Online Verify from sector 813355536
> Jan 6 10:52:54 vm-office03 kernel: block drbd0: Out of sync: start=813367040, size=1280 (sectors)
> Jan 6 10:52:56 vm-office03 kernel: block drbd0: Out of sync: start=813526920, size=376 (sectors)
> Jan 6 10:52:57 vm-office03 kernel: block drbd0: Out of sync: start=813582592, size=472 (sectors)
> ...
> Jan 6 10:58:00 vm-office03 kernel: block drbd0: [drbd0_worker/3154] sock_sendmsg time expired, ko = 4294967295
> Jan 6 10:58:06 vm-office03 kernel: block drbd0: [drbd0_worker/3154] sock_sendmsg time expired, ko = 4294967294
> Jan 6 10:58:12 vm-office03 pvestatd[14442]: WARNING: command 'df -P -B 1 /mnt/pve/nfs-store1' failed: got timeout
> Jan 6 10:58:12 vm-office03 kernel: block drbd0: [drbd0_worker/3154] sock_sendmsg time expired, ko = 4294967293
> Jan 6 10:58:18 vm-office03 kernel: block drbd0: [drbd0_worker/3154] sock_sendmsg time expired, ko = 4294967292
> Jan 6 10:58:24 vm-office03 kernel: block drbd0: [drbd0_worker/3154] sock_sendmsg time expired, ko = 4294967291
> Jan 6 10:58:30 vm-office03 kernel: block drbd0: [drbd0_worker/3154] sock_sendmsg time expired, ko = 4294967290
> Jan 6 10:58:36 vm-office03 kernel: block drbd0: [drbd0_worker/3154] sock_sendmsg time expired, ko = 4294967289
> ...
>
> (I noticed that the ko value starts at 2^32-1 and counts down; what does it mean?)
>
> Any ideas what might cause this? There are no further kernel messages nor
> can I see any interface errors on the two "e1000e" physical interfaces
> that are bonded together for the 10.111.222.0/24 net.
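>
> (For reference, this is the kind of check meant here, done with the usual
> tools; "bond0", "eth0" and "eth1" are only placeholders for the actual
> interface names:)
>
>   cat /proc/net/bonding/bond0     # bonding state and link failure count per slave
>   ethtool -S eth0 | grep -i err   # NIC error counters of one slave
>   ip -s link show eth1            # RX/TX errors and drops of the other slave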
>
> I successfully ran "drbdadm invalidate" to re-sync the cluster several
> times and never experienced problems during normal usage (with very light
> disk I/O, though).
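>
> (For reference, the re-sync was triggered roughly like this, run on the
> node whose copy is to be thrown away and rebuilt from the peer:)
>
>   drbdadm invalidate drbd0    # mark the local data inconsistent, forces a full resync
>   watch -n5 cat /proc/drbd    # follow the resync progress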
>
> My setup consists of two Fujitsu Siemens RX200 S4 servers which run DRBD
> in primary/primary mode for a "Proxmox VE" cluster. The two servers are
> connected via a cross-over cable, and only one of them actually mounts the
> DRBD device and exports it as an NFS volume. All access to the device is
> thus made via NFS through the current NFS master node.
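>
> (That both nodes are primary can be seen in /proc/drbd, e.g.:)
>
>   cat /proc/drbd    # healthy state shows cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate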
>
> The nodes are running Kernel 2.6.32-26-pve with DRBD 8.3.13.
>
> The DRBD config is as follows:
>
> root at vm-office03:/home/chammers# egrep -v '^[[:space:]]*#' /etc/drbd.d/drbd0.res
> resource drbd0 {
>     protocol C;
>
>     startup {
>         become-primary-on both;
>     }
>
>     on vm-office03 {
>         device    /dev/drbd0;
>         disk      /dev/sda3;
>         address   10.111.222.1:7789;
>         meta-disk internal;
>     }
>
>     on vm-office04 {
>         device    /dev/drbd0;
>         disk      /dev/sda3;
>         address   10.111.222.2:7789;
>         meta-disk internal;
>     }
>
>     disk {
>         no-disk-barrier;
>     }
>
>     net {
>         cram-hmac-alg sha1;
>         shared-secret "drbd0proxmox";
>         data-integrity-alg crc32c;
>         allow-two-primaries;
>         after-sb-0pri discard-zero-changes;
>         after-sb-1pri discard-secondary;
>         after-sb-2pri disconnect;
>     }
>
>     syncer {
>         rate 40M;
>         csums-alg crc32c;
>         verify-alg md5;
>     }
> }
>
> root at vm-office04:/home/chammers# grep -v '^[[:space:]]*#' /etc/drbd.d/drbd0.res
> resource drbd0 {
>     protocol C;
>
>     startup {
>         become-primary-on both;
>     }
>
>     on vm-office03 {
>         device    /dev/drbd0;
>         disk      /dev/sda3;
>         address   10.111.222.1:7789;
>         meta-disk internal;
>     }
>
>     on vm-office04 {
>         device    /dev/drbd0;
>         disk      /dev/sda3;
>         address   10.111.222.2:7789;
>         meta-disk internal;
>     }
>
>     disk {
>         no-disk-barrier;
>     }
>
>     net {
>         cram-hmac-alg sha1;
>         shared-secret "drbd0proxmox";
>         data-integrity-alg crc32c;
>         allow-two-primaries;
>         after-sb-0pri discard-zero-changes;
>         after-sb-1pri discard-secondary;
>         after-sb-2pri disconnect;
>     }
>
>     syncer {
>         rate 40M;
>         csums-alg crc32c;
>         verify-alg md5;
>     }
> }
>
> Best Regards
>
> -christian-