Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hello

I found the following knowledge base article from 2011 which described my
exact problem: http://www.novell.com/support/kb/doc.php?id=7009306

The solution given there was to switch from a static to an adaptive sync
rate, especially when using very fast (gigabit) network interface cards.
For me, it seemed to work when switching from "rate 40M" to:

    syncer {
        c-plan-ahead 20;
        c-min-rate 1M;
        c-max-rate 300M;
        c-fill-target 2M;
        verify-alg md5;
    }

(See the sketch after the quoted config at the end of this mail for one
way to apply the change.)

Before that I also tried a plain crossover interface instead of the bonded
one, but that had no effect. Adjusting the other values recommended in the
KB article worked for me as well, but I changed them back to their defaults
to isolate the settings above as the real fix.

Any comments on this one? Is this a bug in DRBD?

Best regards

-christian-

On Mon, 6 Jan 2014 12:49:31 +0100
Christian Hammers <chammers at netcologne.de> wrote:

> Hello
>
> DRBD stalls reproducibly whenever I do a "drbdadm verify". It runs for
> a couple of minutes and then suddenly seems to lose its connection:
>
> Jan 6 10:52:54 vm-office03 kernel: block drbd0: conn( Connected -> VerifyS )
> Jan 6 10:52:54 vm-office03 kernel: block drbd0: Starting Online Verify from sector 813355536
> Jan 6 10:52:54 vm-office03 kernel: block drbd0: Out of sync: start=813367040, size=1280 (sectors)
> Jan 6 10:52:56 vm-office03 kernel: block drbd0: Out of sync: start=813526920, size=376 (sectors)
> Jan 6 10:52:57 vm-office03 kernel: block drbd0: Out of sync: start=813582592, size=472 (sectors)
> ...
> Jan 6 10:58:00 vm-office03 kernel: block drbd0: [drbd0_worker/3154] sock_sendmsg time expired, ko = 4294967295
> Jan 6 10:58:06 vm-office03 kernel: block drbd0: [drbd0_worker/3154] sock_sendmsg time expired, ko = 4294967294
> Jan 6 10:58:12 vm-office03 pvestatd[14442]: WARNING: command 'df -P -B 1 /mnt/pve/nfs-store1' failed: got timeout
> Jan 6 10:58:12 vm-office03 kernel: block drbd0: [drbd0_worker/3154] sock_sendmsg time expired, ko = 4294967293
> Jan 6 10:58:18 vm-office03 kernel: block drbd0: [drbd0_worker/3154] sock_sendmsg time expired, ko = 4294967292
> Jan 6 10:58:24 vm-office03 kernel: block drbd0: [drbd0_worker/3154] sock_sendmsg time expired, ko = 4294967291
> Jan 6 10:58:30 vm-office03 kernel: block drbd0: [drbd0_worker/3154] sock_sendmsg time expired, ko = 4294967290
> Jan 6 10:58:36 vm-office03 kernel: block drbd0: [drbd0_worker/3154] sock_sendmsg time expired, ko = 4294967289
> ...
>
> (I noticed that the ko value is 2^32-1; what does it mean?)
>
> Any ideas what might cause this? There are no further kernel messages,
> nor can I see any interface errors on the two "e1000e" physical
> interfaces that are bonded together for the 10.111.222.0/24 net.
>
> I successfully ran "drbdadm invalidate" to re-sync the cluster several
> times and never experienced problems during normal usage (with very
> light disk I/O, though).
>
> My setup consists of two Fujitsu Siemens RX200 S4 servers which run
> DRBD in primary/primary mode for a "Proxmox VE" cluster. The two
> servers are connected via crossover cable and only one actually mounts
> the DRBD device and exports it as an NFS volume. All access to the
> device is thus made via NFS through the current NFS master node.
>
> The nodes are running Kernel 2.6.32-26-pve with DRBD 8.3.13.
>
> The DRBD config is as follows:
>
> root at vm-office03:/home/chammers# egrep -v '^[[:space:]]*#' /etc/drbd.d/drbd0.res
> resource drbd0 {
>     protocol C;
>
>     startup {
>         become-primary-on both;
>     }
>
>     on vm-office03 {
>         device /dev/drbd0;
>         disk /dev/sda3;
>         address 10.111.222.1:7789;
>         meta-disk internal;
>     }
>
>     on vm-office04 {
>         device /dev/drbd0;
>         disk /dev/sda3;
>         address 10.111.222.2:7789;
>         meta-disk internal;
>     }
>
>     disk {
>         no-disk-barrier;
>     }
>
>     net {
>         cram-hmac-alg sha1;
>         shared-secret "drbd0proxmox";
>         data-integrity-alg crc32c;
>         allow-two-primaries;
>         after-sb-0pri discard-zero-changes;
>         after-sb-1pri discard-secondary;
>         after-sb-2pri disconnect;
>     }
>
>     syncer {
>         rate 40M;
>         csums-alg crc32c;
>         verify-alg md5;
>     }
> }
>
> root at vm-office04:/home/chammers# grep -v '^[[:space:]]*#' /etc/drbd.d/drbd0.res
> resource drbd0 {
>     protocol C;
>
>     startup {
>         become-primary-on both;
>     }
>
>     on vm-office03 {
>         device /dev/drbd0;
>         disk /dev/sda3;
>         address 10.111.222.1:7789;
>         meta-disk internal;
>     }
>
>     on vm-office04 {
>         device /dev/drbd0;
>         disk /dev/sda3;
>         address 10.111.222.2:7789;
>         meta-disk internal;
>     }
>
>     disk {
>         no-disk-barrier;
>     }
>
>     net {
>         cram-hmac-alg sha1;
>         shared-secret "drbd0proxmox";
>         data-integrity-alg crc32c;
>         allow-two-primaries;
>         after-sb-0pri discard-zero-changes;
>         after-sb-1pri discard-secondary;
>         after-sb-2pri disconnect;
>     }
>
>     syncer {
>         rate 40M;
>         csums-alg crc32c;
>         verify-alg md5;
>     }
> }
>
> Best Regards
>
> -christian-
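
For reference, here is a minimal sketch of the full syncer section with the
dynamic resync controller enabled, together with one way to activate it on a
running 8.3 cluster. It assumes the resource name drbd0 from the config above;
the c-* values are simply the ones quoted at the top of this mail and should
be treated as a starting point, not as tuned recommendations.

    # /etc/drbd.d/drbd0.res -- replaces the fixed "rate 40M" line
    syncer {
        c-plan-ahead  20;    # a non-zero value enables the dynamic resync controller (DRBD 8.3.9+)
        c-min-rate    1M;    # resync is throttled down to this while application I/O is detected
        c-max-rate    300M;  # upper bound for resync/verify traffic
        c-fill-target 2M;    # amount of in-flight resync data the controller aims for
        csums-alg     crc32c;
        verify-alg    md5;
    }

    # after editing the file on both nodes, push the new options into the
    # running configuration and re-run the online verify:
    drbdadm adjust drbd0
    drbdadm verify drbd0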