Hello,

DRBD stalls reproducibly whenever I do a "drbdadm verify". It runs for a couple of minutes and then suddenly seems to lose its connection:

Jan  6 10:52:54 vm-office03 kernel: block drbd0: conn( Connected -> VerifyS )
Jan  6 10:52:54 vm-office03 kernel: block drbd0: Starting Online Verify from sector 813355536
Jan  6 10:52:54 vm-office03 kernel: block drbd0: Out of sync: start=813367040, size=1280 (sectors)
Jan  6 10:52:56 vm-office03 kernel: block drbd0: Out of sync: start=813526920, size=376 (sectors)
Jan  6 10:52:57 vm-office03 kernel: block drbd0: Out of sync: start=813582592, size=472 (sectors)
...
Jan  6 10:58:00 vm-office03 kernel: block drbd0: [drbd0_worker/3154] sock_sendmsg time expired, ko = 4294967295
Jan  6 10:58:06 vm-office03 kernel: block drbd0: [drbd0_worker/3154] sock_sendmsg time expired, ko = 4294967294
Jan  6 10:58:12 vm-office03 pvestatd[14442]: WARNING: command 'df -P -B 1 /mnt/pve/nfs-store1' failed: got timeout
Jan  6 10:58:12 vm-office03 kernel: block drbd0: [drbd0_worker/3154] sock_sendmsg time expired, ko = 4294967293
Jan  6 10:58:18 vm-office03 kernel: block drbd0: [drbd0_worker/3154] sock_sendmsg time expired, ko = 4294967292
Jan  6 10:58:24 vm-office03 kernel: block drbd0: [drbd0_worker/3154] sock_sendmsg time expired, ko = 4294967291
Jan  6 10:58:30 vm-office03 kernel: block drbd0: [drbd0_worker/3154] sock_sendmsg time expired, ko = 4294967290
Jan  6 10:58:36 vm-office03 kernel: block drbd0: [drbd0_worker/3154] sock_sendmsg time expired, ko = 4294967289
...

(I noticed that the initial ko value is 2^32-1; what does it mean?)

Any ideas what might cause this? There are no further kernel messages, nor can I see any interface errors on the two "e1000e" physical interfaces that are bonded together for the 10.111.222.0/24 net. I have successfully run "drbdadm invalidate" to re-sync the cluster several times and have never experienced problems during normal usage (though with very light disk I/O).
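(Regarding the ko value: 4294967295 is 2^32-1, i.e. the bit pattern of -1 held in an unsigned 32-bit counter, and the log shows it counting down by one with each "sock_sendmsg time expired" message. A minimal shell sketch of that wraparound arithmetic only, not of DRBD's internal logic:

```shell
# 4294967295 = 2^32 - 1: decrementing an unsigned 32-bit counter that
# holds 0 wraps around to this value. Bash uses 64-bit arithmetic, so a
# mask emulates the 32-bit wrap.
ko=0
ko=$(( (ko - 1) & 0xFFFFFFFF ))    # wraps to 4294967295
echo "$ko"
ko=$(( (ko - 1) & 0xFFFFFFFF ))    # next timeout: 4294967294
echo "$ko"
```

So the sequence in the log looks like a counter that started at 0 and is decremented on every send timeout.)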
My setup consists of two Fujitsu Siemens RX200 S4 servers which run DRBD in primary/primary mode for a "Proxmox VE" cluster. The two servers are connected via cross-over cable, and only one of them actually mounts the drbd device and exports it as an NFS volume. All access to the device is thus made via NFS through the current NFS master node. The nodes are running kernel 2.6.32-26-pve with DRBD 8.3.13.

The DRBD config is as follows:

root@vm-office03:/home/chammers# egrep -v '^[[:space:]]*#' /etc/drbd.d/drbd0.res
resource drbd0 {
    protocol C;

    startup {
        become-primary-on both;
    }

    on vm-office03 {
        device    /dev/drbd0;
        disk      /dev/sda3;
        address   10.111.222.1:7789;
        meta-disk internal;
    }

    on vm-office04 {
        device    /dev/drbd0;
        disk      /dev/sda3;
        address   10.111.222.2:7789;
        meta-disk internal;
    }

    disk {
        no-disk-barrier;
    }

    net {
        cram-hmac-alg sha1;
        shared-secret "drbd0proxmox";
        data-integrity-alg crc32c;
        allow-two-primaries;
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
    }

    syncer {
        rate 40M;
        csums-alg crc32c;
        verify-alg md5;
    }
}

root@vm-office04:/home/chammers# grep -v '^[[:space:]]*#' /etc/drbd.d/drbd0.res
resource drbd0 {
    protocol C;

    startup {
        become-primary-on both;
    }

    on vm-office03 {
        device    /dev/drbd0;
        disk      /dev/sda3;
        address   10.111.222.1:7789;
        meta-disk internal;
    }

    on vm-office04 {
        device    /dev/drbd0;
        disk      /dev/sda3;
        address   10.111.222.2:7789;
        meta-disk internal;
    }

    disk {
        no-disk-barrier;
    }

    net {
        cram-hmac-alg sha1;
        shared-secret "drbd0proxmox";
        data-integrity-alg crc32c;
        allow-two-primaries;
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
    }

    syncer {
        rate 40M;
        csums-alg crc32c;
        verify-alg md5;
    }
}

Best Regards
-christian-