On 22/05/12 13:05, Florian Haas wrote:
>> Indeed, I started logging this command every second, and e.g. this kind
>> of event is typical every few hours:
>>
>> dd if=/dev/zero of=/dev/drbd13 conv=fdatasync bs=1M count=1 2>&1 | \
>> grep copied
>>
>> 2012-05-22 02:17:00 W 1048576 bytes (1.0 MB) copied, 0.0115253 s, 91.0 MB/s
>> 2012-05-22 02:17:01 W 1048576 bytes (1.0 MB) copied, 0.011519 s, 91.0 MB/s
>> 2012-05-22 02:17:02 W 1048576 bytes (1.0 MB) copied, 0.0116563 s, 90.0 MB/s
>> 2012-05-22 02:17:03 W 1048576 bytes (1.0 MB) copied, 1.1898 s, 881 kB/s
>> 2012-05-22 02:17:05 W 1048576 bytes (1.0 MB) copied, 28.3202 s, 37.0 kB/s
>> 2012-05-22 02:17:35 W 1048576 bytes (1.0 MB) copied, 0.0127468 s, 82.3 MB/s
>> 2012-05-22 02:17:36 W 1048576 bytes (1.0 MB) copied, 0.0113499 s, 92.4 MB/s
>> 2012-05-22 02:17:37 W 1048576 bytes (1.0 MB) copied, 0.0112707 s, 93.0 MB/s
>>
>> And in the kernel log:
>>
>> May 22 02:17:11 v3a kernel: [1341064.126449] block drbd13:
>> [drbd13_worker/797] sock_sendmsg time expired, ko = 4294967295
>> May 22 02:17:17 v3a kernel: [1341070.129829] block drbd13:
>> [drbd13_worker/797] sock_sendmsg time expired, ko = 4294967294
>> May 22 02:17:23 v3a kernel: [1341076.133170] block drbd13:
>> [drbd13_worker/797] sock_sendmsg time expired, ko = 4294967293
>> May 22 02:17:29 v3a kernel: [1341082.133592] block drbd13:
>> [drbd13_worker/797] sock_sendmsg time expired, ko = 4294967292
>>
>> Curiously, the "v3a" host (on which I'm running this test) just shows
>> these disconnects, it's the "v3b" host that gives the "PingAck not
>> received" messages. But not in this instance.
>
> And you're absolutely certain that "ip -s link show dev <nic>" shows
> no drops or errors (for <nic> replaced with your DRBD replication
> device name)? Also, if you're replicating through a switch as opposed
> to via a back-to-back connection, did you check the switch port
> statistics for errors as well?

I hadn't spotted any but I've asked for a third opinion, because I can't
see any other explanation. I've also set up a continuous TCP connection
on the same interfaces between a pair of scripts that will report if
there is any break in TCP connectivity, as it appears drbd is seeing.

> I will observe that up to this point you haven't shared your DRBD
> config, so if you pastebinned that and shared the URL here, people
> could look into it and point to possible issues.

I'm not using drbdadm and the helper, my "pairvm" script manages DRBD
for VMs using this command to attach the disc:

  drbd.setup("disk", drbd_backing_device, drbd_meta_device, 0)

and these commands to connect the network, pretty unambitious stuff I
assume:

  drbd.setup("net", *(drbd_net_args + extra_options))
  drbd.setup("syncer", "-r", "20M")

  def drbd_net_args
    [
      "#{Global.ips['here_drbd']}:#{drbd_port}",
      "#{Global.ips['there_drbd']}:#{drbd_port}",
      "B",
      "--after-sb-0pri", "discard-zero-changes",
      "--after-sb-1pri", "consensus",
      "--after-sb-2pri", "disconnect",
      "--ping-timeout", "50"
    ]
  end

NB these just translate to "drbdsetup /dev/drbdX net ..." with the IPs
and ports automatically assigned by the script.

-- 
Matthew
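
To make the "these just translate to drbdsetup /dev/drbdX net ..." remark concrete, here is a minimal Ruby sketch of that translation. It is not the pairvm code: the addresses, port, device name and the drbdsetup_command helper are hypothetical stand-ins for values the script assigns automatically, and the sketch only builds and prints the command line rather than running drbdsetup.

```ruby
require "shellwords"

# Hypothetical placeholders; the real script takes these from
# Global.ips and its own port allocation.
HERE_DRBD  = "192.0.2.1"
THERE_DRBD = "192.0.2.2"
DRBD_PORT  = 7801

# Same argument list as in the message, with the interpolated values
# passed in explicitly so the sketch is self-contained.
def drbd_net_args(here: HERE_DRBD, there: THERE_DRBD, port: DRBD_PORT)
  [
    "#{here}:#{port}",
    "#{there}:#{port}",
    "B",
    "--after-sb-0pri", "discard-zero-changes",
    "--after-sb-1pri", "consensus",
    "--after-sb-2pri", "disconnect",
    "--ping-timeout", "50"
  ]
end

# Hypothetical helper: assemble the argv for
# "drbdsetup <device> <subcommand> <args...>" without executing it.
def drbdsetup_command(device, subcommand, *args)
  ["drbdsetup", device, subcommand, *args]
end

cmd = drbdsetup_command("/dev/drbd13", "net", *drbd_net_args)

# Print a shell-quoted form suitable for pasting into a report.
puts Shellwords.join(cmd)
```

Printing the assembled command this way gives something that can be pasted in place of a drbd.conf when the configuration is generated by a script.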