[DRBD-user] "PingAck not received" messages

Florian Haas florian at hastexo.com
Tue May 22 14:05:40 CEST 2012

On Tue, May 22, 2012 at 12:45 PM, Matthew Bloch <matthew at bytemark.co.uk> wrote:
> On 22/05/12 08:16, Felix Frank wrote:
>> On 05/21/2012 06:56 PM, Matthew Bloch wrote:
>>> Thanks for the accounts Pascal and Felix, though Felix I'm pretty
>>> certain Debian/lenny's kernel had a virtio bug that does cause its
>>> network to break and require a "rmmod virtio_net; modprobe virtio_net"
>>> to fix.  That's nothing to do with drbd, and your problem may be
>>> entirely separate from that as well :)
>>
>> Right - I agree that guest issues could not conceivably cause DRBD
>> issues on the host. I had wrongly inferred that you were DRBDing from
>> inside a guest.

So was I, originally.

> Indeed, I started logging this command every second, and e.g. this kind
> of event is typical every few hours:
>
>  dd if=/dev/zero of=/dev/drbd13 conv=fdatasync bs=1M count=1 2>&1 | \
>    grep copied
>
> 2012-05-22 02:17:00 W 1048576 bytes (1.0 MB) copied, 0.0115253 s, 91.0 MB/s
> 2012-05-22 02:17:01 W 1048576 bytes (1.0 MB) copied, 0.011519 s, 91.0 MB/s
> 2012-05-22 02:17:02 W 1048576 bytes (1.0 MB) copied, 0.0116563 s, 90.0 MB/s
> 2012-05-22 02:17:03 W 1048576 bytes (1.0 MB) copied, 1.1898 s, 881 kB/s
> 2012-05-22 02:17:05 W 1048576 bytes (1.0 MB) copied, 28.3202 s, 37.0 kB/s
> 2012-05-22 02:17:35 W 1048576 bytes (1.0 MB) copied, 0.0127468 s, 82.3 MB/s
> 2012-05-22 02:17:36 W 1048576 bytes (1.0 MB) copied, 0.0113499 s, 92.4 MB/s
> 2012-05-22 02:17:37 W 1048576 bytes (1.0 MB) copied, 0.0112707 s, 93.0 MB/s
>
> And in the kernel log:
>
> May 22 02:17:11 v3a kernel: [1341064.126449] block drbd13: [drbd13_worker/797] sock_sendmsg time expired, ko = 4294967295
> May 22 02:17:17 v3a kernel: [1341070.129829] block drbd13: [drbd13_worker/797] sock_sendmsg time expired, ko = 4294967294
> May 22 02:17:23 v3a kernel: [1341076.133170] block drbd13: [drbd13_worker/797] sock_sendmsg time expired, ko = 4294967293
> May 22 02:17:29 v3a kernel: [1341082.133592] block drbd13: [drbd13_worker/797] sock_sendmsg time expired, ko = 4294967292
>
> Curiously, the "v3a" host (on which I'm running this test) just shows
> these disconnects, it's the "v3b" host that gives the "PingAck not
> received" messages.  But not in this instance.
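
For picking the stalls out of a longer log of such samples, here is a
minimal sketch. It assumes the log lines look like the ones quoted above
(timestamp, then dd's "copied" summary); slow_writes is a hypothetical
helper name, not something that ships with DRBD or coreutils.

```shell
# Hypothetical helper: read timestamped dd summary lines on stdin and
# print only those whose elapsed time exceeds LIMIT seconds (default 1).
slow_writes() {
    awk -v limit="${1:-1}" '{
        # The elapsed time is the field right after "copied,".
        for (i = 1; i < NF; i++)
            if ($i == "copied," && $(i + 1) + 0 > limit + 0) { print; break }
    }'
}
```

Something like "slow_writes 1 < dd-samples.log" (file name assumed) would
then list only the samples that took longer than a second, which can be
lined up against the timestamps in the kernel log.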

And you're absolutely certain that "ip -s link show dev <nic>" shows
no drops or errors (where <nic> is the interface that carries your DRBD
replication traffic)? Also, if you're replicating through a switch
rather than over a back-to-back connection, did you check the switch
port statistics for errors as well?
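
If you want to check those counters from a script rather than by eye,
here is a rough sketch. It assumes a Linux host where the per-interface
counters are exposed under /sys/class/net/<nic>/statistics; the function
names and the SYSFS_NET override are made up for illustration.

```shell
# Print the error and drop counters for one interface, read from sysfs.
# SYSFS_NET is a hypothetical override (useful for testing); by default
# the real sysfs location is used.
nic_error_counters() {
    nic=$1
    base=${SYSFS_NET:-/sys/class/net}/$nic/statistics
    for counter in rx_errors tx_errors rx_dropped tx_dropped; do
        printf '%s %s\n' "$counter" "$(cat "$base/$counter")"
    done
}

# Exit status 0 only if all four counters read zero.
nic_is_clean() {
    nic_error_counters "$1" | awk '$2 != 0 { bad = 1 } END { exit bad + 0 }'
}
```

For example, "nic_is_clean eth1 || nic_error_counters eth1" (with eth1
standing in for your replication interface) prints the counters only when
at least one of them is non-zero. The counters are cumulative, so
comparing a reading from before a stall with one from after it tells you
more than a single snapshot does.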

Also, you haven't shared your DRBD config so far. If you put it on a
pastebin and post the URL here, people can look through it and point
out possible issues.

Cheers,
Florian

-- 
Need help with High Availability?
http://www.hastexo.com/now


