On 04.07.14 14:27, Lars Ellenberg wrote:
> On Mon, Jun 30, 2014 at 04:25:01PM +0200, Andreas Pflug wrote:
>> I'm running a pair of Debian wheezy machines A and B with Xen 4.1 and a
>> wheezy-backport kernel 3.14.7 to have drbd 8.4.3.
>> Both machines are interconnected with a dedicated 1GB Intel 82574L
>> (e1000e) drbd link, mtu=9216, storage is a fast battery-backed caching
>> RAID controller each.
>>
>> There are 3 drbd devices active:
>> drbd7 UpToDate/UpToDate and being filled by a vm using rsync (steady
>> data rate of about 5.7MB/s)
>> drbd12 UpToDate/Inconsistent being resynced after initial creation, about
>> 50MB/s
>> drbd22 UpToDate/UpToDate being filled with arbitrary data using scp,
>> some MB/s.
>>
>> Resync is configured c-fill-target 512k; c-min-rate 2M; c-max-rate 50M
>>
>> I've observed several crashes of the machine that has drbd7 and drbd22
>> primary, whether I use machine A or B for that. The kernel log of the
>> faulting machine shows zero logging, just freshly rebooting, the other
>> that stays up doesn't have any hint in kern.log either. Everything is
>> fine until "PingAck not arrived" and all connections go down.
>>
>> How can I find out what's leading to the crash?
> Attach a serial console,
> log to it,
> capture the output.

Thanks for the hint. Did that, and the kernel console spits out several
weird messages from the e1000e driver, resulting in a kernel panic.
Doing some iperf tests, I see horrible dropped RX packet counts when the
mtu is set to 9216, while everything looks fine with mtu=1500.
Luckily, there's also an unused I350 port in the machine, which behaves
flawlessly with big packets, so I switched the drbd connection.

Conclusion: Don't use Intel 82574L with MTU != 1500.

Regards,
Andreas
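
For anyone who wants to repeat the RX drop check while an iperf run is in progress, here is a minimal sketch of one way to do it. It is not the procedure used above, just an illustration: it reads the per-interface receive counters from /proc/net/dev twice and reports the difference. The function names and the interface name "eth1" are made up for the example.

```python
import time


def parse_rx_counters(proc_net_dev_text):
    """Parse /proc/net/dev text into {iface: {rx_packets, rx_errs, rx_drop}}."""
    counters = {}
    for line in proc_net_dev_text.splitlines():
        if ":" not in line:
            continue  # the two header lines carry no "iface:" prefix
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 4:
            continue
        # Receive columns start with: bytes packets errs drop ...
        counters[name.strip()] = {
            "rx_packets": int(fields[1]),
            "rx_errs": int(fields[2]),
            "rx_drop": int(fields[3]),
        }
    return counters


def rx_delta(before, after, iface):
    """Return how much each RX counter of iface grew between two snapshots."""
    return {key: after[iface][key] - before[iface][key] for key in before[iface]}


def sample_rx_delta(iface, seconds=10, path="/proc/net/dev"):
    """Take two snapshots 'seconds' apart and return the RX counter growth."""
    with open(path) as f:
        before = parse_rx_counters(f.read())
    time.sleep(seconds)
    with open(path) as f:
        after = parse_rx_counters(f.read())
    return rx_delta(before, after, iface)
```

Calling sample_rx_delta("eth1", seconds=30) once with the large MTU and once with mtu=1500, each time during an iperf run, gives two sets of numbers to compare. A steadily growing rx_drop in only one of the two runs would point in the same direction as the observation above.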