On Mon, Jan 28, 2013 at 09:31:31AM +0000, Ben Clewett wrote:
> Hi guys,
>
> We have a failure which hits us every few weeks on just one server.
> We suspect hardware issue on the network card.  But it's proving
> hard to tie down.  This is the failure and I would be interested in
> the opinion of this group.
>
> Error:
>
> [1580483.649257] block drbd0: magic?? on data m: 0x0 c: 0 l: 0

Each DRBD network packet starts with a DRBD specific header.
That header contains a "magic" number, a "command" id,
and a payload "length".
All three of them are apparently zeroed out.

So yes, that pretty much looks like your network path somehow
managed to zero out at least the start of a packet.

The asserts below are "boring", and the code has since been fixed
to no longer trigger those.

> [1580483.649269] block drbd0: ASSERT FAILED cstate = Connected,
> expected < WFConnection
> [1580483.649286] block drbd0: ASSERT( mdev->state.conn < C_CONNECTED
> ) in
> /usr/src/packages/BUILD/drbd-8.3.4/obj/default/drbd_receiver.c:4500
> [1580483.649295] block drbd0: asender terminated
> [1580483.649301] block drbd0: Terminating asender thread
> [1580483.649384] block drbd0: Connection closed
> [1580483.649390] block drbd0: peer( Primary -> Unknown ) conn(
> Connected -> Unconnected ) pdsk( UpToDate -> DUnknown )
> [1580483.649396] block drbd0: receiver terminated
> [1580483.649399] block drbd0: Terminating receiver thread
>
> /proc/drbd
> version: 8.3.4 (api:88/proto:86-91)

I recommend to upgrade to 8.3.15,
enable "data integrity checksumming",
run an online-verify, and see where that gets you.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
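
For illustration only: the header check described in the reply (a "magic" number, a "command" id and a payload "length" at the start of every packet) could be sketched roughly as below. This is not the DRBD receiver code. The field widths (32-bit magic, 16-bit command, 16-bit length, network byte order) and the magic value are assumptions based on the 8.3-era header layout and should be checked against your own source tree; `parse_header` is a hypothetical helper name.

```python
import struct

# Assumed 8.3-era on-wire header: 32-bit magic, 16-bit command,
# 16-bit payload length, all in network (big-endian) byte order.
HEADER = struct.Struct(">IHH")

# Assumed magic constant; verify against the DRBD sources you run.
DRBD_MAGIC = 0x83740267


def parse_header(buf: bytes):
    """Hypothetical helper: return (command, length), or raise on a bad magic."""
    magic, command, length = HEADER.unpack_from(buf)
    if magic != DRBD_MAGIC:
        # Mirrors the shape of the kernel log line quoted in the thread.
        raise ValueError(
            f"magic?? on data m: 0x{magic:x} c: {command} l: {length}"
        )
    return command, length
```

Feeding this eight zero bytes fails the magic comparison and reports all three fields as zero, which is the pattern in the quoted log line: the start of the packet was zeroed somewhere on the way.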