[DRBD-user] magic failure

Lars Ellenberg lars.ellenberg at linbit.com
Mon Jan 28 11:41:09 CET 2013


On Mon, Jan 28, 2013 at 09:31:31AM +0000, Ben Clewett wrote:
> Hi guys,
> 
> We have a failure which hits us every few weeks on just one server.
> We suspect hardware issue on the network card.  But it's proving
> hard to tie down.   This is the failure and I would be interested in
> the opinion of this group.
> 
> Error:
> 
> [1580483.649257] block drbd0: magic?? on data m: 0x0 c: 0 l: 0


Each DRBD network packet starts with a DRBD-specific header.
That header contains a "magic" number, a "command" id,
and a payload "length".

All three of them are apparently zeroed out.
So yes, that pretty much looks like your network path
somehow managed to zero out at least the start of a packet.
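
To illustrate the check that produced that log line, here is a minimal
sketch in Python, not the actual kernel code. It assumes the 8.3-era
on-wire layout of a 32-bit magic, a 16-bit command, and a 16-bit length,
all big-endian, and it assumes the magic value 0x83740267. The names
HEADER, parse_header and check_header are made up for this example.

```python
import struct

# Assumption: 8-byte header, big-endian: u32 magic, u16 command, u16 length.
HEADER = struct.Struct(">IHH")

# Assumption: magic value used by DRBD 8.3 for this header format.
DRBD_MAGIC = 0x83740267


def parse_header(buf):
    """Return (magic, command, length) decoded from the start of buf."""
    return HEADER.unpack_from(buf)


def check_header(buf):
    """Return None if the magic matches, else a message shaped like the log line."""
    magic, command, length = parse_header(buf)
    if magic != DRBD_MAGIC:
        return "magic?? on data m: 0x%x c: %d l: %d" % (magic, command, length)
    return None


if __name__ == "__main__":
    # A header whose first eight bytes were zeroed somewhere on the network path.
    print(check_header(bytes(8)))
    # A well-formed header (command and length values are arbitrary here).
    print(check_header(HEADER.pack(DRBD_MAGIC, 1, 4096)))
```

Feeding it eight zero bytes yields a message with m, c and l all zero,
which matches what you saw: the receiver can no longer tell where the
next packet starts, so it has to drop the connection.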


The asserts below are "boring"; the code has since been fixed so that
it no longer triggers them.

> [1580483.649269] block drbd0: ASSERT FAILED cstate = Connected,
> expected < WFConnection
> [1580483.649286] block drbd0: ASSERT( mdev->state.conn < C_CONNECTED
> ) in
> /usr/src/packages/BUILD/drbd-8.3.4/obj/default/drbd_receiver.c:4500
> [1580483.649295] block drbd0: asender terminated
> [1580483.649301] block drbd0: Terminating asender thread
> [1580483.649384] block drbd0: Connection closed
> [1580483.649390] block drbd0: peer( Primary -> Unknown ) conn(
> Connected -> Unconnected ) pdsk( UpToDate -> DUnknown )
> [1580483.649396] block drbd0: receiver terminated
> [1580483.649399] block drbd0: Terminating receiver thread
> 
> /proc/drbd
> version: 8.3.4 (api:88/proto:86-91)

I recommend upgrading to 8.3.15,
enable "data integrity checksumming",
run an online-verify,
and see where that gets you.
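
As a rough sketch of what that could look like in an 8.3-style
drbd.conf: the resource name "r0" and the choice of sha1 are
assumptions for this example, so adapt both to your setup and check
the drbd.conf man page of the version you install.

```
resource r0 {
  net {
    # checksum every data packet on the wire
    data-integrity-alg sha1;
  }
  syncer {
    # digest used by online verify
    verify-alg sha1;
  }
}
```

After changing the configuration on both nodes, "drbdadm adjust r0"
applies it and "drbdadm verify r0" starts the online verify. Blocks
found to differ are reported in the kernel log and show up in the
out-of-sync counter in /proc/drbd.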

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed


