[DRBD-user] magic failure
lars.ellenberg at linbit.com
Mon Jan 28 11:41:09 CET 2013
On Mon, Jan 28, 2013 at 09:31:31AM +0000, Ben Clewett wrote:
> Hi guys,
> We have a failure which hits us every few weeks on just one server.
> We suspect hardware issue on the network card. But it's proving
> hard to tie down. This is the failure and I would be interested in
> the opinion of this group.
> [1580483.649257] block drbd0: magic?? on data m: 0x0 c: 0 l: 0
Each DRBD network packet starts with a DRBD specific header.
That header contains a "magic" number, a "command" id,
and a payload "length".
All three of them are apparently zeroed out.
So yes, that pretty much looks like your network path
somehow managed to zero out at least the start of a packet.
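
To make the log line easier to read, here is a minimal sketch of that header check. It assumes the 8.3-era on-wire header is 8 bytes in network byte order (32-bit magic, 16-bit command, 16-bit length) and that the magic value is 0x83740267. Both are recalled from the 8.3 sources and should be checked against the version you run. The function names are made up for illustration.

```python
import struct

# Assumed 8.3-era header layout: big-endian u32 magic, u16 command, u16 length.
HEADER = struct.Struct(">IHH")
DRBD_MAGIC = 0x83740267  # assumption: verify against your DRBD source tree


def parse_header(buf):
    """Return (magic, command, length) from the first 8 bytes of buf."""
    return HEADER.unpack_from(buf)


def check_header(buf):
    """Return a log-style complaint if the magic is wrong, else None."""
    magic, command, length = parse_header(buf)
    if magic != DRBD_MAGIC:
        return "magic?? on data m: 0x%x c: %d l: %d" % (magic, command, length)
    return None


# An all-zero header yields the same fields as the reported log line.
print(check_header(bytes(8)))
```

A header that arrives as eight zero bytes fails the magic comparison and reports m, c and l all as zero, which is what the kernel message shows.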
The asserts below are "boring", and the code has since been fixed to no
longer trigger those.
> [1580483.649269] block drbd0: ASSERT FAILED cstate = Connected,
> expected < WFConnection
> [1580483.649286] block drbd0: ASSERT( mdev->state.conn < C_CONNECTED
> ) in
> [1580483.649295] block drbd0: asender terminated
> [1580483.649301] block drbd0: Terminating asender thread
> [1580483.649384] block drbd0: Connection closed
> [1580483.649390] block drbd0: peer( Primary -> Unknown ) conn(
> Connected -> Unconnected ) pdsk( UpToDate -> DUnknown )
> [1580483.649396] block drbd0: receiver terminated
> [1580483.649399] block drbd0: Terminating receiver thread
> version: 8.3.4 (api:88/proto:86-91)
I recommend upgrading to 8.3.15,
enable "data integrity checksumming",
run an online-verify,
and see where that gets you.
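
As a rough sketch of what that could look like in an 8.3-style drbd.conf: the resource name r0 is hypothetical, sha1 is just one possible choice and needs the matching kernel crypto module on both nodes, and the section placement shown is the 8.3 one (later releases moved some of these options).

```
resource r0 {
  net {
    # checksum every data packet on the wire
    data-integrity-alg sha1;
  }
  syncer {
    # algorithm used by online verify
    verify-alg sha1;
  }
}
```

After applying the same configuration on both nodes, an online verify can be started with `drbdadm verify r0`; blocks found out of sync are reported in the kernel log.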
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com
DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
please don't Cc me, but send to list -- I'm subscribed