[DRBD-user] recovering from "Local IO failed. Detaching..."

Lars Ellenberg lars.ellenberg at linbit.com
Thu Sep 10 18:20:19 CEST 2009


On Thu, Sep 10, 2009 at 05:44:42PM +0200, Gianluca Cecchi wrote:
> tried the "drbdadm attach r0" command but I get:
> 
> Sep 10 17:38:45 virtfedbis kernel: block drbd0: disk( Diskless -> Attaching )
> Sep 10 17:38:45 virtfedbis kernel: block drbd0: Found 6 transactions (244 active extents) in activity log.
> Sep 10 17:38:45 virtfedbis kernel: block drbd0: Method to ensure write ordering: barrier
> Sep 10 17:38:45 virtfedbis kernel: block drbd0: max_segment_size ( = BIO size ) = 32768
> Sep 10 17:38:45 virtfedbis kernel: block drbd0: recounting of set bits took additional 2 jiffies
> Sep 10 17:38:45 virtfedbis kernel: block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
> Sep 10 17:38:45 virtfedbis kernel: block drbd0: Marked additional 920 MB as out-of-sync based on AL.
> Sep 10 17:38:45 virtfedbis kernel: end_request: I/O error, dev cciss/c0d0, sector 0

This is interesting.
It "should not happen", and it happens below DRBD:
the I/O error is reported for cciss/c0d0, not for drbd0.
So something is actually not as it should be at the cciss level,
or at some other involved hardware level.
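
To check whether the kernel keeps reporting such errors below DRBD, a
small filter over the kernel log can help. This is only a sketch: the
function name io_errors_below_drbd is made up for illustration, and the
pattern is taken from the "end_request: I/O error" message quoted above,
so other kernel versions may word the message differently.

```shell
#!/bin/sh
# Hypothetical helper: read a kernel log on stdin and print only the
# block-layer I/O error lines that were NOT reported for a drbd device,
# i.e. errors coming from the backing storage underneath DRBD.
io_errors_below_drbd() {
    grep 'end_request: I/O error' | grep -v 'dev drbd'
}
```

Typical use would be "dmesg | io_errors_below_drbd", or feeding it the
syslog file your distribution writes kernel messages to.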

> Sep 10 17:38:45 virtfedbis kernel: block drbd0: meta data flush failed with status -95, disabling md-flushes
> Sep 10 17:38:45 virtfedbis kernel: block drbd0: disk( Attaching -> Negotiating )
> Sep 10 17:38:45 virtfedbis kernel: block drbd0: drbd_sync_handshake:
> Sep 10 17:38:45 virtfedbis kernel: block drbd0: self FFEDAA5E725D8157:13925DF660B57F5D:0DB564243F5AA9A3:377245292BBD1112 bits:235520 flags:0
> Sep 10 17:38:45 virtfedbis kernel: block drbd0: peer A0332E51B243BEE1:FFEDAA5E725D8157:13925DF660B57F5D:0DB564243F5AA9A3 bits:105320 flags:0
> Sep 10 17:38:45 virtfedbis kernel: block drbd0: uuid_compare()=-1 by rule 50
> Sep 10 17:38:45 virtfedbis kernel: block drbd0: conn( Connected -> WFBitMapT ) disk( Negotiating -> Outdated )
> Sep 10 17:38:45 virtfedbis kernel: block drbd0: conn( WFBitMapT -> WFSyncUUID )
> Sep 10 17:38:45 virtfedbis kernel: block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0
> Sep 10 17:38:45 virtfedbis kernel: block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
> Sep 10 17:38:45 virtfedbis kernel: block drbd0: conn( WFSyncUUID -> SyncTarget ) disk( Outdated -> Inconsistent )
> Sep 10 17:38:45 virtfedbis kernel: block drbd0: Began resync as SyncTarget (will sync 1363360 KB [340840 bits set]).
> Sep 10 17:38:48 virtfedbis kernel: block drbd0: Resync aborted.
> Sep 10 17:38:48 virtfedbis kernel: block drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> Failed )
> Sep 10 17:38:48 virtfedbis kernel: block drbd0: Local IO failed. Detaching...

So the resync starts, and three seconds later DRBD detaches "without visible reason"...

I've seen similar symptoms before, and they could be worked around by
disabling the offload settings on the NICs used for the replication ;)
I know that interaction sounds a bit far-fetched, but those are the
facts.

# to view the current offload settings (substitute your replication NIC for eth7)
ethtool -k eth7
# to switch off rx/tx checksumming, scatter-gather and TSO:
ethtool -K eth7 rx off tx off sg off tso off
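
If more than one NIC carries replication traffic, the same ethtool call
can be wrapped in a loop. The following is a minimal sketch, not a tested
recipe: the function name disable_offloads and the DRY_RUN variable are
invented for this example, and the feature list is just the one from the
command above. Settings made with ethtool -K are generally not kept
across a reboot, so they would need to be reapplied at boot.

```shell
#!/bin/sh
# Hypothetical helper: switch off rx/tx checksumming, scatter-gather and
# TSO on every interface named on the command line.
# With DRY_RUN=1 the commands are only printed, not executed.
disable_offloads() {
    for iface in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "ethtool -K $iface rx off tx off sg off tso off"
        else
            ethtool -K "$iface" rx off tx off sg off tso off
        fi
    done
}
```

For example, "DRY_RUN=1 disable_offloads eth7" prints the command for
eth7 without touching the interface; running it for real needs root.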


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
