[DRBD-user] recovering from "Local IO failed. Detaching..."

Gianluca Cecchi gianluca.cecchi at gmail.com
Thu Sep 10 15:58:04 CEST 2009


I am running Fedora 11 x86_64 with kernel 2.6.30.5-43.fc11.x86_64 and
drbd-8.3.3rc1, compiled from source with "make rpm", so that I now have:
[root@virtfedbis ]# rpm -qa drbd*
drbd-8.3.3rc1-3.x86_64
drbd-km-2.6.30.5_43.fc11.x86_64-8.3.3rc1-3.x86_64

The configuration is Primary/Primary.

I get these messages on one node:
Sep  8 17:32:34 virtfedbis kernel: block drbd0: disk( UpToDate -> Failed )
Sep  8 17:32:34 virtfedbis kernel: block drbd0: Local IO failed. Detaching...
Sep  8 17:32:34 virtfedbis kernel: block drbd0: disk( Failed -> Diskless )
Sep  8 17:32:34 virtfedbis kernel: block drbd0: Notified peer that my disk is broken.

Now the "service drbd status" command on this node gives:
drbd driver loaded OK; device status:
version: 8.3.3rc1 (api:88/proto:86-91)
GIT-hash: 026d60bb0e6a7d5758c6c3e6245f38f6d8b921aa build by root@virtfedbis.ceda.polimi.it, 2009-09-08 16:21:30
m:res  cs         ro               ds                 p  mounted  fstype
0:r0   Connected  Primary/Primary  Diskless/UpToDate  C

Two questions:

a) Apart from this DRBD message, I don't seem to have any I/O error in
the system messages. How can I check whether an I/O error actually
occurred?

b) What are the proper commands to recover, or at least to try to recover,
supposing the disk is OK?
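
For question (a), one rough way to look for an underlying error is to scan
the kernel log for lines that mention an I/O error and do not come from DRBD
itself. The following is only a sketch: the search patterns are my assumption
about how block-layer or SCSI errors are usually worded, and the function
name is made up for illustration.

```python
import re

# Assumed patterns: typical wording of block-layer / SCSI error lines.
# These are guesses, not an exhaustive or authoritative list.
IO_ERROR = re.compile(r"I/O error|medium error|hardware error", re.IGNORECASE)


def non_drbd_io_errors(lines):
    """Return kernel log lines that mention an I/O error and are not from DRBD."""
    hits = []
    for line in lines:
        if "kernel:" not in line:
            continue  # only kernel messages are of interest here
        if "block drbd" in line:
            continue  # skip DRBD's own messages, which are already known
        if IO_ERROR.search(line):
            hits.append(line.rstrip("\n"))
    return hits
```

It could be fed an open log file, for example the lines of
/var/log/messages. An empty result would only mean that nothing matched
these patterns, not that the disk is healthy.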

The disk is a hardware RAID volume on an HP blade, and the information
provided by iLO does not show any hardware error either.
Does DRBD support some kind of queuing via drbd.conf, or does it inherit
queuing from the SCSI layer, or something else?

The only messages I get before this event are from a few minutes earlier,
when the peer's drbd daemon started and the sync happened:

Sep  8 17:29:35 virtfedbis kernel: block drbd0: Handshake successful: Agreed network protocol version 91
Sep  8 17:29:35 virtfedbis kernel: block drbd0: Peer authenticated using 20 bytes of 'sha1' HMAC
Sep  8 17:29:35 virtfedbis kernel: block drbd0: conn( WFConnection -> WFReportParams )
Sep  8 17:29:35 virtfedbis kernel: block drbd0: Starting asender thread (from drbd0_receiver [9977])
Sep  8 17:29:35 virtfedbis kernel: block drbd0: data-integrity-alg: <not-used>
Sep  8 17:29:35 virtfedbis kernel: block drbd0: drbd_sync_handshake:
Sep  8 17:29:35 virtfedbis kernel: block drbd0: self FFEDAA5E725D8157:0DB564243F5AA9A3:377245292BBD1112:F6DD5DF112448173 bits:0 flags:0
Sep  8 17:29:35 virtfedbis kernel: block drbd0: peer 0DB564243F5AA9A2:0000000000000000:377245292BBD1113:F6DD5DF112448173 bits:0 flags:0
Sep  8 17:29:35 virtfedbis kernel: block drbd0: uuid_compare()=1 by rule 70
Sep  8 17:29:35 virtfedbis kernel: block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS )
Sep  8 17:29:35 virtfedbis kernel: block drbd0: peer( Secondary -> Primary )
Sep  8 17:29:35 virtfedbis kernel: block drbd0: conn( WFBitMapS -> SyncSource ) pdsk( Outdated -> Inconsistent )
Sep  8 17:29:35 virtfedbis kernel: block drbd0: Began resync as SyncSource (will sync 0 KB [0 bits set]).
Sep  8 17:29:35 virtfedbis kernel: block drbd0: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
Sep  8 17:29:35 virtfedbis kernel: block drbd0: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
Sep  8 17:29:40 virtfedbis kernel: block drbd0: md_sync_timer expired! Worker calls drbd_md_sync().

The dmesg command gives similar output; its latest rows are:

block drbd0: drbd_sync_handshake:
block drbd0: self FFEDAA5E725D8157:0DB564243F5AA9A3:377245292BBD1112:F6DD5DF112448173 bits:0 flags:0
block drbd0: peer 0DB564243F5AA9A2:0000000000000000:377245292BBD1113:F6DD5DF112448173 bits:0 flags:0
block drbd0: uuid_compare()=1 by rule 70
block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS )
block drbd0: peer( Secondary -> Primary )
block drbd0: conn( WFBitMapS -> SyncSource ) pdsk( Outdated -> Inconsistent )
block drbd0: Began resync as SyncSource (will sync 0 KB [0 bits set]).
block drbd0: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
block drbd0: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
dlm: connecting to 1
block drbd0: md_sync_timer expired! Worker calls drbd_md_sync().
block drbd0: disk( UpToDate -> Failed )
block drbd0: Local IO failed. Detaching...
block drbd0: disk( Failed -> Diskless )
block drbd0: Notified peer that my disk is broken.
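
To make the sequence of state changes in output like the above easier to
follow, the "field( Old -> New )" fragments can be pulled out with a short
script. This is a minimal sketch that assumes the log format shown in this
message; the function names are made up for illustration, and whitespace is
collapsed first so that fragments wrapped across lines still match.

```python
import re

# Matches DRBD state-change fragments such as "disk( UpToDate -> Failed )".
# The field names are the ones that appear in the log excerpt above.
TRANSITION = re.compile(r"\b(peer|conn|disk|pdsk)\(\s*(\w+)\s*->\s*(\w+)\s*\)")


def drbd_transitions(log_text):
    """Return (field, old_state, new_state) tuples in order of appearance."""
    # Collapse all whitespace so a fragment split over two lines is rejoined.
    flat = " ".join(log_text.split())
    return [m.groups() for m in TRANSITION.finditer(flat)]


def local_disk_failures(log_text):
    """Return only the transitions where the local disk became Failed or Diskless."""
    return [
        t for t in drbd_transitions(log_text)
        if t[0] == "disk" and t[2] in ("Failed", "Diskless")
    ]
```

Running drbd_transitions() over the dmesg text gives the ordered list of
changes, and local_disk_failures() narrows it down to the two "disk" lines
around the detach.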

Thanks,
Gianluca