[DRBD-user] Local IO failed. Detaching...

Lars Ellenberg lars.ellenberg at linbit.com
Fri Jan 30 11:11:50 CET 2009



On Fri, Jan 30, 2009 at 11:00:02AM +0100, Max Serafini wrote:
> Hi,
> 
> I am running openfiler with fibre disks.  The disks on fibre show up as
> Diskless in the drbd status:
> 
> 0:cluster_metadata  Connected  Secondary/Secondary  UpToDate/UpToDate C
> 1:vg0drbd           Connected  Secondary/Secondary  Diskless/Inconsistent C
> 2:vg1drbd           Connected  Secondary/Secondary  Diskless/Inconsistent C
> 3:vg2drbd           Connected  Secondary/Secondary  Diskless/Inconsistent C
> 4:vg3drbd           Connected  Secondary/Secondary  Diskless/Inconsistent C
> 
>  
> This is the output of 'drbdadm attach vg1drbd' in /var/log/messages.  The
> others are exactly the same.
> 
> Jan 27 14:01:13 filer1 kernel: drbd2: disk( Diskless -> Attaching )
> Jan 27 14:01:13 filer1 kernel: drbd2: No usable activity log found.
> Jan 27 14:01:13 filer1 kernel: drbd2: Method to ensure write ordering: barrier
> Jan 27 14:01:13 filer1 kernel: drbd2: max_segment_size ( = BIO size ) = 32768
> Jan 27 14:01:13 filer1 kernel: drbd2: recounting of set bits took additional 39 jiffies
> Jan 27 14:01:13 filer1 kernel: drbd2: 2048 GB (536851801 bits) marked out-of-sync by on disk bit-map.
> Jan 27 14:01:13 filer1 kernel: drbd2: disk( Attaching -> Inconsistent )
> Jan 27 14:01:13 filer1 kernel: end_request: I/O error, dev sdc, sector 4294945607

Well, the disk _does_ give back an I/O error: the end_request line above
comes from the kernel block layer, not from DRBD.
There is nothing DRBD can do about that.

> Jan 27 14:01:13 filer1 kernel: drbd2: drbd_md_sync_page_io(,4294945544s,WRITE) failed!
> Jan 27 14:01:13 filer1 kernel: drbd2: meta data update failed!
> Jan 27 14:01:13 filer1 kernel: drbd2: disk( Inconsistent -> Failed )
> Jan 27 14:01:13 filer1 kernel: drbd2: Local IO failed. Detaching...
> Jan 27 14:01:13 filer1 kernel: drbd2: disk( Failed -> Diskless )
>  
> 
> I have tested the fibre disks with other systems and they are fully
> functional.  I have tested the same setup with other disks and it works
> fine.

Did you try to write to sector 4294945607 of sdc
(which is sector 4294945544 of sdc1),
the one that failed?
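
The two sector numbers differ by 63, which is the traditional start
offset of the first partition on an MS-DOS partition table. A small
sketch of that conversion, assuming sdc1 really starts at sector 63
(part_to_disk is a hypothetical helper name, not part of any tool):

```shell
# Convert a sector number on a partition to the sector number on the
# whole disk. $1 = sector on the partition, $2 = partition start sector.
# ASSUMPTION: a start sector of 63 for sdc1, inferred from the two
# numbers in the log; check it, e.g. with: cat /sys/block/sdc/sdc1/start
part_to_disk() {
    echo $(( $1 + $2 ))
}

part_to_disk 4294945544 63
```

If the start sector on your system is different, the failing sector of
sdc1 maps to a different place on sdc, so check it before testing.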

# read from there
dd if=/dev/sdc bs=512 skip=4294945607 count=1 iflag=direct of=/dev/null

# write to there (DESTRUCTIVE: overwrites that sector with zeros,
# in case you already have any data there)
dd of=/dev/sdc bs=512 seek=4294945607 count=1 oflag=direct if=/dev/zero

You may want to vary the skip/seek sector numbers over a range of +-64,
just to check where the I/O errors start.
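
A minimal sketch of that range check, read-only so it does not destroy
anything. It assumes GNU dd (for iflag=direct); probe_sector and
scan_range are hypothetical helper names chosen for illustration:

```shell
# Try to read one 512-byte sector, bypassing the page cache.
# $1 = device, $2 = sector number. Returns dd's exit status.
probe_sector() {
    dd if="$1" bs=512 skip="$2" count=1 iflag=direct of=/dev/null 2>/dev/null
}

# Probe every sector from (center - range) to (center + range) and
# print one line per sector: "<sector> ok" or "<sector> FAILED".
# $1 = device, $2 = center sector, $3 = range
scan_range() {
    s=$(( $2 - $3 ))
    end=$(( $2 + $3 ))
    while [ "$s" -le "$end" ]; do
        if probe_sector "$1" "$s"; then
            echo "$s ok"
        else
            echo "$s FAILED"
        fi
        s=$(( s + 1 ))
    done
}
```

For the case above that would be: scan_range /dev/sdc 4294945607 64
A FAILED line only means that dd returned an error for that sector (it
also fails if the device does not exist or is not readable), so compare
with the kernel log. A write test over the same range would use the
destructive dd variant shown earlier.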

Maybe an off-by-one implementation bug in your openfiler fibre disks?

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed


