[DRBD-user] problem - DRBD Version: 8.3.4 - block drbd0: p read: error=-11 - block drbd0: Local READ failed - while doing a chown -R mysql:mysql

Lars Ellenberg lars.ellenberg at linbit.com
Mon Jan 4 14:21:59 CET 2010


On Thu, Dec 31, 2009 at 11:37:05AM +0100, Alex Rutgers wrote:
> Problem description:
>
> I'm testing drbd in combination with heartbeat. It's a fresh
> installation; I also re-installed drbd, recreated the file systems,
> verified and fsck'd them. I'm still seeing these i/o read errors
> (non-fatal, as I have it set to pass errors on) when the drbd device
> is made primary, secondary and primary again, the file system is
> mounted, and then I do a simple "chown -R mysql:mysql /drbd0/mysql".
>
> I'm getting these errors in /var/log/messages:
>
> Dec 31 10:44:04 ndb1-test kernel: block drbd0: role( Primary -> Secondary )
> Dec 31 10:44:04 ndb1-test kernel: block drbd0: role( Secondary -> Primary )
> Dec 31 10:44:04 ndb1-test kernel: kjournald starting.  Commit interval 5 seconds
> Dec 31 10:44:04 ndb1-test kernel: EXT3 FS on drbd0, internal journal 
> Dec 31 10:44:04 ndb1-test kernel: EXT3-fs: mounted filesystem with ordered data mode.
> Dec 31 10:44:04 ndb1-test kernel: block drbd0: p read: error=-11 
> Dec 31 10:44:04 ndb1-test kernel: block drbd0: Local READ failed sec=11238600s size=4096
> Dec 31 10:44:04 ndb1-test kernel: block drbd0: Local IO failed in __req_mod.Passing error on...
> Dec 31 10:44:04 ndb1-test kernel: block drbd0: p read: error=-11 
> Dec 31 10:44:04 ndb1-test kernel: block drbd0: Local READ failed sec=11933784s size=4096
> Dec 31 10:44:04 ndb1-test kernel: block drbd0: Local IO failed in __req_mod.Passing error on...
> Dec 31 10:44:04 ndb1-test kernel: block drbd0: p read: error=-11 
> Dec 31 10:44:04 ndb1-test kernel: block drbd0: Local READ failed sec=12363408s size=4096
>
> My local devices (sda, sdb -> lvm) are NOT failing (everything else
> works, no scsi raid errors whatsoever, fsck says all is okay).
>
> I isolated this reproduction scenario using the heartbeat scripts
> (the log above was created with these statements):
>
> /etc/ha.d/resource.d/Filesystem /dev/drbd0 /drbd0 ext3 stop
> /etc/ha.d/resource.d/drbddisk r0 stop
> /etc/ha.d/resource.d/drbddisk r0 start
> /etc/ha.d/resource.d/Filesystem /dev/drbd0 /drbd0 ext3 start
> chown -R mysql:mysql /drbd0/mysql
> 
> The -11 error suggests that drbd is getting OS error code 11
> (Resource temporarily unavailable), which might be ignorable in this
> case: despite the errors, the ownership is changed to mysql:mysql
> recursively in the directory.
>
> I think this might be a defect? It would be a shame to have to turn
> off the i/o failure detection capability (pass-on) as a workaround.
>
> When I do not pass on, drbd declares my server Diskless, which is not
> a realistic scenario when running mysql under heavy load.
>
> A workaround for MySQL is to remove the chown statement from the start
> script; however, that also adds a risk, as the statement was put there
> for a purpose :)


This looks like heavy read-ahead IO on file system metadata
(directories), in this case caused by the recursive chown, which failed
with -EWOULDBLOCK (or EAGAIN, which is the same value on Linux).
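
To check which errno names sit behind the "error=-11" in the log, a
small Python sketch like the following can help. It assumes it is run on
the Linux box in question (errno numbers are platform specific), and
errno_names is just a helper name made up for this example.

```python
import errno
import os


def errno_names(code):
    """Return all symbolic errno names whose value equals abs(code).

    Kernel code reports errors as negative errno values, so the sign
    is dropped before comparing.
    """
    value = abs(code)
    return sorted(
        name
        for name in dir(errno)
        if name.startswith("E") and getattr(errno, name) == value
    )


if __name__ == "__main__":
    # DRBD logged "p read: error=-11"
    print(errno_names(-11), os.strerror(11))
```

If both EAGAIN and EWOULDBLOCK show up for the same number, they are
aliases on that platform.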

But we _do_ special case errors on READA requests,
so these should be silently "passed on".

Of course we could also special case the "EAGAIN" error
on READ (!= READA) and WRITE requests, but the bug is elsewhere.
A driver is not supposed to return EAGAIN for READ or WRITE.
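
The distinction between READA and plain READ/WRITE errors can be modeled
roughly as below. This is only a toy sketch in Python, not DRBD's actual
code: Op, classify_completion and the returned strings are all made up
for illustration, and the real handling lives in the kernel module.

```python
import errno
from enum import Enum


class Op(Enum):
    READ = "READ"
    READA = "READA"  # read-ahead: only a hint, may fail without harm
    WRITE = "WRITE"


def classify_completion(op, error):
    """Toy policy: say how a completed request would be treated.

    `error` is 0 on success or a negative errno value, kernel style.
    """
    if error == 0:
        return "ok"
    if op is Op.READA:
        # A failed read-ahead is harmless: hand the error back to the
        # caller without logging or touching the disk state.
        return "pass on silently"
    # Any error on a real READ or WRITE counts as a local I/O failure.
    # That includes -EAGAIN, which a driver should not return here.
    return "local I/O error"


if __name__ == "__main__":
    for op in Op:
        print(op.value, classify_completion(op, -errno.EAGAIN))
```

In this model the log lines above correspond to the last branch: the
lower layer returned -EAGAIN for something that was not flagged as
read-ahead.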

> Context:
> 
> I'm using the conf below on a somewhat older version of RH Linux
> ndb1-test.momac.net 2.6.9-22.ELsmp #1 SMP Mon Sep 19 18:32:14 EDT 2005

uhum.
please consider upgrading.
iirc, latest RHEL 4 kernel is -89.something.

It may even help!

> i686 i686 i386 GNU/Linux, Red Hat Enterprise Linux ES release 4 (Nahant
> Update 2). I'm using lvm2 for my devices.
>
> DRBD Version: 8.3.4 (api:88) (created version from source).
> 
> DRBDADM_BUILDTAG=GIT-hash:\ xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\ build\
> by\ root at ndb1-test.momac.net\,\ 2009-12-29\ 13:44:59

did you X out the git hash?
is this not the tagged release, but some intermediate version?

and btw, try to avoid line wraps and additional newlines.  I may refuse
to look twice at logs or config files or other things that hurt my
eyes ;)

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed


