On Thu, Dec 31, 2009 at 11:37:05AM +0100, Alex Rutgers wrote:
> Problem description:
>
> I'm testing drbd in combination w/ heartbeat, it's a fresh new
> installation, I also re-installed drbd, recreated the file systems,
> verified & efsck'd. I'm still seeing these (non-fatal, I have set
> passing on) i/o read errors when the drbd device is made primary,
> secondary and primary again, the file system got mounted and then I do a
> simple "chown -R mysql:mysql /drbd0/mysql"
>
> I'm getting these errors in /var/log/messages:
>
> Dec 31 10:44:04 ndb1-test kernel: block drbd0: role( Primary -> Secondary )
> Dec 31 10:44:04 ndb1-test kernel: block drbd0: role( Secondary -> Primary )
> Dec 31 10:44:04 ndb1-test kernel: kjournald starting. Commit interval 5 seconds
> Dec 31 10:44:04 ndb1-test kernel: EXT3 FS on drbd0, internal journal
> Dec 31 10:44:04 ndb1-test kernel: EXT3-fs: mounted filesystem with ordered data mode.
> Dec 31 10:44:04 ndb1-test kernel: block drbd0: p read: error=-11
> Dec 31 10:44:04 ndb1-test kernel: block drbd0: Local READ failed sec=11238600s size=4096
> Dec 31 10:44:04 ndb1-test kernel: block drbd0: Local IO failed in __req_mod.Passing error on...
> Dec 31 10:44:04 ndb1-test kernel: block drbd0: p read: error=-11
> Dec 31 10:44:04 ndb1-test kernel: block drbd0: Local READ failed sec=11933784s size=4096
> Dec 31 10:44:04 ndb1-test kernel: block drbd0: Local IO failed in __req_mod.Passing error on...
> Dec 31 10:44:04 ndb1-test kernel: block drbd0: p read: error=-11
> Dec 31 10:44:04 ndb1-test kernel: block drbd0: Local READ failed sec=12363408s size=4096
>
> My local devices /sda, sdb->lvm are NOT failing (all other stuff works,
> no scsi raid errors whatsoever, fsck says all is okay).
>
> I isolated this reproduction scenario: (using heartbeat scripts) (log
> above is created with these statements)
>
> /etc/ha.d/resource.d/Filesystem /dev/drbd0 /drbd0 ext3 stop
> /etc/ha.d/resource.d/drbddisk r0 stop
> /etc/ha.d/resource.d/drbddisk r0 start
> /etc/ha.d/resource.d/Filesystem /dev/drbd0 /drbd0 ext3 start
> chown -R mysql:mysql /drbd0/mysql
>
> The -11 error suggests that drbd is getting the OS error code 11:
> Resource temporarily unavailable, and might be ignored in this case,
> despite the errors, the ownership is changed to user/grp mysql.mysql
> recursive in the directory.
>
> I think this might be a defect? It would be a shame of the i/o failure
> detection capability to turn it off (pass-on) as a workaround.
>
> When I do not pass on, drbd declares my server Diskless which is not a
> realistic scenario when running mysql under heavy load.
>
> A workaround for MySQL is to remove the chown statement from the start
> script, however that's also adding a risk as the statement was put there
> with a purpose :)

This looks like heavy read ahead IO on file system meta data
(directories), in this case caused by recursive chown, failed with
-EWOULDBLOCK (resp. EAGAIN, which is the same in linux).

But we _do_ special case errors on READA requests, so these should be
silently "passed on".

Of course we could also special case the "EAGAIN" error on READ
(!= READA) and WRITE requests, but the bug is elsewhere.
A driver is not supposed to return EAGAIN for READ or WRITE.

> Context:
>
> I'm using the conf below on a somewhat older version of RH Linux
> ndb1-test.momac.net 2.6.9-22.ELsmp #1 SMP Mon Sep 19 18:32:14 EDT 2005

uhum. please consider upgrading.
iirc, latest RHEL 4 kernel is -89.something.
It may even help!

> i686 i686 i386 GNU/Linux, Red Hat Enterprise Linux ES release 4 (Nahant
> Update 2). I'm using lvm2 for my devices.
>
> DRBD Version: 8.3.4 (api:88) (created version from source).
>
> DRBDADM_BUILDTAG=GIT-hash:\ xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\ build\
> by\ root at ndb1-test.momac.net\,\ 2009-12-29\ 13:44:59

did you X out the git hash?
is this not the tagged release, but some intermediate version?

and btw, try to avoid line wraps and additional newlines.
I may refuse to look twice at logs or config files
or other things that hurt my eyes ;)

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed
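
For readers who want to check what the "error=-11" in the log above stands for, here is a minimal sketch that pulls the number out of such a line and looks it up in the errno table of the machine it runs on. The helper name decode_kernel_error and the regular expression are made up for this illustration and are not part of DRBD. The lookup is platform dependent: on Linux, number 11 is EAGAIN, which shares its value with EWOULDBLOCK, but other systems may number their errors differently.

```python
import errno
import os
import re

# Matches the "error=-N" fragment seen in the DRBD log lines quoted above.
# (Hypothetical pattern for this sketch, not something DRBD ships.)
ERROR_RE = re.compile(r"error=-(\d+)")


def decode_kernel_error(line):
    """Return (number, symbolic name, message) for an 'error=-N' log line.

    Returns None when the line carries no such fragment. The name and
    message come from the local errno table, so they are platform dependent.
    """
    match = ERROR_RE.search(line)
    if match is None:
        return None
    number = int(match.group(1))
    name = errno.errorcode.get(number, "UNKNOWN")
    return number, name, os.strerror(number)


if __name__ == "__main__":
    sample = "Dec 31 10:44:04 ndb1-test kernel: block drbd0: p read: error=-11"
    print(decode_kernel_error(sample))
    # On Linux the two constants compare equal, as noted in the reply.
    print(errno.EAGAIN == errno.EWOULDBLOCK)
```

This only decodes the number; it says nothing about which layer below DRBD returned the error in the first place.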