On Fri, Jan 04, 2008 at 06:07:52PM -0300, Ítalo Rossi wrote:
> Hello all,
>
> I have two servers with drbd0.7.24 with LVM and XFS filesystem.
>
> My Storages:
>
>
>   Server 1( Red Hat 4)                      Server 2( Debian Etch)
>          |                                          |
>          |                                          |
>        DRBD  - - - - - - - - - - - - - - - -      DRBD
>          |                                          |
>          |                                          |
>     LVM2 (3.2T)                    RAID0 (3.2T) --> MD1000 (1 HBA)
>          |
>          |
>          |
>         / \
>        /   \                __
>       /   RAID0(1.6T)         |
>      |                        | -> MD3000(2 HBAs)
>   RAID0(1.6T)               __|
>
> So, if the Server 1 become primary I have this messages on dmesg
> (on-io-error = "pass_on"):
>
> mptbase: ioc0: LogInfo(0x31170000): Originator={PL}, Code=(0x17), SubCode(0x0000)
> mptbase: ioc0: LogInfo(0x31170000): Originator={PL}, Code=(0x17), SubCode(0x0000)
> mptbase: ioc0: LogInfo(0x31170000): Originator={PL}, Code=(0x17), SubCode(0x0000)
> SCSI error : <0 0 21 1> return code = 0x20008
> end_request: I/O error, dev sdb, sector 2942884192
> drbd1: Ignoring local IO error!

...

This is a real SCSI error, i.e. the MD3000 hardware reports an I/O error
to the kernel. Google for:
  mptbase SCSI error end_request I/O error

> I need to umount the drbd, run xfs_repair and remount, this solves my
> problem but in 1 week it brokes again..
>
> So, trying to solve this issue, I set the Server 2 in primary state ad
> Server 1 secondary and I'm still getting this errors on Server 1
> dmesg, but my application still running without problems:

...

> Is this a DRBD, LVM2 or MD3000 (modules or cable) issue?

Five minutes of googling suggests that this may have been worked around
in the kernel driver of more recent kernels, by throttling the
transmission rate or something similar. That would be kernel.org 2.6.19
and later, potentially backported into various, but not all, "stable"
releases (the recent four-digit kernel versions). For vendor kernel
versions it is very difficult to tell within five minutes.

Apparently there is also some BIOS setting in the MD3000 to reduce the
transfer rate there, to avoid the problem with older kernels.
Red Hat bugzillas for similar-looking issues exist for Red Hat 3 and 4
and Fedora Core (various versions); add bugzilla.redhat.com to the above
Google keywords.

I suggest you contact vendor support for recommendations.

One DRBD-related note, still:
Off the top of my head, I'm not exactly sure how "on-io-error pass_on"
behaves in 0.7; we do not use this setting very often. But in any case
the corresponding sectors will be out of sync now. You need to at least
disconnect and reconnect the DRBD pair to have them resynced. Otherwise,
on the next failover, surprise: some sectors (those where DRBD was not
able to write the data, but ignored the I/O error because you configured
it to do so) will contain "unexpected data".

-- 
: Lars Ellenberg                            Tel +43-1-8178292-55 :
: LINBIT Information Technologies GmbH      Fax +43-1-8178292-82 :
: Vivenotgasse 48, A-1120 Vienna/Europe    http://www.linbit.com :
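
The kernel messages quoted in this thread name the device and sector of
each failed request, so it may help to collect them before contacting
vendor support. What follows is only a minimal sketch: it assumes the log
lines look exactly like the "end_request: I/O error, dev sdb, sector N"
line quoted above. The function name failed_sectors and the SAMPLE text
are hypothetical illustration and are not part of DRBD or any vendor
tool.

```python
import re

# Matches the block-layer error line as quoted in this thread, e.g.
#   end_request: I/O error, dev sdb, sector 2942884192
# Assumption: other kernel versions may word this line differently.
IO_ERROR_RE = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def failed_sectors(log_text):
    """Return a dict mapping device name -> sorted list of failed sectors."""
    found = {}
    for dev, sector in IO_ERROR_RE.findall(log_text):
        # A set drops duplicates when the same sector fails repeatedly.
        found.setdefault(dev, set()).add(int(sector))
    return {dev: sorted(sectors) for dev, sectors in found.items()}


# Hypothetical sample, copied from the dmesg excerpt in the original post.
SAMPLE = """\
mptbase: ioc0: LogInfo(0x31170000): Originator={PL}, Code=(0x17), SubCode(0x0000)
SCSI error : <0 0 21 1> return code = 0x20008
end_request: I/O error, dev sdb, sector 2942884192
drbd1: Ignoring local IO error!
"""

if __name__ == "__main__":
    for dev, sectors in failed_sectors(SAMPLE).items():
        print(dev, sectors)
```

In practice one would feed it the saved output of dmesg instead of
SAMPLE. The result only shows where the lower-level device reported
errors; it does not by itself tell which DRBD blocks are out of sync.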