Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
>Hi Maurice,
>
>If you're running into corruption both in ext3 metadata and in MySQL
>data, it is certainly not the fault of MySQL, as you're likely aware.

I am hoping they are not related. The problems with MySQL surfaced
almost immediately after upgrading to 5.0.x.

>[details deleted]
>
>You can see that there are in fact many bits flipped in each. I
>would suspect higher-level corruption.

I initially thought this as well, but the explanation on the ext3
mailing list is that it really is just a lone flipped bit in both
instances. The other differences are due to fsck padding out the
block when it guesses what the correct size is.

>Do note that data on e.g. the PCI bus is not protected by any sort
>of checksum. I've seen this cause corruption problems with PCI
>risers and RAID cards. Are you using a PCI riser card? Note that
>LSI does *not* certify their cards to be used on risers if you are
>custom building a machine.
>

Yes, there is a riser card. Wouldn't this imply that LSI is saying
you can't use a 1U or a 2U box? It's kind of scary that there is no
end-to-end parity implemented anywhere along the whole data path to
prevent this. It sort of defeats the point of RAID 6 and ECC. How
did you determine this was the cause?

>You mean a Serial Attached SCSI, aka SAS, controller, I assume?

No, it's SATA to SCSI.

>Is this a custom-built machine or a vendor-integrated one?

It is custom-built.

>
>Maurice Volaski wrote:
>>In using drbd 8.0.5 recently, I have come across at least two
>>instances where a bit on disk apparently flipped spontaneously in
>>the ext3 metadata on volumes running on top of drbd.
>>
>>Also, I have been seeing regular corruption of a MySQL database,
>>which runs on top of drbd, and when I reported this as a bug, since
>>I also recently upgraded MySQL versions, they questioned whether
>>drbd could be responsible!
>>
>>All the volumes have been fscked recently and there were no
>>reported errors. And, of course, there have been no errors reported
>>from the underlying hardware.
>>
>>I have since upgraded to 8.0.6, but it's too early to say whether
>>there is a change.
>>
>>I'm also seeing the backup server complain of files not comparing,
>>though this may be a separate problem on the backup server.
>>
>>
>>
>>The ext3 bit flipping:
>>At 12:00 PM -0400 9/11/07, ext3-users-request at redhat.com wrote:
>>>I have come across two files, essentially untouched in years, on two
>>>different ext3 filesystems on the same server, Gentoo AMD 64-bit with
>>>kernel 2.6.22 and fsck version 1.40.2 currently, spontaneously
>>>becoming supremely large:
>>>
>>>Filesystem one
>>>Inode 16257874, i_size is 18014398562775391, should be 53297152
>>>
>>>Filesystem two
>>>Inode 2121855, i_size is 35184386120704, should be 14032896.
>>>
>>>Both were discovered during an ordinary backup operation (via EMC
>>>Insignia's Retrospect Linux client).
>>>
>>>The backup runs daily, and so one day, one file must have grown
>>>spontaneously to this size, and then on another day, it happened to
>>>the second file, which is on a second filesystem. The backup attempt
>>>generated repeated errors:
>>>
>>>EXT3-fs warning (device dm-2): ext3_block_to_path: block > big
>>>
>>>Both filesystems are running on different logical volumes, but
>>>underlying those are drbd network RAID devices, and underlying
>>>those is a RAID 6-based SATA disk array.
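[Editor's note: the "lone flipped bit" explanation quoted above is easy
to sanity-check: XOR the reported i_size against the expected one and
look for a single dominant bit, treating the small low-order remainder
as the fsck block padding mentioned in the reply. A minimal sketch in
Python (the helper name and the padding treatment are this note's own,
not from the thread):

    def dominant_flipped_bit(observed: int, expected: int) -> int:
        # Index of the highest bit that differs between the two sizes.
        return (observed ^ expected).bit_length() - 1

    cases = [
        (18014398562775391, 53297152),  # filesystem one, inode 16257874
        (35184386120704, 14032896),     # filesystem two, inode 2121855
    ]

    for observed, expected in cases:
        bit = dominant_flipped_bit(observed, expected)
        # Clearing that one bit should land close to the true size; the
        # small remainder is the fsck padding described in the reply.
        residue = (observed & ~(1 << bit)) - expected
        print(f"bit {bit} flipped; residual difference {residue} bytes")

Running this reports bit 54 for the first inode and bit 45 for the
second, consistent with a single flipped high bit in each i_size
field.]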
>>The answer to the bug report regarding the MySQL data corruption,
>>which blames drbd!
>>>http://bugs.mysql.com/?id=31038
>>>
>>> Updated by: Heikki Tuuri
>>> Reported by: Maurice Volaski
>>> Category: Server: InnoDB
>>> Severity: S2 (Serious)
>>> Status: Open
>>> Version: 5.0.48
>>> OS: Linux
>>> OS Details: Gentoo
>>> Tags: database page corruption locking up corrupt doublewrite
>>>
>>>[17 Sep 18:49] Heikki Tuuri
>>>
>>>Maurice, my first guess is to suspect the RAID-1 driver.
>>
>>
>>My initial report of the MySQL data corruption:
>>>>A 64-bit Gentoo Linux box had just been upgraded from MySQL 4.1
>>>>to 5.0.44 fresh (by dumping in 4.1 and restoring in 5.0.44), and
>>>>almost immediately after that, during which time the database was
>>>>not used, a crash occurred during a scripted mysqldump. So I
>>>>restored, and days later, it happened again. The crash details
>>>>seem to suggest some other aspect of the operating system, or even
>>>>that the memory or disk is flipping a bit. Or could I be running
>>>>into a bug in this version of MySQL?
>>>>
>>>>Here's the output of the crash
>>>>-----------------------------------
>>>>InnoDB: Database page corruption on disk or a failed
>>>>InnoDB: file read of page 533.
>>>>InnoDB: You may have to recover from a backup.
>>>>070827 3:10:04 InnoDB: Page dump in ascii and hex (16384 bytes):
>>>> len 16384; hex
>>>>
>>>>[dump itself deleted for brevity]
>>>>
>>>>;InnoDB: End of page dump
>>>>070827 3:10:04 InnoDB: Page checksum 646563254,
>>>>prior-to-4.0.14-form checksum 2415947328
>>>>InnoDB: stored checksum 4187530870, prior-to-4.0.14-form
>>>>stored checksum 2415947328
>>>>InnoDB: Page lsn 0 4409041, low 4 bytes of lsn at page end 4409041
>>>>InnoDB: Page number (if stored to page already) 533,
>>>>InnoDB: space id (if created with >= MySQL-4.1.1 and stored already) 0
>>>>InnoDB: Page may be an index page where index id is 0 35
>>>>InnoDB: (index PRIMARY of table elegance/image)
>>>>InnoDB: Database page corruption on disk or a failed
>>>>InnoDB: file read of page 533.
>>>>InnoDB: You may have to recover from a backup.
>>>>InnoDB: It is also possible that your operating
>>>>InnoDB: system has corrupted its own file cache
>>>>InnoDB: and rebooting your computer removes the
>>>>InnoDB: error.
>>>>InnoDB: If the corrupt page is an index page
>>>>InnoDB: you can also try to fix the corruption
>>>>InnoDB: by dumping, dropping, and reimporting
>>>>InnoDB: the corrupt table. You can use CHECK
>>>>InnoDB: TABLE to scan your table for corruption.
>>>>InnoDB: See also
>>>>InnoDB: http://dev.mysql.com/doc/refman/5.0/en/forcing-recovery.html
>>>>InnoDB: about forcing recovery.
>>>InnoDB: Ending processing because of a corrupt database page.
>>
>
>--
>high performance mysql consulting
>www.provenscaling.com

--
Maurice Volaski, mvolaski at aecom.yu.edu
Computing Support, Rose F. Kennedy Center
Albert Einstein College of Medicine of Yeshiva University
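[Editor's note: the InnoDB log above shows the mechanism that caught
the corruption: a checksum stored in each 16 KiB page is recomputed on
read, and a mismatch (stored checksum 4187530870 vs. the freshly
computed 646563254) triggers the "Database page corruption" error.
InnoDB's actual on-disk layout and checksum algorithms are more
involved; the following Python sketch, with an invented page layout and
zlib.crc32 standing in for InnoDB's own checksums, only illustrates the
principle:

    import zlib

    PAGE_SIZE = 16384  # InnoDB's default page size, as in the dump above

    def seal_page(body: bytes) -> bytes:
        # Store a 4-byte checksum ahead of the page body (hypothetical
        # layout; InnoDB keeps checksums in both header and trailer).
        assert len(body) == PAGE_SIZE - 4
        return zlib.crc32(body).to_bytes(4, "big") + body

    def check_page(page: bytes) -> bytes:
        # Recompute the checksum on read and fail loudly on a mismatch,
        # which is what produced the log lines quoted above.
        stored = int.from_bytes(page[:4], "big")
        computed = zlib.crc32(page[4:])
        if stored != computed:
            raise IOError(f"page corruption: stored checksum {stored}, "
                          f"computed checksum {computed}")
        return page[4:]

A checksum of this kind detects corruption only when the page is next
read back; it says nothing about where along the path (bus, controller,
disk, or replication layer) the bit was damaged, which is exactly the
ambiguity this thread is wrestling with.]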