On Tue, Sep 07, 2010 at 12:12:08PM +0000, putcha narayana wrote:
>
> Thanks for responding,
>
> FYI: I have run the stat command to get details of the files whose data is
> seen criss-crossing. I mean the content of one file is seen in another.
> Snapshot enclosed at the end, when corruption occurred.
>
> Files which have an issue belong to same block, IO Block: 4096

No, that is the file size in occupied blocks.

> In every corruption seen, the content of /repl/firewall/sysconfig/iptables is seen in /repl/snmpagent/data/snmpd.conf
>
> > How much is "few"?
>
> Today, after 12 failovers. In the last run, similar corruption was seen after 80 failovers.
>
> > What is the IO load?
>
> Not exactly sure. When SIGTERM is received, there are 2 processes which write config data to the DRBD partition.
>
> > How do you trigger the failover?
>
> Using the reboot command.
>
> > DRBD version, kernel version, file system type?
>
> DRBD-8.0.16, 2.6.14.7, EXT3-FS
>
> > Volatile caches involved?
>
> NO
>
> > How often/when do you fsck?
>
> Every time the DRBD-GO-Primary script is called. Before mounting the DRBD partition we invoke fsck -fy

That is, you do
  primary; fsck /dev/drbd0; mount;
in that order?

The observed corruption may be caused by a lot of things:
 DRBD (in that version) may have an issue.
 ext3 (in your kernel version) may have an issue.
 The generic write-out path (in your kernel version) may have an issue.
 fsck (or rather, your version of fsck) may have an issue.
 Probably many other things I cannot think of right now ;-)

I suggest repeating your tests with
 * no DRBD involved: simply reboot a single box the same way you do now,
   and force fsck before the mount.
 * a more recent kernel (and distribution?)
 * a more recent DRBD version (8.3.8.1) in your current setup
 * a more recent DRBD version with a newer kernel (and distribution)
to get additional data points.
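For reference on the stat fields mentioned in this exchange: on Linux, the `IO Block` line of stat(1) reports `st_blksize`, the preferred I/O transfer size, which is typically identical for every file on one filesystem, and `Blocks` reports `st_blocks`, counted in 512-byte units. Neither field tells you which physical blocks a file occupies, so two files both showing `IO Block: 4096` does not by itself mean they share a block. A minimal sketch, assuming a Linux/Unix host where Python's `os.stat` exposes `st_blksize` and `st_blocks`; the helper name `describe` is hypothetical:

```python
import os
import sys


def describe(path: str) -> dict:
    """Return the stat(1)-style size fields for one file (Linux/Unix only)."""
    st = os.stat(path)
    return {
        "path": path,
        "size": st.st_size,                     # "Size" in stat(1): length in bytes
        "blocks": st.st_blocks,                 # "Blocks" in stat(1): 512-byte units allocated
        "io_block": st.st_blksize,              # "IO Block" in stat(1): preferred I/O size
        "allocated_bytes": st.st_blocks * 512,  # on-disk allocation in bytes
    }


if __name__ == "__main__":
    # Example: python describe_stat.py /repl/firewall/sysconfig/iptables /repl/snmpagent/data/snmpd.conf
    for p in sys.argv[1:]:
        print(describe(p))
```

Run against the two affected files, both would be expected to report the same `io_block` simply because they live on the same ext3 filesystem, which says nothing about shared data blocks.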
> > Date: Tue, 7 Sep 2010 12:16:59 +0200
> > From: lars.ellenberg at linbit.com
> > To: drbd-user at lists.linbit.com
> > Subject: Re: [DRBD-user] File corruption in drbd partition
> >
> > On Tue, Sep 07, 2010 at 09:35:48AM +0000, putcha narayana wrote:
> > >
> > > Hi,
> > >
> > > We are running continuous failovers on a redundant setup (Active / Standby).
> > > After a few failovers we observe that the content of file x appears inside file y.
> >
> > How much is "few"?
> > What is the IO load?
> > How do you trigger the failover?
> > DRBD version, kernel version, file system type?
> > Volatile caches involved?
> > How often/when do you fsck?
> >
> > > In one particular case we observed inode corruption when the fsck command was run on the /repl partition:
> > > Multiply-claimed block(s) in inode 28: 1233 1249 1251 1252
> > > Multiply-claimed block(s) in inode 1183: 1251 1252
> > > Multiply-claimed block(s) in inode 1184: 1233
> > > Multiply-claimed block(s) in inode 1185: 1249
> > >
> > > When fsck -fy is run on the /repl partition, the end result is that the content of file x is seen in file y.
> >
> > --
> > : Lars Ellenberg
> > : LINBIT | Your Way to High Availability
> > : DRBD/HA support and consulting http://www.linbit.com
> >
> > DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
> > __
> > please don't Cc me, but send to list -- I'm subscribed
> > _______________________________________________
> > drbd-user mailing list
> > drbd-user at lists.linbit.com
> > http://lists.linbit.com/mailman/listinfo/drbd-user
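The fsck messages quoted above can be read as a block-to-inode map: every listed block is claimed by more than one inode, which is exactly the state in which data written through one file becomes visible through another. As a rough sketch (the regular expression is an assumption about the message format based only on the lines quoted in this thread, and `shared_blocks` is a hypothetical helper), this Python groups multiply-claimed blocks by their owners:

```python
from __future__ import annotations

import re
from collections import defaultdict

# Matches lines such as "Multiply-claimed block(s) in inode 28: 1233 1249"
DUP_RE = re.compile(r"Multiply-claimed block\(s\) in inode (\d+):((?:[ \t]+\d+)+)")


def shared_blocks(fsck_output: str) -> dict[int, set[int]]:
    """Map each block number to the inodes claiming it, keeping only blocks with 2+ owners."""
    owners = defaultdict(set)
    for match in DUP_RE.finditer(fsck_output):
        inode = int(match.group(1))
        for block in match.group(2).split():
            owners[int(block)].add(inode)
    return {blk: inodes for blk, inodes in owners.items() if len(inodes) > 1}


# The fsck lines reported earlier in this thread.
SAMPLE = """\
Multiply-claimed block(s) in inode 28: 1233 1249 1251 1252
Multiply-claimed block(s) in inode 1183: 1251 1252
Multiply-claimed block(s) in inode 1184: 1233
Multiply-claimed block(s) in inode 1185: 1249
"""

if __name__ == "__main__":
    for blk, inodes in sorted(shared_blocks(SAMPLE).items()):
        print(f"block {blk}: inodes {sorted(inodes)}")
    # prints:
    # block 1233: inodes [28, 1184]
    # block 1249: inodes [28, 1185]
    # block 1251: inodes [28, 1183]
    # block 1252: inodes [28, 1183]
```

On this sample, inode 28 shares each of its listed blocks with inode 1183, 1184 or 1185. With -y, e2fsck typically answers yes to cloning multiply-claimed blocks, giving each inode its own copy of the shared data; that would be consistent with the content of file x still appearing in file y after the repair.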