[DRBD-user] File corruption in drbd partition

putcha narayana putcha_laks at hotmail.com
Tue Sep 7 15:28:43 CEST 2010



Reply inline

Thanks and Regards
Lak

 

> Date: Tue, 7 Sep 2010 15:15:07 +0200
> From: lars.ellenberg at linbit.com
> To: drbd-user at lists.linbit.com
> Subject: Re: [DRBD-user] File corruption in drbd partition
> 
> On Tue, Sep 07, 2010 at 12:12:08PM +0000, putcha narayana wrote:
> > 
> > Thanks for responding,
> > 
> > 
> > 
> > FYI: I ran the stat command to get details of the files whose data is
> > criss-crossing, i.e. the content of one file shows up in another.
> > A snapshot taken when the corruption occurred is enclosed at the end.
> > 
> > The affected files belong to the same block, IO Block: 4096
> 
> No, that is the file size in occupied blocks.
> 
> > In every corruption seen, the content of /repl/firewall/sysconfig/iptables appears in /repl/snmpagent/data/snmpd.conf
> > 
> > 
> > 
> > How much is "few"?
> > 
> > Today, after 12 failovers. In the last run, similar corruption was seen after 80 failovers.
> > 
> > 
> > What is the IO load?
> > 
> > Not exactly sure. When SIGTERM is received, there are 2 processes which write config data to the DRBD partition.
> > 
> > 
> > How do you trigger the failover?
> > 
> > Using the reboot command.
> > 
> > 
> > DRBD version, kernel version, file system type?
> > 
> > DRBD-8.0.16, 2.6.14.7, EXT3-FS
> > 
> > 
> > Volatile caches involved?
> > 
> > NO
> > 
> > 
> > How often/when do you fsck?
> > 
> > Every time the DRBD-GO-Primary script is called. Before mounting the DRBD partition we invoke fsck -fy.
> 
> That is, you do
> primary; fsck /dev/drbd0; mount;
> in that order?

[[LAK]]: Yes 
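
To make the order above concrete, here is a minimal sketch of a promote / fsck / mount sequence that only mounts when fsck reports a clean or fully repaired file system. It is not the actual DRBD-GO-Primary script. The resource name r0 and the helper names are hypothetical; /dev/drbd0, /repl and fsck -fy are taken from this thread. Refusing to mount when fsck asks for a reboot (bit 2) is a conservative choice of this sketch, not something DRBD or fsck requires.

```python
import subprocess

# fsck exit status is a bit mask, as documented in fsck(8).
FSCK_ERRORS_CORRECTED = 1    # errors were found and fixed
FSCK_REBOOT_NEEDED = 2       # system should be rebooted
FSCK_ERRORS_UNCORRECTED = 4  # errors were left on disk
FSCK_OPERATIONAL_ERROR = 8   # fsck itself failed


def fsck_allows_mount(status):
    """True only if no bit other than 'errors corrected' is set."""
    return (status & ~FSCK_ERRORS_CORRECTED) == 0


def go_primary_commands(resource="r0", device="/dev/drbd0", mountpoint="/repl"):
    """Commands in the order discussed: primary; fsck; mount.

    'r0' is a hypothetical resource name; adjust to the local drbd.conf.
    """
    return [
        ["drbdadm", "primary", resource],
        ["fsck", "-fy", device],
        ["mount", device, mountpoint],
    ]


def go_primary(run=subprocess.call, **kwargs):
    """Run the sequence, stopping before mount if fsck is not satisfied."""
    promote, check, mount = go_primary_commands(**kwargs)
    if run(promote) != 0:
        return False
    if not fsck_allows_mount(run(check)):
        return False
    return run(mount) == 0
```

The run parameter exists so the sequence can be exercised with a stand-in for subprocess.call, without touching a real device.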
> 
> The observed corruption may be caused by a lot of things.
> DRBD (in that version) may have an issue.
> ext3 (in your kernel version) may have an issue.
> the generic write-out path (in your kernel version) may have an issue.
> fsck (resp. your version of fsck) may have an issue.
> probably many other things I cannot think of right now ;-)
> 
> I suggest to repeat your tests with
> * no drbd involved, simply reboot a single box the same way you do now,
> force fsck before the mount.

[[LAK]]: We have the same setup on a single board without DRBD. The only difference is a newer kernel. No corruption is seen there.


> * more recent kernel (and distribution?)
> * more recent DRBD version (8.3.8.1) in your current setup

[[LAK]]: Unfortunately we cannot do this immediately. 


> * more recent DRBD version with newer kernel (and distribution)

[[LAK]]: On the mailing list I have seen people report "I shall become SyncTarget, but I am primary" on DRBD-8.3.x.

Someone replied that the DRBD network is being shut down too fast.

Maybe something similar is happening in our case, which eventually resulted in the corruption (maybe!).

Does DRBD provide any options to protect against such corruption, for example against partial writes, or to lock the disk for writes?

> To get additional data points.
> 
> > > Date: Tue, 7 Sep 2010 12:16:59 +0200
> > > From: lars.ellenberg at linbit.com
> > > To: drbd-user at lists.linbit.com
> > > Subject: Re: [DRBD-user] File corruption in drbd partition
> > > 
> > > On Tue, Sep 07, 2010 at 09:35:48AM +0000, putcha narayana wrote:
> > > > 
> > > > Hi,
> > > > 
> > > > We are running continuous failovers on a redundant setup (Active / Standby).
> > > > After a few failovers we observe that the content of file x appears inside file y.
> > > 
> > > How much is "few"?
> > > What is the IO load?
> > > How do you trigger the failover?
> > > DRBD version, kernel version, file system type?
> > > Volatile caches involved?
> > > How often/when do you fsck?
> > > 
> > > > In one particular case we observed inode corruption when the fsck command was run on the /repl partition.
> > > > Multiply-claimed block(s) in inode 28: 1233 1249 1251 1252
> > > > Multiply-claimed block(s) in inode 1183: 1251 1252
> > > > Multiply-claimed block(s) in inode 1184: 1233
> > > > Multiply-claimed block(s) in inode 1185: 1249
> > > > 
> > > > When fsck -fy is run on the /repl partition, the end result is that the content of file x is seen in file y.
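
The fsck report quoted above can be turned around to show, per block, which inodes claim it. The following is a minimal sketch that parses exactly those four lines; the function names are made up for illustration, and the mapping from inode numbers to file names is not known from this thread.

```python
import re
from collections import defaultdict

# The four report lines quoted in this thread.
FSCK_REPORT = """\
Multiply-claimed block(s) in inode 28: 1233 1249 1251 1252
Multiply-claimed block(s) in inode 1183: 1251 1252
Multiply-claimed block(s) in inode 1184: 1233
Multiply-claimed block(s) in inode 1185: 1249
"""

LINE_RE = re.compile(r"Multiply-claimed block\(s\) in inode (\d+):((?: \d+)+)")


def claims_by_block(report):
    """Map each block number to the set of inodes that claim it."""
    owners = defaultdict(set)
    for match in LINE_RE.finditer(report):
        inode = int(match.group(1))
        for block in match.group(2).split():
            owners[int(block)].add(inode)
    return owners


def shared_blocks(report):
    """Keep only blocks claimed by more than one inode, sorted by block."""
    owners = claims_by_block(report)
    return {b: sorted(owners[b]) for b in sorted(owners) if len(owners[b]) > 1}


if __name__ == "__main__":
    for block, inodes in shared_blocks(FSCK_REPORT).items():
        print(f"block {block}: claimed by inodes {inodes}")
```

For this report, every multiply-claimed block is shared between inode 28 and exactly one of the inodes 1183, 1184 and 1185.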
> > > 
> > > 
> > > 
> > > -- 
> > > : Lars Ellenberg
> > > : LINBIT | Your Way to High Availability
> > > : DRBD/HA support and consulting http://www.linbit.com
> > > 
> > > DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
> > > __
> > > please don't Cc me, but send to list -- I'm subscribed
> > > _______________________________________________
> > > drbd-user mailing list
> > > drbd-user at lists.linbit.com
> > > http://lists.linbit.com/mailman/listinfo/drbd-user
> > 

 		 	   		  