<html>
<head>
<style><!--
.hmmessage P
{
margin:0px;
padding:0px
}
body.hmmessage
{
font-size: 10pt;
font-family:Tahoma
}
--></style>
</head>
<body class='hmmessage'>
<BR>Reply inline<BR>
Thanks and Regards<BR>Lak<BR><BR> <BR>
> Date: Tue, 7 Sep 2010 15:15:07 +0200<BR>> From: lars.ellenberg@linbit.com<BR>> To: drbd-user@lists.linbit.com<BR>> Subject: Re: [DRBD-user] File corruption in drbd partition<BR>> <BR>> On Tue, Sep 07, 2010 at 12:12:08PM +0000, putcha narayana wrote:<BR>> > <BR>> > Thanks for responding,<BR>> > <BR>> > <BR>> > <BR>> > FYI: I ran the stat command to get details of the files whose data is<BR>> > seen criss-crossing, that is, the content of one file is seen in another.<BR>> > A snapshot taken when the corruption occurred is enclosed at the end.<BR>> > <BR>> > The files which have an issue belong to the same block, IO Block: 4096 <BR>> <BR>> No, that is the file size in occupied blocks.<BR>> <BR>> > In every corruption seen, the content of /repl/firewall/sysconfig/iptables appears in /repl/snmpagent/data/snmpd.conf<BR>> > <BR>> > <BR>> > <BR>> > How much is "few"?<BR>> > <BR>> > Today, after 12 failovers. In the last run, similar corruption was seen after 80 failovers.<BR>> > <BR>> > <BR>> > What is the IO load?<BR>> > <BR>> > Not exactly sure. When SIGTERM is received, there are 2 processes which write config data to the DRBD partition.<BR>> > <BR>> > <BR>> > How do you trigger the failover?<BR>> > <BR>> > Using the reboot command.<BR>> > <BR>> > <BR>> > DRBD version, kernel version, file system type?<BR>> > <BR>> > DRBD-8.0.16, 2.6.14.7, EXT3-FS<BR>> > <BR>> > <BR>> > Volatile caches involved?<BR>> > <BR>> > NO<BR>> > How often/when do you fsck?<BR>> > <BR>> > Every time the DRBD-GO-Primary script is called. Before mounting the DRBD partition we invoke fsck -fy<BR>> <BR>> That is you do<BR>> primary; fsck /dev/drbd0; mount;<BR>> in that order?<BR>
[[LAK]]: Yes <BR>> <BR>> The observed corruption may be caused by a lot of things.<BR>> DRBD (in that version) may have an issue.<BR>> ext3 (in your kernel version) may have an issue.<BR>> the generic write-out path (in your kernel version) may have an issue.<BR>> fsck (resp. your version of fsck) may have an issue.<BR>> probably many other things I cannot think of right now ;-)<BR>> <BR>> I suggest repeating your tests with<BR>> * no drbd involved, simply reboot a single box the same way you do now,<BR>> force fsck before the mount.<BR>
[[LAK]]: We have the same setup on a single board with no DRBD. The only difference is a newer kernel. No corruption is seen there.<BR>
<BR>> * more recent kernel (and distribution?)<BR>> * more recent DRBD version (8.3.8.1) in your current setup<BR>
[[LAK]]: Unfortunately we cannot do this immediately. <BR>
<BR>> * more recent DRBD version with newer kernel (and distribution)<BR>[[LAK]]: On the mailing list I have seen people report the following message on DRBD-8.3.x: <B><FONT color=#0000ff size=2>"I shall become SyncTarget, but I am primary"</FONT></B><BR>
Someone replied to this that the DRBD network was being shut down too fast.<BR>
Maybe something similar is happening in our case, which eventually results in the corruption (maybe!).<BR>
Does DRBD provide any options to protect against such corruption, for example against partial writes, or a way to lock the disk for writes?<BR>
<BR>
<BR>
<BR>> To get additional data points.<BR>> <BR>> > > Date: Tue, 7 Sep 2010 12:16:59 +0200<BR>> > > From: lars.ellenberg@linbit.com<BR>> > > To: drbd-user@lists.linbit.com<BR>> > > Subject: Re: [DRBD-user] File corruption in drbd partition<BR>> > > <BR>> > > On Tue, Sep 07, 2010 at 09:35:48AM +0000, putcha narayana wrote:<BR>> > > > <BR>> > > > Hi,<BR>> > > > <BR>> > > > We are running continuous failovers on a redundant setup (Active / Standby).<BR>> > > > After few failovers we observe content of file x appears inside file y.<BR>> > > <BR>> > > How much is "few"?<BR>> > > What is the IO load?<BR>> > > How do you trigger the failover?<BR>> > > DRBD version, kernel version, file system type?<BR>> > > Volatile caches involved?<BR>> > > How often/when do you fsck?<BR>> > > <BR>> > > > In one particular case we observed inode corruption, when fsck command is run on /repl partition.<BR>> > > > Multiply-claimed block(s) in inode 28: 1233 1249 1251 1252<BR>> > > > Multiply-claimed block(s) in inode 1183: 1251 1252<BR>> > > > Multiply-claimed block(s) in inode 1184: 1233<BR>> > > > Multiply-claimed block(s) in inode 1185: 1249<BR>> > > > <BR>> > > > When fsck -fy is run on /repl partition then the end result is content of file x is seen in file y.<BR>> > > <BR>> > > <BR>> > > <BR>> > > -- <BR>> > > : Lars Ellenberg<BR>> > > : LINBIT | Your Way to High Availability<BR>> > > : DRBD/HA support and consulting http://www.linbit.com<BR>> > > <BR>> > > DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.<BR>> > > __<BR>> > > please don't Cc me, but send to list -- I'm subscribed<BR>> > > _______________________________________________<BR>> > > drbd-user mailing list<BR>> > > drbd-user@lists.linbit.com<BR>> > > http://lists.linbit.com/mailman/listinfo/drbd-user<BR><BR></body>
</html>