Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
On Tuesday 11 October 2005 10:47, Lars Ellenberg wrote:
> / 2005-10-10 23:51:11 +0200
>
> \ Bernd Schubert:
> > On Monday 10 October 2005 22:10, Eugene Crosser wrote:
> > > Just out of the blue: could the problem be in the drbd vs. md
> > > interaction, rather than SATA, DMA, IRQ magic?
> >
> > Oh oh, I'm a bit scared now. Just a story from our experience:
> >
> > After we installed our new file-server in summer 2004, we experienced
> > some problems after it had already been in production for a couple of
> > hours and all clients were already using it. Therefore we had to
> > introduce a workaround (by switching back from unfs3 to clusternfs)
> > and had to put the clients' /etc and /var on extra ext2 partitions on
> > our file-server (all clients are diskless and get everything via
> > tftpboot and nfs). Since the hardware raid was already in use and
> > since we didn't use LVM, we had to put those two drbd partitions on a
> > software-raid1 on SATA disks. After some time we experienced repeated
> > file corruption of the /var partition. Sometime after all the unfs3
> > problems were fixed, we took the next chance and got rid of the extra
> > ext2 partitions. Actually, the whole time we thought it was the
> > onboard SIL controller that caused the problems, and we had already
> > replaced it with another controller at the same time that we removed
> > the extra /var and /etc partitions.
> > Well, the point is, we never experienced any data corruption with
> > drbd on the hardware raid. On the failover node we had, until some
> > weeks ago, only a single ide disk, but we have now replaced it with a
> > software-raid1 - but that's on the failover node, which is supposed
> > to be in service for as little time as possible.
> > The bad news (at least for us) is that our main server (which had the
> > hardware raid) will go into repair on Wednesday and we will have to
> > rely on the failover system. I guess this time next week I can tell
> > whether there has been some data corruption or not.
>
> Now, you are aware that you will experience all sorts of strange
> behaviour, up to remount-ro because of some unexpected inodes,
> if you change the nfs server setup underneath running clients?
> And that has nothing to do with drbd, again...

Sorry, I didn't express myself properly. Of course we had to reboot all
clients when the nfs servers were exchanged - there is no chance to do
this on the fly with an nfs-root filesystem.

What I mean by filesystem corruption is that the server's kernel wrote
typical ext2 error messages to its logs from time to time. Also pretty
funny at the time was that the drbd data on the main server and on the
failover node differed. Running e2fsck on both nodes (disconnected and
both in primary, of course) showed different error messages. Actually,
e2fsck sometimes didn't find any problems on the failover node, but it
always did on the main server. That was also the reason why we believed
it was a problem with the SATA controller...

Cheers,
Bernd

--
Bernd Schubert
PCI / Theoretische Chemie
Universität Heidelberg
INF 229
69120 Heidelberg
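
For reference, a minimal sketch of the consistency check Bernd describes:
disconnect the DRBD peers, promote each node to primary, and run a
read-only e2fsck so the reports from the two nodes can be diffed. This is
not from the original mail; the resource name "r0", the device
"/dev/drbd0", and the report filename are placeholders for illustration.

#!/usr/bin/env python
# Sketch of the two-node check (assumed names, see note above):
# stop replication, promote this node, fsck read-only, save the output.

import subprocess

RESOURCE = "r0"          # hypothetical DRBD resource name
DEVICE = "/dev/drbd0"    # hypothetical DRBD block device

def run(cmd):
    # Echo the command, run it, and return its combined output.
    print("#", " ".join(cmd))
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.stdout + proc.stderr

report = []
report.append(run(["drbdadm", "disconnect", RESOURCE]))  # stop replication
report.append(run(["drbdadm", "primary", RESOURCE]))     # promote this node
# e2fsck -f forces a check even if the fs looks clean; -n opens the
# device read-only and answers "no" to everything, so nothing is changed.
report.append(run(["e2fsck", "-f", "-n", DEVICE]))

with open("e2fsck-report.txt", "w") as fh:
    fh.write("\n".join(report))
# Diff e2fsck-report.txt from the two nodes: identical replicas should
# produce identical reports.

Note that both nodes have to be out of production while this runs, since
promoting a disconnected secondary to primary deliberately creates a
split brain; afterwards one node's data must be discarded on resync.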