[DRBD-user] "syncer" crash when doing full resync

Kohari, Moiz mkohari at enterasys.com
Thu Sep 23 16:55:19 CEST 2004



Folks,

Thank you for a healthy discussion.  I really appreciate your help.

First, the metadata scheme in 0.7, which requires a dedicated 128M partition instead of a small file in /var/lib/drbd, is the reason we are still using the 0.6.x release.  Is there an easy way around this requirement so that we can use the 0.7 release?

Second, we see the memory corruption problem on multiple machines as well as in our simulation under UML, so in our case the problem is probably not caused by bad RAM (it would be statistically improbable for all of these systems to have it).  That leaves either random memory corruption from somewhere else in the kernel or a bug in DRBD.  Considering that it happens at the *exact* same spot in the DRBD driver every time, both for us and for other users on this list, isn't it likely that the corruption originates within DRBD?

I am now running the kernel with some light instrumentation and will update you if I find anything more.  In the meantime, if you have any other suggestions, please feel free to guide me.

Best Regards,
Moiz Kohari


-----Original Message-----
From: drbd-user-bounces at linbit.com [mailto:drbd-user-bounces at linbit.com] On Behalf Of Lars Marowsky-Bree
Sent: Thursday, September 23, 2004 9:45 AM
To: Philipp Reisner; drbd-user at linbit.com
Subject: Re: [DRBD-user] "syncer" crash when doing full resync

On 2004-09-23T15:30:09,
   Philipp Reisner <philipp.reisner at linbit.com> said:

> *  You have to fix the RAM of the secondary machine.
>    (In case you really have the issue you are referring to)
>    Maybe you should post that OOPS ...
>    
>    Item 5 of the drbd-0.8 roadmap will have the effect that the
>    primary node will write something like "Got invalid block-id"
>    to its syslog and disconnect.

Yes, but with memory corruption, there's no telling what the secondary
might be writing to disk, or returning for a read. It's good to try and
fortify the primary, but:

>    --> You have to fix the RAM of the secondary...

is really the only option.

Of course, I'd fully appreciate it if he sponsored the development of
that feature for 0.7 along with an online-consistency check of the two
mirrors ;-)


Sincerely,
    Lars Marowsky-Brée <lmb at suse.de>

-- 
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX AG - A Novell company

_______________________________________________
drbd-user mailing list
drbd-user at lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user
