[DRBD-user] Machine crashed repeatedly: drbd16: Epoch set size wrong!!found=1061 reported=1060

Lars Ellenberg Lars.Ellenberg at linbit.com
Sat Oct 30 01:31:37 CEST 2004

/ 2004-10-29 22:42:00 +0200
\ Andreas Hartmann:
> Hello!
> I wrote the same problem to the lkml. This is what Marcelo said:

well, he said nothing.

I doubt it is drbd's fault. more likely it is some weird memory pressure
thing, or the kernel code and gcc optimization do not like your xeons (it
would not be the first time that weird kernel behaviour occurs with
cpus/chipsets that are "too new").

you may want to recompile your kernel with CONFIG_DEBUG_SLAB,
recompile your drbd with DBG_ALL_SYMBOLS, and save the module symbol
information (after you loaded drbd) for later reference with ksymoops.
then see if you can reproduce the event.
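
a minimal sketch of the "save the module symbol information" step, in
case it helps. it assumes a 2.4-series kernel that exposes /proc/ksyms
and /proc/modules; the function name, the default file list and the
destination directory are made up for illustration and are not part of
drbd or ksymoops. run it right after loading drbd.

```python
import time
from pathlib import Path

# Assumption: a 2.4-series kernel, which exposes these two files.
DEFAULT_SOURCES = ("/proc/ksyms", "/proc/modules")


def save_symbol_snapshot(dest_dir, sources=DEFAULT_SOURCES):
    """Copy each existing source file into a timestamped directory.

    Hypothetical helper: returns the list of files that were written.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(dest_dir) / f"symbols-{stamp}"
    target.mkdir(parents=True, exist_ok=True)

    saved = []
    for src in sources:
        src_path = Path(src)
        if not src_path.is_file():
            # Skip files this kernel does not provide.
            continue
        out = target / src_path.name
        # /proc files report a size of 0, so read the content explicitly
        # instead of relying on a size-based copy.
        out.write_bytes(src_path.read_bytes())
        saved.append(out)
    return saved


if __name__ == "__main__":
    # Hypothetical destination directory; pick whatever suits your setup.
    for path in save_symbol_snapshot("/root/drbd-debug"):
        print(path)
```

the saved copies can later be handed to ksymoops in place of the live
/proc files (as far as I recall via its -k and -l options; check the man
page of your version).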

the stack trace you provide is pretty boring^W uninformative, it just
tells us that kswapd wanted to shrink a cache (that's its job, it
tries to free pages), and that kmem_cache_reap obviously tried to
dereference some "next" pointer that happened to point to 0xffffffff.

which pointer, which slab, which page, which process, why it was set
that way, why and when this might have happened, who did it...
all pure guesswork.

but just in case:
do you see other log messages that may be drbd related?

if not, I cannot help you. if yes, I probably still cannot help you,
but at least I could try to.

	Lars Ellenberg

