Hi,

On Friday 29 October 2004 17:55, you wrote:
> Hello Andreas,

thanks for the personal mail, but keeping the problem discussion on the list makes it easier for everyone. Maybe it was by accident, so I hope you don't mind the CC.

> Andreas Huck wrote:
> > Hi,
> >
> > On Friday 29 October 2004 11:41, Andreas Hartmann wrote:
> >> Hello all,
> >>
> >> I'm wondering about this message, which occurred with drbd 0.6.13 running
> >> with original kernel 2.4.27 on an xSeries 235 machine with ServeRAID 5 and
> >> Broadcom gigabit ethernet (bcm5700) while copying data to /dev/nb16.
> >> The machine has 512 MB RAM and 1024 MB cache.
> >>
> >> What does this mean:
> >> Oct 29 05:30:20 FAGINTSC kernel: drbd16: Epoch set size wrong!!found=1061 reported=1060
> >
> > I recall having read something similar here;
> > google(drbd "Epoch set size wrong") comes up with
> > http://lists.linbit.com/pipermail/drbd-user/2004-May/000796.html
> > http://lists.linbit.com/pipermail/drbd-user/2004-May/000797.html
>
> I found this too, but I couldn't find any answer concerning the epoch size
> itself. The reply talks about performance problems and the tl-size or
> snd-buf size (the drbd docs say these options are for protocol A, but I'm
> using C). Additionally, I cannot detect any performance problem on either of
> my machines.

Lars gave some hints meanwhile. Claiming that DRBD is too strong for your hardware sounds nice :-)

- Did the system die at low load?
- Can you _trigger_ the failure by increasing system load (bonnie etc.), a parallel resync, or anything else? Try to crash it not by coincidence once a day, but 3 minutes after some specific action. Well, easy to talk about ...
- Does the failure occur on both systems, regardless of which one was primary?
- Does "original" kernel mean vanilla? Can you reproduce it with a different kernel, e.g. RH or SLES8-SP3 (just to verify)?
- You define "minor_count=40", but I guess the error occurs with only one device as well.
- If you have 2x256MB memory in each machine, try booting with the kernel parameter "mem=256M" (and avoid swapping) to detect faulty memory.

> That's why I'm asking here, hoping to get an answer to the epoch size
> problem: what does it mean? Is there any switch to control it? I couldn't
> find any hint in the docs. Are there any dependencies between tl-size /
> sndbuf-size and epoch size? I would be very glad if somebody could
> explain the dependencies, because I want to understand it and I must get
> rid of the crashes :-).

See the comments from Lars.

Good luck,
Andreas
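To act on the advice to trigger the failure deliberately rather than wait for it, it helps to know exactly when the message shows up relative to load, resyncs and other actions. The thread never explains what the two epoch counters mean, so the sketch below makes no attempt to interpret them. It only pulls them out of syslog so the timestamps can be lined up against a test run. It is a minimal Python sketch under these assumptions: the log lines look like the single one quoted above, and the log lives at a path such as /var/log/messages. Other DRBD versions or syslog setups may format the message differently. `EPOCH_RE`, `EpochMismatch`, `scan` and `main` are hypothetical names, not part of DRBD.

```python
import re
import sys
from dataclasses import dataclass

# Regex modeled on the single syslog line quoted in this thread (assumption:
# other DRBD versions or syslog daemons may format the message differently).
EPOCH_RE = re.compile(
    r"(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+kernel:\s+"
    r"(?P<dev>drbd\d+):\s+Epoch set size wrong!!\s*"
    r"found=(?P<found>\d+)\s+reported=(?P<reported>\d+)"
)


@dataclass
class EpochMismatch:  # hypothetical helper type, not part of DRBD
    stamp: str
    host: str
    device: str
    found: int
    reported: int

    @property
    def delta(self) -> int:
        # Difference between the two counters printed in the message.
        return self.found - self.reported


def scan(lines):
    """Return one EpochMismatch per matching log line, in file order."""
    hits = []
    for line in lines:
        m = EPOCH_RE.match(line)
        if m:
            hits.append(EpochMismatch(
                stamp=m["stamp"],
                host=m["host"],
                device=m["dev"],
                found=int(m["found"]),
                reported=int(m["reported"]),
            ))
    return hits


def main(path):
    # errors="replace" keeps the scan going over any non-UTF-8 bytes in old logs.
    with open(path, errors="replace") as fh:
        for hit in scan(fh):
            print(f"{hit.stamp} {hit.host} {hit.device} "
                  f"found={hit.found} reported={hit.reported} delta={hit.delta:+d}")


# Only scan when a log path is given, e.g. /var/log/messages (assumed location).
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Saved as, say, epoch_scan.py (hypothetical name) and run as `python3 epoch_scan.py /var/log/messages`, the line quoted above would print as `Oct 29 05:30:20 FAGINTSC drbd16 found=1061 reported=1060 delta=+1`. Comparing those timestamps with when bonnie or a resync was started shows whether the message follows load or appears at random. Requires Python 3.7+ for `dataclasses`.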