On Tue, 15 Jul 2008, Philipp Reisner wrote:
> Hi Nathan,
>
> Thanks for typing that Call trace... Here is an excerpt from drbd_bitmap.c
> with the line marked where the crash happened.
>
> int drbd_bm_test_bit(struct drbd_conf *mdev, const unsigned long bitnr)
> {
>     unsigned long flags;
>     struct drbd_bitmap *b = mdev->bitmap;
>     int i;
>     ERR_IF(!b) return 0;
>     ERR_IF(!b->bm) return 0;
>
>     spin_lock_irqsave(&b->bm_lock, flags);
>     if (bitnr < b->bm_bits) {
>         i = test_bit(bitnr, b->bm) ? 1 : 0; <=<<==<<<===<<<<====<<<<<===== HERE
>     } else if (bitnr == b->bm_bits) {
>         i = -1;
>     } else { /* (bitnr > b->bm_bits) */
>         ERR("bitnr=%lu > bm_bits=%lu\n", bitnr, b->bm_bits);
>         i = 0;
>     }
>
>     spin_unlock_irqrestore(&b->bm_lock, flags);
>     return i;
> }
>
> You are right that we should print a nice error message saying
> that something went wrong with the allocation of the bitmap instead of OOPSing
> in that case. As far as I know we do that.
So is it crashing because the set is larger than 4TB or some other reason?
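
For a sense of scale on the 4TB question, here is a rough user-space sketch of how large such a bitmap would be. It assumes one bitmap bit per 4 KiB block, which is not stated in this thread; BM_BLOCK_SIZE, bm_bits_for() and bm_bytes_for() are hypothetical names for illustration, not DRBD's own.

```c
#include <stdint.h>

/* ASSUMPTION: one bitmap bit tracks one 4 KiB block of the device. */
#define BM_BLOCK_SIZE 4096ULL

/* Number of bitmap bits needed to cover dev_bytes, rounded up. */
static uint64_t bm_bits_for(uint64_t dev_bytes)
{
    return dev_bytes / BM_BLOCK_SIZE + (dev_bytes % BM_BLOCK_SIZE != 0);
}

/* Bytes of memory needed to hold that many bits, rounded up. */
static uint64_t bm_bytes_for(uint64_t dev_bytes)
{
    uint64_t bits = bm_bits_for(dev_bytes);

    return bits / 8 + (bits % 8 != 0);
}
```

Under that assumption, a 4 TiB device (4ULL << 40 bytes) needs 1073741824 bits, i.e. 134217728 bytes (128 MiB) of bitmap. Whether an allocation of that size succeeds depends on the kernel and architecture, so this does not by itself explain the crash.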
> The question here is, why does it not abort with a failed bitmap allocation?
> Can you provide us the kernel log from just before the crash?
Absolutely nothing in logs. :(
> Was the resync already running for some time, or does it crash instantaneously?
It can happen almost instantaneously, it can take 4 sec, it can take 30
sec. I thought it had something to do with the size because it did run
without problem for almost 24 hours. Now I am in this less than 30 sec
mode.
> Is there any chance that you could also provide the upper part of the OOPS?
I am open to ways to get that, I just don't know how. Since nothing is in
the logs, I had to get that by taking a screenshot of the display via the
IPMI card and manually typing all that in.
> Nathan, I do not want to create the impression that it will work for you if
> you help us to fix this. Probably it will then fail for you with a nice
> error message in the kernel log saying that the allocation of the bitmap
> failed...
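
The "fail with a nice error message" behaviour described above can be sketched in user space roughly as follows. This is a toy model and not DRBD code: struct toy_bitmap and the toy_bm_* functions are hypothetical names, calloc() stands in for the kernel allocator, and there is no locking. The return values of toy_bm_test_bit() mirror the excerpt quoted earlier (1 or 0 for a valid bit, -1 when bitnr equals the bit count, 0 plus an error message beyond that).

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for the real bitmap structure. */
struct toy_bitmap {
    unsigned long *bm;      /* NULL when no bitmap is allocated */
    unsigned long bm_bits;  /* number of valid bits */
};

/* Allocate a zeroed bitmap; returns 0 on success, -1 with a message on failure. */
static int toy_bm_alloc(struct toy_bitmap *b, unsigned long bits)
{
    size_t words = bits / BITS_PER_WORD + (bits % BITS_PER_WORD != 0);

    b->bm = calloc(words ? words : 1, sizeof(unsigned long));
    if (!b->bm) {
        fprintf(stderr, "bitmap: allocation of %lu bits failed\n", bits);
        b->bm_bits = 0;
        return -1;
    }
    b->bm_bits = bits;
    return 0;
}

/* Set one bit; out-of-range requests are silently ignored in this toy. */
static void toy_bm_set_bit(struct toy_bitmap *b, unsigned long bitnr)
{
    if (b->bm && bitnr < b->bm_bits)
        b->bm[bitnr / BITS_PER_WORD] |= 1UL << (bitnr % BITS_PER_WORD);
}

/* Same return convention as the quoted function, minus the spinlock. */
static int toy_bm_test_bit(const struct toy_bitmap *b, unsigned long bitnr)
{
    if (!b || !b->bm)
        return 0;
    if (bitnr < b->bm_bits)
        return (b->bm[bitnr / BITS_PER_WORD] >> (bitnr % BITS_PER_WORD)) & 1UL ? 1 : 0;
    if (bitnr == b->bm_bits)
        return -1;
    fprintf(stderr, "bitnr=%lu > bm_bits=%lu\n", bitnr, b->bm_bits);
    return 0;
}
```

In this model a failed allocation leaves bm as NULL, so later lookups return 0 instead of dereferencing a bad pointer. It says nothing about why the real test_bit() call faults.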
I am open to helping you guys fix this regardless! Please let me know
anything I can do to help. Also, I am new to git, but I pulled the latest
source and have the same problem.
><>
Nathan Stratton CTO, BlinkMind, Inc.
nathan at robotics.net nathan at blinkmind.com
http://www.robotics.net http://www.blinkmind.com