[DRBD-user] Greater than 4TB

nathan at robotics.net nathan at robotics.net
Tue Jul 15 14:50:52 CEST 2008


On Tue, 15 Jul 2008, Philipp Reisner wrote:

> Hi Nathan,
>
> Thanks for typing that Call trace... Here is an excerpt from drbd_bitmap.c
> with the line marked where the crash happened.
>
> int drbd_bm_test_bit(struct drbd_conf *mdev, const unsigned long bitnr)
> {
> 	unsigned long flags;
> 	struct drbd_bitmap *b = mdev->bitmap;
> 	int i;
> 	ERR_IF(!b) return 0;
> 	ERR_IF(!b->bm) return 0;
>
> 	spin_lock_irqsave(&b->bm_lock, flags);
> 	if (bitnr < b->bm_bits) {
> 		i = test_bit(bitnr, b->bm) ? 1 : 0; /* <=== crash happened HERE */
> 	} else if (bitnr == b->bm_bits) {
> 		i = -1;
> 	} else { /* (bitnr > b->bm_bits) */
> 		ERR("bitnr=%lu > bm_bits=%lu\n", bitnr, b->bm_bits);
> 		i = 0;
> 	}
>
> 	spin_unlock_irqrestore(&b->bm_lock, flags);
> 	return i;
> }
>
> You are right that we should print a nice error message saying
> that the allocation of the bitmap went wrong, instead of OOPSing
> in that case. As far as I know we do that.

So is it crashing because the set is larger than 4TB or for some other reason?
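
For a rough sense of scale, here is a back-of-the-envelope sketch of how 
large the bitmap gets at 4TB. It assumes one bitmap bit per 4 KiB block 
(my understanding of the DRBD 8.x on-disk granularity); the names 
BM_BLOCK_SIZE_SKETCH, bm_bits_for and bm_bytes_for are made up for this 
illustration and are not taken from drbd_bitmap.c.

```c
#include <stdint.h>

/* Assumption: one bitmap bit tracks one 4 KiB block of the device. */
#define BM_BLOCK_SIZE_SKETCH 4096ULL

/* Number of bitmap bits needed to cover a device of `bytes` bytes
 * (rounded up to whole blocks). */
static uint64_t bm_bits_for(uint64_t bytes)
{
	return (bytes + BM_BLOCK_SIZE_SKETCH - 1) / BM_BLOCK_SIZE_SKETCH;
}

/* Bytes of memory needed to hold that many bits (rounded up). */
static uint64_t bm_bytes_for(uint64_t bytes)
{
	return (bm_bits_for(bytes) + 7) / 8;
}
```

Under that assumption, bm_bits_for(4ULL << 40) gives 2^30 bits and 
bm_bytes_for(4ULL << 40) gives 128 MiB for a 4 TiB device. Whether an 
allocation of that size is what fails here is not something the numbers 
alone can tell.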

> The question here is, why does it not abort with a failed bitmap allocation ?
> Can you provide us the kernel log from just before the crash ?

Absolutely nothing in logs. : (

> Was the resync already running for some time, or does it crash instantaneously ?

It can happen almost instantaneously, or it can take 4 sec or 30 sec. 
I thought it had something to do with the size because it did run 
without problems for almost 24 hours. Now I am in this less than 30 sec 
mode.

> Is there any chance that you could also provide the upper part of the OOPS?

I am open to ways to get that, I just don't know how. Since nothing is in 
the logs, I had to get that by taking a screenshot of the display via the 
IPMI card and manually typing it all in.

> Nathan, I do not want to create the impression that it will work for you if
> you help us to fix this. Probably it will then fail for you with a nice
> error message in the kernel log saying that the allocation of the bitmap
> failed...

I am open to helping you guys fix this regardless! Please let me know 
anything I can do to help. Also, I am new to git, but I pulled the latest 
source and have the same problem.
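
To make the "fail with a nice error message" behaviour concrete, here is 
a minimal user-space sketch of an allocation check combined with the 
same return convention as the quoted drbd_bm_test_bit(). It is only an 
illustration: struct bitmap_sketch and both functions are hypothetical, 
calloc/fprintf stand in for the kernel allocator and ERR(), and there is 
no locking.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the real bitmap structure. */
struct bitmap_sketch {
	unsigned long *bm;      /* bit storage, NULL if not allocated */
	unsigned long bm_bits;  /* number of valid bits */
};

/* Allocate storage for `bits` bits, all cleared.
 * Returns 0 on success, -1 after logging a message on failure. */
static int bitmap_sketch_alloc(struct bitmap_sketch *b, unsigned long bits)
{
	size_t bpw = 8 * sizeof(unsigned long);
	size_t words = bits / bpw + (bits % bpw != 0);

	b->bm = calloc(words ? words : 1, sizeof(unsigned long));
	if (!b->bm) {
		fprintf(stderr, "bitmap allocation failed (%lu bits)\n", bits);
		b->bm_bits = 0;
		return -1;
	}
	b->bm_bits = bits;
	return 0;
}

/* Same convention as the quoted function: 1 or 0 for a valid bit,
 * -1 exactly at the end, 0 past the end or if nothing is allocated. */
static int bitmap_sketch_test_bit(const struct bitmap_sketch *b,
				  unsigned long bitnr)
{
	size_t bpw = 8 * sizeof(unsigned long);

	if (!b || !b->bm)
		return 0;
	if (bitnr < b->bm_bits)
		return (b->bm[bitnr / bpw] >> (bitnr % bpw)) & 1UL ? 1 : 0;
	if (bitnr == b->bm_bits)
		return -1;
	return 0;
}
```

A caller would check the return value of bitmap_sketch_alloc() and stop 
before any resync starts, so the test-bit path is never reached with a 
missing bitmap.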

><>
Nathan Stratton                                CTO, BlinkMind, Inc.
nathan at robotics.net                         nathan at blinkmind.com
http://www.robotics.net                        http://www.blinkmind.com


