[DRBD-user] Greater than 4TB

Philipp Reisner philipp.reisner at linbit.com
Tue Jul 15 15:34:47 CEST 2008


On Tuesday, 15 July 2008 14:50:52, nathan at robotics.net wrote:
> On Tue, 15 Jul 2008, Philipp Reisner wrote:
> > Hi Nathan,
> >
> > Thanks for typing that Call trace... Here is an excerpt from
> > drbd_bitmap.c with the line marked where the crash happened.
> >
> > int drbd_bm_test_bit(struct drbd_conf *mdev, const unsigned long bitnr)
> > {
> > 	unsigned long flags;
> > 	struct drbd_bitmap *b = mdev->bitmap;
> > 	int i;
> > 	ERR_IF(!b) return 0;
> > 	ERR_IF(!b->bm) return 0;
> >
> > 	spin_lock_irqsave(&b->bm_lock, flags);
> > 	if (bitnr < b->bm_bits) {
> > 		i = test_bit(bitnr, b->bm) ? 1 : 0; /* <=== crash HERE */
> > 	} else if (bitnr == b->bm_bits) {
> > 		i = -1;
> > 	} else { /* (bitnr > b->bm_bits) */
> > 		ERR("bitnr=%lu > bm_bits=%lu\n", bitnr, b->bm_bits);
> > 		i = 0;
> > 	}
> >
> > 	spin_unlock_irqrestore(&b->bm_lock, flags);
> > 	return i;
> > }
> >
> > You are right that we should print a nice error message saying
> > that something went wrong with the allocation of the bitmap instead of
> > OOPSing in that case. As far as I know, we do that.
>
> So is it crashing because the set is larger than 4TB, or for some other reason?
>

Yes, my guess is that it has something to do with that.

> > The question here is: why does it not abort with a failed bitmap
> > allocation? Can you provide us the kernel log from just before the
> > crash?
>
> Absolutely nothing in logs. : (
>
> > Was the resync already running for some time, or does it crash
> > instantaneously?
>
> It can happen almost instantaneously, it can take 4 sec, it can take 30
> sec. I thought it had something to do with the size because it did run
> without problems for almost 24 hours. Now I am in this less-than-30-sec
> mode.
>

That is really strange.

> > Is there any chance that you could also provide the upper part of the
> > OOPS?
>
> I am open to ways to get that, just don't know how. Since nothing is in
> the logs, I had to get that by doing a screen shot of the display via the
> IPMI card and manually type all that in.
>

I do not know how your IPMI card works, but can you connect to the IPMI
card beforehand, so that you then have a full log with history? Maybe
with the screen command?

We often recommend using serial null modem cables or serial-to-USB
adapters. Start the machine that is expected to crash and append
something like the following to the kernel command line:
"console=tty0 console=ttyS1,57600"

Then you can capture the complete crash by using the screen command
on the other end of the serial cable.

> > Nathan, I do not want to create the impression that it will work for you
> > if you help us to fix this. Probably it will then fail for you with a
> > nice error message in the kernel log saying that the allocation of the
> > bitmap failed...
>
> I am open to helping you guys fix this regardless! Please let me know
> anything I can do to help. Also, I am new to git, but I pulled the latest
> source and have the same problem.
>

Since 8.2.6 there have been quite a number of commits in Git, but nothing
was done in that area.

Maybe you could cross-check whether that also happens with a device that
is slightly smaller than 4TB?

-phil
-- 
: Dipl-Ing Philipp Reisner                      Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH          Fax +43-1-8178292-82 :
: Vivenotgasse 48, 1120 Vienna, Austria        http://www.linbit.com :


