[DRBD-user] mkfs on a drbd partition hangs in drbd_al_begin_io

Håkan Engblom zyber_cynic at hotmail.com
Thu May 3 16:09:04 CEST 2007


Hi,

If it is hanging because of a write operation that does not complete, which
might be the case, should it then be hanging in spin_lock_irq(&mdev->al_lock)
(you said in your previous answer that it is certainly hanging in a
spin_lock_irq)? Shouldn't it rather be hanging in
wait_for_completion(&al_work.event) if it is hanging because a write
operation does not succeed?
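
One way to tell the two cases apart from user space is to look at the task
state next to the wchan symbol. A task busy-waiting on a spinlock stays
runnable (state R), while a task sleeping in wait_for_completion() is in
uninterruptible sleep (state D). The following is only a minimal sketch,
assuming a Linux /proc layout; the helper names parse_stat_state,
describe_state and describe_task are made up for illustration, and wchan may
read 0 or be missing on kernels that do not expose it.

```python
#!/usr/bin/env python3
"""Report the scheduler state and wait channel of a process (Linux only)."""
import sys


def parse_stat_state(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat line.

    The command name (field 2) is in parentheses and may itself contain
    spaces or parentheses, so split after the last ')'.
    """
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def describe_state(state):
    """Map a state letter to a rough interpretation (illustrative only)."""
    if state == "D":
        return "uninterruptible sleep (e.g. waiting for I/O or a completion)"
    if state == "R":
        return "running or runnable (a busy-waiting task would look like this)"
    if state == "S":
        return "interruptible sleep"
    return "other state"


def describe_task(pid):
    """Read state and wchan for pid from /proc and format one summary line."""
    with open(f"/proc/{pid}/stat") as f:
        state = parse_stat_state(f.read())
    try:
        with open(f"/proc/{pid}/wchan") as f:
            wchan = f.read().strip() or "0"
    except OSError:
        # Hypothetical fallback: some kernels or permissions hide wchan.
        wchan = "unavailable"
    return f"pid {pid}: state {state} ({describe_state(state)}), wchan {wchan}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 taskstate.py <mkfs-PID>
    print(describe_task(int(sys.argv[1])))
```

Sampling this a few times while mkfs appears stuck should help: a steady D
would be consistent with sleeping on an unfinished write rather than spinning.
As far as I know, wchan typically reports the first non-scheduler function on
the sleeping task's stack, so a sleep inside wait_for_completion() called from
drbd_al_begin_io could plausibly show up as drbd_al_begin_io itself.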

br Håkan Engblom

>From: Lars Ellenberg <lars.ellenberg at linbit.com>
>To: drbd-user at lists.linbit.com
>Subject: Re: [DRBD-user] mkfs on a drbd partition hangs in drbd_al_begin_io
>Date: Thu, 3 May 2007 15:19:52 +0200
>
>On Thu, May 03, 2007 at 02:55:30PM +0200, Håkan Engblom wrote:
> > Hi, See below.
> >
> >
> > br Håkan Engblom
> >
> >
> > >From: Lars Ellenberg <lars.ellenberg at linbit.com>
> > >To: drbd-user at lists.linbit.com
> > >Subject: Re: [DRBD-user] mkfs on a drbd partition hangs in drbd_al_begin_io
> > >Date: Thu, 3 May 2007 14:09:32 +0200
> > >
> > >On Thu, May 03, 2007 at 01:23:19PM +0200, Håkan Engblom wrote:
> > >> Hi,
> > >>
> > >> Some background: the drbd version is 0.7.22, running on a Montavista
> > >> Linux distribution, 2.6.10_mvl4.
> > >>
> > >> I've seen that sometimes when doing mkfs on a drbd partition, the
> > >> system seems to hang in a drbd function in kernel space.
> > >> The problem has been reported once before to this mailing list, in
> > >> February 2006, in a thread called "mkfs hangs with lastest drbd branch
> > >> build and FC4 kernel" (I think it is the same problem), and it has also
> > >> been observed by others (seen when searching for "drbd_al_begin_io
> > >> hangs" in Google).
> > >>
> > >> However, I've not seen any solution to the problem.
> > >>
> > >> So far, what I've been able to establish is that the process seems to
> > >> hang in the drbd function mentioned above, and I also know that it
> > >> hangs 640 bytes into the function. When looking at the source code of
> > >> this function, my guess is that it hangs on
> > >> "spin_lock_irq(&mdev->al_lock);".
> > >>
> > >> Is this a known problem, and does anyone know of a solution?
> > >
> > >hanging in "spin_lock_irq" translates to a hard lockup of the machine.
> > >so, this is most likely not the correct guess.
> > It could be a faulty conclusion of course, but if the mkfs command never
> > returns to user space (strace gives no output at all) and every time I
> > look in /proc/<mkfs-PID>/wchan I can see that it is inside
> > drbd_al_begin_io, doesn't that indicate that it is hung inside that
> > function? If it is hanging in that function, what else could it be if it
> > is not in spin_lock_irq, especially since it is 640 bytes into the
> > function, and that seems to be close to the end of the function?
>
>well, if you can still access the box, and even strace things and stuff,
>then it is certainly hanging in a spin_lock_irq :)
>
>drbd_al_begin_io (sometimes, not always)
>needs to do a drbd meta data transaction.
>meaning it writes 512 bytes to the drbd meta data area,
>and only returns once this write is completed.
>if that write is never completed, well, it never returns.
>so apparently, for some reason, in your setup sometimes the lower level
>storage drivers decide to not complete this in a timely manner.
>
> > >what exactly are the symptoms of that "hang"?
> > The symptom, looking at it from a high level, is that the mkfs never
> > finishes. When doing strace on the process, it is also possible to see
> > that nothing happens; it is stuck in the kernel.
>
> > >do the numbers in /proc/drbd move, still?
> > Don't know. I will check that the next time I see the problem.
> >
> > >
> > >can you reproduce this with some different kernel,
> > >preferably plain kernel.org?
> > Yes and no. Theoretically it would be possible, but I don't think I
> > would get the time to do that from my project manager. In addition to
> > this, the problem is quite difficult to reproduce. It is seen sometimes
> > when I do an initial install of my system, including creating new
> > partition tables and formatting the drbd partitions. But it is far from
> > every time I see the problem; it is seen maybe 1 in 10 times when I do a
> > reinstall.
> > If I set up a limited test environment to try to reproduce the fault, my
> > experience in troubleshooting these kinds of problems tells me that the
> > problem might not occur if the environment is scaled down. I could be
> > wrong, but it has happened several times before when I've had similar
> > problems.
>
>really, this is something in your setup.
>some misbehaving lower level storage driver would be my guess.
>maybe it would be possible to do a workaround within drbd.
>but it is not drbd's fault.
>
>--
>: Lars Ellenberg                            Tel +43-1-8178292-0  :
>: LINBIT Information Technologies GmbH      Fax +43-1-8178292-82 :
>: Vivenotgasse 48, A-1120 Vienna/Europe    http://www.linbit.com :
>__
>please use the "List-Reply" function of your email client.
>_______________________________________________
>drbd-user mailing list
>drbd-user at lists.linbit.com
>http://lists.linbit.com/mailman/listinfo/drbd-user
