Hi,

If it is hanging because of a write operation that does not complete, which might be the case, should it be hanging in spin_lock_irq(&mdev->al_lock) at all? (In your previous answer you said that it is certainly hanging in a spin_lock_irq.) Shouldn't it rather be hanging in wait_for_completion(&al_work.event), if it is hanging because a write operation does not succeed?

br
Håkan Engblom

>From: Lars Ellenberg <lars.ellenberg at linbit.com>
>To: drbd-user at lists.linbit.com
>Subject: Re: [DRBD-user] mkfs on a drbd partition hangs in drbd_al_begin_io
>Date: Thu, 3 May 2007 15:19:52 +0200
>
>On Thu, May 03, 2007 at 02:55:30PM +0200, Håkan Engblom wrote:
> > Hi, See below.
> >
> > br Håkan Engblom
> >
> > >From: Lars Ellenberg <lars.ellenberg at linbit.com>
> > >To: drbd-user at lists.linbit.com
> > >Subject: Re: [DRBD-user] mkfs on a drbd partition hangs in drbd_al_begin_io
> > >Date: Thu, 3 May 2007 14:09:32 +0200
> > >
> > >On Thu, May 03, 2007 at 01:23:19PM +0200, Håkan Engblom wrote:
> > >> Hi,
> > >>
> > >> Some background: drbd version is 0.7.22, running on a MontaVista Linux
> > >> distribution 2.6.10_mvl4.
> > >>
> > >> I've seen that sometimes when doing mkfs on a drbd partition, the system
> > >> seems to hang in a drbd function in kernel space.
> > >> The problem has been reported once before to this mailing list, in February
> > >> 2006, in a thread called "mkfs hangs with lastest drbd branch build and FC4
> > >> kernel" (I think it is the same problem), and it has also been observed by
> > >> others (seen when searching for "drbd_al_begin_io hangs" in Google).
> > >>
> > >> However, I've not seen any solution to the problem.
> > >>
> > >> So far I've been able to establish that the process seems to hang in
> > >> the drbd function mentioned above, and I also know that it hangs 640 bytes
> > >> into the function. When looking at the source code of this function, my
> > >> guess is that it hangs on "spin_lock_irq(&mdev->al_lock);".
> > >>
> > >> Is this a known problem, and does anyone know of a solution?
> > >
> > >hanging in "spin_lock_irq" translates to a hard lockup of the machine.
> > >so, this is most likely not the correct guess.
> >
> > It could be a faulty conclusion, of course, but if the mkfs command never
> > returns to user space (strace gives no output at all) and every time I look
> > in /proc/<mkfs-PID>/wchan I can see that it is inside drbd_al_begin_io,
> > doesn't that indicate that it is hung inside that function? If it is hanging
> > in that function, what else could it be if not spin_lock_irq, especially
> > since it is 640 bytes into the function, which seems to be close to the end
> > of the function?
>
>well, if you can still access the box, and even strace things and stuff,
>then it is certainly hanging in a spin_lock_irq :)
>
>drbd_al_begin_io (sometimes, not always)
>needs to do a drbd meta data transaction.
>meaning it writes 512 bytes to the drbd meta data area,
>and only returns once this write is completed.
>if that write is never completed, well, it never returns.
>so apparently, for some reason, in your setup sometimes the lower level
>storage drivers decide to not complete this timely.
>
> > >what exactly are the symptoms of that "hang"?
> >
> > The symptom, looking at it from a high level, is that the mkfs never
> > finishes. When running strace on the process, it is also possible to see
> > that nothing happens; it is stuck in the kernel.
> >
> > >do the numbers in /proc/drbd move, still?
> >
> > Don't know. I will check that the next time I see the problem.
> >
> > >can you reproduce this with some different kernel,
> > >preferably plain kernel.org?
> >
> > Yes and no. Theoretically it would be possible, but I don't think I would
> > get the time to do that from my project manager. In addition to this, the
> > problem is quite difficult to reproduce. It is seen sometimes when I do an
> > initial install of my system, including creating new partition tables and
> > formatting the drbd partitions. But I see it far from every time; it shows
> > up in maybe 1 out of 10 reinstalls.
> > If I set up a limited test environment to try to reproduce the fault, my
> > experience in troubleshooting these kinds of problems tells me that the
> > problem might not occur if the environment is scaled down. I could be
> > wrong, but it has happened several times before when I've had similar
> > problems.
>
>really, this is something in your setup.
>some misbehaving lower level storage driver would be my guess.
>maybe it would be possible to do a workaround within drbd.
>but it is not drbd's fault.
>
>--
>: Lars Ellenberg                            Tel +43-1-8178292-0  :
>: LINBIT Information Technologies GmbH      Fax +43-1-8178292-82 :
>: Vivenotgasse 48, A-1120 Vienna/Europe    http://www.linbit.com :
>__
>please use the "List-Reply" function of your email client.