Hi,

See below.

br Håkan Engblom

>From: Lars Ellenberg <lars.ellenberg at linbit.com>
>To: drbd-user at lists.linbit.com
>Subject: Re: [DRBD-user] mkfs on a drbd partition hangs in drbd_al_begin_io
>Date: Thu, 3 May 2007 14:09:32 +0200
>
>On Thu, May 03, 2007 at 01:23:19PM +0200, Håkan Engblom wrote:
> > Hi,
> >
> > Some background: the drbd version is 0.7.22, running on a Montavista Linux
> > distribution, 2.6.10_mvl4.
> >
> > I've seen that sometimes, when doing mkfs on a drbd partition, the system
> > seems to hang in a drbd function in kernel space.
> > The problem has been reported once before on this mailing list, in February
> > 2006, in a thread called "mkfs hangs with lastest drbd branch build and FC4
> > kernel" (I think it is the same problem), and it has also been observed by
> > others (seen when searching for "drbd_al_begin_io hangs" in google).
> >
> > However, I've not seen any solution to the problem.
> >
> > So far I've been able to establish that the process seems to hang in
> > the drbd function mentioned above, and I also know that it hangs 640 bytes
> > into the function. Looking at the source code of this function, my
> > guess is that it hangs on "spin_lock_irq(&mdev->al_lock);".
> >
> > Is this a known problem, and does anyone know of a solution?
>
>hanging in "spin_lock_irq" translates to a hard lockup of the machine.
>so, this is most likely not the correct guess.

It could be a faulty conclusion of course, but if the mkfs command never
returns to user space (strace gives no output at all), and every time I look
in /proc/<mkfs-PID>/wchan I can see that it is inside drbd_al_begin_io,
doesn't that indicate that it is hung inside that function?
If it is hanging in that function, what else could it be if not
spin_lock_irq, especially since it is 640 bytes into the function, which
seems to be close to the end of the function?

>
>what exactly are the symptoms of that "hang"?

The symptom, seen from a high level, is that the mkfs never finishes.
Running strace on the process also shows that nothing happens; it is
stuck in the kernel.

>do the numbers in /proc/drbd move, still?

Don't know. I will check that the next time I see the problem.

>
>can you reproduce this with some different kernel,
>preferably plain kernel.org?

Yes and no. Theoretically it would be possible, but I don't think my
project manager would give me the time to do that.
In addition, the problem is quite difficult to reproduce. It shows up
sometimes when I do an initial install of my system, which includes creating
new partition tables and formatting the drbd partitions. But it is far from
every time; I see it in maybe 1 out of 10 reinstalls.
If I set up a limited test environment to try to reproduce the fault, my
experience in troubleshooting this kind of problem tells me that the problem
might not occur once the environment is scaled down. I could be wrong, but
it has happened several times before when I've had similar problems.

>
>does it hang only when "Connected" or also when "StandAlone"?

Don't know. So far it has always been seen immediately after drbd has been
started, and thus the state has been either SyncSource or possibly
PausedSyncS, so it has contact with the secondary node but is not fully
synchronised.
In the system we have three drbd partitions, and they are formatted
sequentially, one after the other. The system can hang during formatting of
any of these three partitions.

>
>does running "while true; do sync; usleep 1; done" help?
> when run on the Primary?
> Secondary?
> both?

Don't know. I can try it the next time I see the problem.

>
>is this on a software raid?

The partition we use below drbd (/dev/sda...) is on an ordinary SAS disk. We
use drbd to get server redundancy. No additional software is used for
mirroring.

>does it help doing this without software raid?
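
For the next time the hang shows up, the two checks discussed above (where the mkfs process sits according to /proc/<PID>/wchan, and whether the numbers in /proc/drbd still move) could be collected with something like the sketch below. It is only a rough helper under stated assumptions: the function names sample_wchan, drbd_moved and watch_hang are made up for this example, and "moving" simply means that two consecutive snapshots of /proc/drbd differ in any byte.

```shell
#!/bin/sh
# Hypothetical diagnostic helpers; the names are illustrative and not part of drbd.

# sample_wchan PID [PROC_ROOT]
# Print the kernel symbol the process is currently waiting in (empty if unknown).
sample_wchan() {
    pid=$1
    proc_root=${2:-/proc}
    cat "$proc_root/$pid/wchan" 2>/dev/null
    echo
}

# drbd_moved SNAPSHOT_A SNAPSHOT_B
# Succeed (exit status 0) if the two saved copies of /proc/drbd differ.
drbd_moved() {
    if cmp -s "$1" "$2"; then
        return 1
    fi
    return 0
}

# watch_hang PID [COUNT] [INTERVAL]
# Take COUNT samples, INTERVAL seconds apart, and print one line per sample.
watch_hang() {
    pid=$1
    count=${2:-5}
    interval=${3:-1}
    prev=$(mktemp)
    cur=$(mktemp)
    cat /proc/drbd > "$prev" 2>/dev/null
    i=0
    while [ "$i" -lt "$count" ]; do
        sleep "$interval"
        cat /proc/drbd > "$cur" 2>/dev/null
        if drbd_moved "$prev" "$cur"; then
            state=moving
        else
            state=static
        fi
        echo "wchan=$(sample_wchan "$pid") drbd=$state"
        cp "$cur" "$prev"
        i=$((i + 1))
    done
    rm -f "$prev" "$cur"
}
```

After sourcing the file, "watch_hang <mkfs-PID> 10 1" would print ten samples one second apart. If wchan keeps naming drbd_al_begin_io while the drbd column stays "static", that would at least answer the question about /proc/drbd.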
>
>--
>: Lars Ellenberg                            Tel +43-1-8178292-0  :
>: LINBIT Information Technologies GmbH      Fax +43-1-8178292-82 :
>: Vivenotgasse 48, A-1120 Vienna/Europe    http://www.linbit.com :
>__
>please use the "List-Reply" function of your email client.