I have found a way to consistently cause the hang with 2.6.14-1.1656_FC4 & drbd, and I believe I have enough results to say the fault is related to Neil Brown's "reduce stack consumption" patch. I say "related" because I don't know enough to determine whether Neil's patch is 'rude & evil' :) or whether DRBD just happens to exercise the bios in a way that his email[2] indicates is "unsafe".

The question I still have is: is this a problem that should be fixed in the kernel, i.e., by Neil or others resubmitting the patch, or is it some unsafeness in DRBD that must be fixed? (Finger pointing can now begin :)

setup:
The two machines have been configured, and the test machine has had
`drbdadm -- --do-what-I-say primary test0`
run on it; then the machines were allowed to sync.

Method for each kernel under test:
Boot into the kernel you want to test, log in on a virtual console (not X), then as root run the attached lock_drbd_internal_Synced script.

Results:
With 2.6.14-1.1656_FC4smp & 2.6.14-1.1656_FC4, the command locks at
"Writing superblocks and filesystem accounting information:"
and never returns. The system is still active, though: other commands in other virtual consoles still run. {Now press the power or reset button until it is ready to reboot.}

With 2.6.14-1.1656_FC4.tdennistsmp & 2.6.14-1.1656_FC4.tdennist (the same as the 2.6.14-1.1656_FC4 kernels but without Neil Brown's patch)[1], the command completes in ~3 minutes on a 1GB partition.

software versions:
drbd-0.7.15 (installed for each of the kernels)

kernels:
2.6.14-1.1656_FC4smp
2.6.14-1.1656_FC4
2.6.14-1.1656_FC4.tdennistsmp [1]
2.6.14-1.1656_FC4.tdennist

hardware:
CPUs: Intel(R) Xeon(TM) CPU 1.50GHz
IDE hard drive with a 1GB partition for DRBD.
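To make the "unsafe" question above concrete, here is a toy model in Python (not kernel code). It assumes, without confirmation from this thread, that the patch turns nested generic_make_request() calls into a per-task queue that is only drained after the current make_request function returns, and that the suspect pattern is a stacked driver which submits a bio to its lower device and then waits for it from inside its own make_request function. All class and function names (BlockLayer, sync_upper_make_request, etc.) are hypothetical, and "drbd0"/"disk" are just labels.

```python
from collections import deque


class Deadlock(Exception):
    """Raised when a make_request function waits on a bio that cannot complete."""


class Bio:
    def __init__(self, name, device):
        self.name = name
        self.device = device
        self.done = False


class BlockLayer:
    """Toy, single-threaded model of generic_make_request() dispatch.

    queued=False: nested submissions recurse immediately (old behaviour).
    queued=True:  nested submissions are appended to a per-task list and only
                  run after the current make_request function returns (the
                  ASSUMED behaviour of the "reduce stack consumption" patch).
    """

    def __init__(self, queued):
        self.queued = queued
        self.pending = None  # None means "not inside a make_request function"
        self.drivers = {}

    def register(self, device, make_request_fn):
        self.drivers[device] = make_request_fn

    def submit(self, bio):
        if not self.queued:
            self.drivers[bio.device](self, bio)  # recurse on the stack
            return
        if self.pending is not None:
            self.pending.append(bio)  # nested call: defer, don't recurse
            return
        self.pending = deque([bio])
        try:
            while self.pending:
                nxt = self.pending.popleft()
                self.drivers[nxt.device](self, nxt)
        finally:
            self.pending = None

    def wait(self, bio):
        # A real driver would sleep; here nothing else can run, so a bio that
        # is not done yet will never be done.
        if not bio.done:
            raise Deadlock(f"waiting for {bio.name}, which is still queued")


def disk_make_request(layer, bio):
    bio.done = True  # the lower disk completes every bio instantly


def sync_upper_make_request(layer, bio):
    # Hypothetical stacked driver: write metadata and WAIT for it, then
    # pass the data down. Illustrative only, not DRBD's code.
    meta = Bio(bio.name + "-meta", "disk")
    layer.submit(meta)
    layer.wait(meta)
    layer.submit(Bio(bio.name + "-data", "disk"))


def async_upper_make_request(layer, bio):
    # Same I/O without waiting inside the make_request function.
    layer.submit(Bio(bio.name + "-meta", "disk"))
    layer.submit(Bio(bio.name + "-data", "disk"))


def run(queued, upper_driver):
    layer = BlockLayer(queued)
    layer.register("disk", disk_make_request)
    layer.register("drbd0", upper_driver)
    try:
        layer.submit(Bio("write1", "drbd0"))
        return "completed"
    except Deadlock as err:
        return f"hang: {err}"


if __name__ == "__main__":
    for queued in (False, True):
        for driver in (sync_upper_make_request, async_upper_make_request):
            print(f"queued={queued!s:5} {driver.__name__}: {run(queued, driver)}")
```

In this model only the combination of the queued dispatcher and the waiting driver hangs; the non-waiting driver completes either way. It only shows why such a pattern would hang under a queued dispatcher; it says nothing about whether DRBD 0.7.15 actually waits inside its make_request path.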
drbd.conf:
resource test0 {
  protocol C;
  startup {
    wfc-timeout 0;
  }
  disk {
    on-io-error panic;
  }
  net {
    timeout 20;       # unit: 0.1 seconds
    connect-int 10;   # unit: seconds
    ping-int 10;      # unit: seconds
    ko-count 30;
  }
  syncer {
    rate 30M;
    #group takes the place of sync-group
    group 1;
    al-extents 257;
  }
  on d-2 {
    device /dev/drbd0;
    disk /dev/hda13;
    address 10.130.163.58:7788;
    #meta-disk /dev/hda12[0];
    meta-disk internal;
  }
  on d-5 {
    device /dev/drbd0;
    disk /dev/hda11;
    address 10.130.163.61:7788;
    #meta-disk /dev/hda10[0];
    meta-disk internal;
  }
}

[1] the 2.6.14-1.1656_FC4.tdennist kernels were created by
rpm -ivh kernel-2.6.14-1.1656_FC4.src.rpm
cd /to/your_rpm_build_tree/area/
#Apply the following patch to SPECS/kernel-2.6.spec
###begin patch
--- kernel-2.6.spec.1656_FC4 2006-01-20 15:19:49.000000000 -0500
+++ kernel-2.6.spec 2006-01-20 15:25:43.000000000 -0500
@@ -804,3 +804,3 @@
 # Decrease stack usage in block layer
-%patch1790 -p1
+# %patch1790 -p1
###end patch
# then build the rpm
rpm -bb SPECS/kernel-2.6.spec

[2] http://lkml.org/lkml/2005/11/6/169

Todd Denniston wrote:
> Chip Burke wrote:
>
>> Good call. It is FC4 2.6.14_1656 . Do I need to go back just one
>> revision?
>> Or is there a specific kernel where this problem popped up?
>> <SNIP>
> Looking at a diff of the two trees, out of the 42 files with changes,
> the files I would put the highest chance of causing the problem to be:
> drivers/block/ll_rw_blk.c
<SNIP>
> #The above change comes from a patch Neil Brown sent to
> linux-kernel at vger.kernel.org
> "Mon, 7 Nov 2005 11:16:48"
> Subject: do_mount: reduce stack consumption
> Signed-off-by: Neil Brown <neilb at cse.unsw.edu.au>
> Signed-off-by: Neil Brown <neilb at suse.de>
<SNIP>
>>
>> -----Original Message-----
>> From: Anquijix Schiptara [mailto:anquijix at hotmail.com]
>> Sent: Friday, January 20, 2006 10:34 AM
>> To: cburke at innova-partners.com
>> Subject: RE: [DRBD-user] mkfs hangs
>>
>> If you run FC4 with the newest kernel, install an older version,
>> reinstall the drbd module, and all is good... There is already a
>> similar thread regarding the newest FC4 kernel.
>>
>>
>>> From: "Chip Burke" <cburke at innova-partners.com>
>>> Reply-To: cburke at innova-partners.com
>>> To: <drbd-user at linbit.com>
>>> Subject: [DRBD-user] mkfs hangs
>>> Date: Fri, 20 Jan 2006 10:13:44 -0500
>>>
>>> I am running 0.7.15 and I cannot seem to format a drive. When I go to
>>> run 'mkfs -j /dev/drbd0', mkfs hangs while writing the inode tables at
>>> the same place every time. If I stop drbd and format the underlying
>>> device, it works fine, but the drbd device is not a happy camper. Any
>>> ideas as to what the issue may be? /dev/drbd0 is set to primary and
>>> syncs just fine with its slave, so everything seems okay. But the
>>> drive isn't much good without a file system.

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: lock_drbd_internal_Synced
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20060123/1e42499a/attachment.txt>
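The one-line spec change from footnote [1] (commenting out `%patch1790 -p1` in SPECS/kernel-2.6.spec) can also be scripted instead of applied as a diff. A minimal sketch in Python; the script name disable_patch1790.py is hypothetical, and it assumes the line appears in the spec exactly as in the footnote's diff, with no trailing whitespace:

```python
import re
import sys
from pathlib import Path

# Matches the line exactly as it appears in the FC4 spec diff in footnote [1].
PATCH_LINE = re.compile(r"^%patch1790 -p1$", re.MULTILINE)


def disable_patch1790(spec_text):
    """Comment out the %patch1790 line; return (new_text, number_of_lines_changed)."""
    return PATCH_LINE.subn("# %patch1790 -p1", spec_text)


def main(spec_path):
    path = Path(spec_path)
    text = path.read_text()
    new_text, count = disable_patch1790(text)
    if count != 1:
        raise SystemExit(f"expected exactly one '%patch1790 -p1' line, found {count}")
    path.with_name(path.name + ".orig").write_text(text)  # keep the unmodified spec
    path.write_text(new_text)
    print(f"disabled %patch1790 in {path}")


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print("usage: python3 disable_patch1790.py SPECS/kernel-2.6.spec")
```

Afterwards, build as in footnote [1] with `rpm -bb SPECS/kernel-2.6.spec`. The script writes nothing unless it finds exactly one matching line, so rerunning it on an already edited spec leaves the file alone.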