--- Lars Ellenberg <lars.ellenberg at linbit.com> wrote:

> On Fri, May 25, 2007 at 12:41:15AM -0700, John Frisk wrote:
> > Team,
> > I have been working with the NFS folks for a bit now
> > and am experiencing something strange which causes NFS
> > clients to hang when attempting to use
> > heartbeat/drbd/knfsd as a HA NFS server.  I don't
> > believe this is a NFS issue any longer.
> >
> > Setup:
> > Machine A: primary drbd and HA machine (SMP machine)
> > Machine B: secondary drbd and HA machine (UP machine)
> > Machine C: NFS client using bonnie++ as a test for I/O
> > activity ( -f -s 100 -n 1 -r 0 are the parameters )
> > All machines are Debian 4.0 etch with vanilla
> > 2.6.22-rc2 kernel + known NFS issues patched
> >
> > So I have been asking myself "What is the difference
> > between machine A and B that would cause the issue
> > only on machine A".  The network adapters on both A &
> > B are realtek r8169 style adapters.  The biggest
> > difference I can think of is machine A is an SMP
> > machine Athlon 64 X2 (running on a i386 kernel due to
> > support issues with the 64bit kernel) while machine B
> > is an older Athlon slot A processor.
>
> lower level io-subsystem differs?

True, machine A is SATA and machine B is IDE, but should that matter?

> available RAM differs?

Both machines have the same amount of physical RAM, 1 GB.  The usage
is about the same since they are running the same programs.  They were
recently built to be as close to identical as possible.

> > I have attached a triggered sysrq from machine
>
> not useful here.

OK, the NFS folks tend to use kernel dumps extensively.  I didn't know
what was preferred, but I have a summary from Trond Myklebust from the
NFS team:

"To be more precise, it looks from the sysrq there as if the NFS hang
is due to rpc.mountd getting stuck in drbd_make_request..."

> look at /proc/drbd,
> and tell us what is there, when your nfs-client
> hangs.

Here's the output from Machine A during a hang:

version: 8.0.3 (api:86/proto:86)
SVN Revision: 2881 build by frisk at inferno, 2007-05-24 23:27:32
 0: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
    ns:3581188 nr:78482644 dw:82063832 dr:2911 al:250 bm:4813 lo:0 pe:0 ua:0 ap:0
        resync: used:0/31 hits:4879340 misses:4770 starving:0 dirty:0 changed:4770
        act_log: used:0/257 hits:895047 misses:398 starving:0 dirty:148 changed:250

Here's the output from Machine B during a hang:

version: 8.0.3 (api:86/proto:86)
SVN Revision: 2881 build by frisk at saint, 2007-05-24 23:30:12
 0: cs:Connected st:Secondary/Primary ds:UpToDate/UpToDate C r---
    ns:78482644 nr:3581200 dw:3918108 dr:78147366 al:73 bm:4807 lo:0 pe:0 ua:0 ap:0
        resync: used:0/31 hits:4879340 misses:4770 starving:0 dirty:0 changed:4770
        act_log: used:0/257 hits:84154 misses:73 starving:0 dirty:0 changed:73

> does generating local io
> (or "sync" or "emergency sync" via sysrq)
> on the Primary help?
> on the Secondary?

Test 1) Running a sync does nothing on either the primary or
secondary.
Test 2) Running a sysrq sync also does not do anything.

> maybe you can also watch with wireshark and find some
> "interesting" behaviour (please, I'm not interested
> in any unprocessed dump files; this is just a suggestion
> for a tool you might use to further nail down what happens).

I would be happy to install wireshark and capture packets.  Any help
on what I'm looking for?  Should I only look for drbd traffic on port
7788, or NFS too?

Thank you for your help.
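
For anyone who wants to compare /proc/drbd snapshots taken during a hang, here is a minimal Python sketch that pulls the per-device fields out of text in the DRBD 8.0 layout quoted in this message. The function names `parse_proc_drbd` and `in_flight` are hypothetical helpers, not part of DRBD. The reading of `lo`, `pe`, `ua` and `ap` as counters of outstanding requests follows my understanding of the DRBD documentation and should be checked against the guide for your version.

```python
import re

# Trimmed copy of Machine A's /proc/drbd output from this message.
SAMPLE = """\
version: 8.0.3 (api:86/proto:86)
 0: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
    ns:3581188 nr:78482644 dw:82063832 dr:2911 al:250 bm:4813 lo:0 pe:0 ua:0 ap:0
        resync: used:0/31 hits:4879340 misses:4770 starving:0 dirty:0 changed:4770
"""


def parse_proc_drbd(text):
    """Return {minor: {field: value}} for each device block in the text."""
    devices = {}
    current = None
    for line in text.splitlines():
        # A device block starts with a line such as " 0: cs:Connected ...".
        start = re.match(r"\s*(\d+):\s+cs:", line)
        if start:
            current = devices.setdefault(int(start.group(1)), {})
        if current is None:
            continue  # still in the version header
        if line.lstrip().startswith(("resync:", "act_log:")):
            continue  # cache statistics, not per-device counters
        # Collect two-letter "key:value" pairs (cs, st, ds, ns, nr, ...).
        for key, value in re.findall(r"\b([a-z]{2}):(\S+)", line):
            current[key] = int(value) if value.isdigit() else value
    return devices


def in_flight(device):
    """Assumed 'requests still outstanding' counters for one device."""
    return {key: device.get(key) for key in ("lo", "pe", "ua", "ap")}


if __name__ == "__main__":
    for minor, fields in parse_proc_drbd(SAMPLE).items():
        print(minor, fields["cs"], fields["st"], in_flight(fields))
```

To use it on a live node, read the file with `open("/proc/drbd").read()` and pass the result to `parse_proc_drbd`. In both snapshots in this message the four counters are 0.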