On Mon, Jan 3, 2011 at 11:44 AM, Chris Worley <span dir="ltr"><<a href="mailto:worleys@gmail.com">worleys@gmail.com</a>></span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<div class="im">On Mon, Jan 3, 2011 at 1:22 AM, Felix Frank <<a href="mailto:ff@mpexnet.de">ff@mpexnet.de</a>> wrote:<br>
>>> <snip><br>>> This is part of Unix file system semantics.<br>
>><br>
>> A dead system is not the proper outcome.<br>
><br>
> Color me ignorant, but what have *file system* semantics got to do with<br>
> a block device?<br>
<br>
</div>Exactly. It should have nothing to do with it, but it's causing<br>
lock-outs in the kernel (I'm not sure if that's the proper<br>
terminology, but the output is in previous incarnations of this<br>
thread) that cause spinning media not under DRBD control (i.e. the<br>
root fs) to disconnect (then reconnect as other SD devices... but that<br>
does the previous mounts, i.e. "/", no good).</blockquote><div><br></div><div>So, logically, it could be the block layer itself. It could also be a race condition in SDP.</div><div><br></div><div>I would suggest:</div>
<div>1) Trying a newer or different kernel.</div><div>2) Using regular old Ethernet instead of IB to eliminate SDP as a factor.</div><div><br></div><div>Do you have any kernel messages besides the 120-second hung task warning?</div>
<div><br></div><div>-JR</div></div>