Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
On 06/11/12 22:14, Matthias Hensler wrote:
> On Mon, Jun 11, 2012 at 06:35:18PM +0200, Matthias Hensler wrote:
>> [...]
>> I checked the changelog for 8.3.12, but nothing obvious struck me.
>> Also, diffing the source trees 8.3.11 -> 8.3.12, I did not find
>> anything obvious.
>
> Let me follow up on this myself. As suggested on IRC, I tried to build
> DRBD from source, just to take the elrepo packages out of the equation.
>
> So I started with DRBD 8.3.13, and as expected I got low performance.
>
> Then I tried 8.3.11, and I also got low performance (although 8.3.11
> from elrepo worked fine).
>
> That left me puzzled for a while, until I examined the elrepo packages
> more closely. As it turned out, all working DRBD versions were built
> on 2.6.32-71, while all broken versions were built on 2.6.32-220.
>
> So, I installed the old el6 2.6.32-71 kernel (it took me a while to
> find it, since it has been removed from nearly all archives) and its
> devel package, booted into that kernel, and built two new versions
> from source: 8.3.11 and 8.3.13. Then I booted back into 2.6.32-220.
>
> First try, with my self-compiled 8.3.11 module: everything is fine.
> Second try, with my self-compiled 8.3.13 module: still everything is
> fine.
>
> Indeed, the problem lies in the kernel version used to build the
> drbd.ko module. I double-checked by using all the userland tools from
> the 8.3.13 elrepo build together with my drbd.ko built on 2.6.32-71
> (but run under 2.6.32-220).
>
> Just to be clear: all tests were made with kernel 2.6.32-220, and the
> userland version does not matter.
>
> drbd.ko              | 8.3.11 | 8.3.13
> ---------------------+--------+-------
> built on 2.6.32-71   | good   | good
> built on 2.6.32-220  | bad    | bad
>
> So, how to debug this further? I would suspect that looking at the
> symbols of both modules might give a clue?

As a knee-jerk response based on a hunch -- you've been warned :) --
this could be related to the BIO_RW_BARRIER vs. FLUSH/FUA dance that
the RHEL 6 kernel has been doing between the initial RHEL 6 release
and more recent updates (in which the "let's kill barriers" upstream
changes from post-2.6.32 have been backported).

Try configuring your disk section with no-disk-barrier, no-disk-flushes
and no-md-flushes (in both configurations) and see if your kernel
module change still makes a difference. Of course, in production you
should only use those options if you have no volatile caches involved
in the I/O path.

Not sure if this is useful, but I sure hope it is. :)

Cheers,
Florian

--
Need help with High Availability?
http://www.hastexo.com/now
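
For anyone who wants to reproduce the build experiment, here is a
minimal sketch of building the DRBD 8.3 kernel module against a
specific kernel-devel tree. Version numbers and paths are
illustrative, and this assumes the autoconf-based 8.3 build:

  # with kernel-devel-2.6.32-71.el6 installed alongside the running kernel
  tar xzf drbd-8.3.13.tar.gz
  cd drbd-8.3.13
  ./configure --with-km
  # KDIR selects which kernel tree the module is compiled against,
  # independent of the kernel currently running
  make module KDIR=/usr/src/kernels/2.6.32-71.el6.x86_64
  # the resulting module is drbd/drbd.ko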
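
On the question of comparing symbols: one place to start is to diff
the undefined symbols (what each module imports from the kernel) and
the vermagic/srcversion fields of the two builds. A sketch, with
made-up directory names for the two module builds:

  # undefined symbols = kernel interfaces the module relies on
  nm -u built-on-71/drbd.ko  | sort > syms-71
  nm -u built-on-220/drbd.ko | sort > syms-220
  diff -u syms-71 syms-220

  # vermagic records the kernel each module was built for
  modinfo built-on-71/drbd.ko  | egrep 'vermagic|srcversion'
  modinfo built-on-220/drbd.ko | egrep 'vermagic|srcversion'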
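
And for the suggested test, a minimal sketch of the disk section with
all three options set (DRBD 8.3 syntax; this goes inside the resource
definition, e.g. in /etc/drbd.d/<resource>.res):

  disk {
    no-disk-barrier;   # disable barrier-based write ordering
    no-disk-flushes;   # disable flushes of the backing device's cache
    no-md-flushes;     # disable flushes for the meta-data device
  }

With barriers and flushes disabled, DRBD falls back to the "drain"
write-ordering method, which, as noted above, is only safe if the I/O
path has no volatile write cache (e.g. a battery-backed controller
cache).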