On 06/11/2012 04:31 PM, Florian Haas wrote:
> On 06/11/12 22:14, Matthias Hensler wrote:
>> On Mon, Jun 11, 2012 at 06:35:18PM +0200, Matthias Hensler wrote:
>>> [...]
>>> I checked the changelog for 8.3.12, but nothing obviously struck me.
>>> Also, diffing the source trees 8.3.11 -> 8.3.12, I did not find
>>> anything obvious.
>>
>> Let me follow up on this myself. As suggested on IRC, I tried to
>> build DRBD from source, just to take the elrepo packages out of the
>> equation.
>>
>> So I started with DRBD 8.3.13, and as expected I had low performance.
>>
>> Then I tried 8.3.11, and I also had low performance (although 8.3.11
>> from elrepo worked fine).
>>
>> That left me puzzled for a while, so I examined the elrepo packages
>> more closely. As it turned out, all working DRBD versions were built
>> on 2.6.32-71, while all broken versions were built on 2.6.32-220.
>>
>> So, I installed the old el6 2.6.32-71 kernel (took me a while to find
>> it, since it was removed from nearly all archives) and its devel
>> package, booted into that kernel, and built two new versions from
>> source: 8.3.11 and 8.3.13. Then I booted back to 2.6.32-220.
>>
>> First try with my self-compiled 8.3.11 module: everything is fine.
>> Second try with my self-compiled 8.3.13 module: still everything is
>> fine.
>>
>> Indeed, the problem lies in the kernel version used to build the
>> drbd.ko module. I double-checked by using all userland tools from the
>> 8.3.13 elrepo build together with my drbd.ko built on 2.6.32-71 (but
>> run on 2.6.32-220).
>>
>> Just to be clear: all tests were made with kernel 2.6.32-220, and the
>> userland version does not matter.
>>
>> drbd.ko              | 8.3.11 | 8.3.13
>> ---------------------+--------+-------
>> built on 2.6.32-71   | good   | good
>> built on 2.6.32-220  | bad    | bad
>>
>> So, how to debug this further? I would suspect looking at the symbols
>> of both modules might give a clue?
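One way to follow up on the symbol idea is to diff the versioned symbol CRCs each build was linked against. This is a sketch, not from the thread: the paths are placeholders for the two module builds, and it assumes the kernel was built with CONFIG_MODVERSIONS (as RHEL 6 kernels are), so that `modprobe --dump-modversions` can list the CRCs.

```shell
# Dump the versioned symbols (kABI CRCs) each drbd.ko was built against.
# Paths are examples; point them at your -71 and -220 builds.
modprobe --dump-modversions /path/to/drbd-built-on-71/drbd.ko  | sort > syms-71.txt
modprobe --dump-modversions /path/to/drbd-built-on-220/drbd.ko | sort > syms-220.txt

# Any symbol whose CRC differs changed its kernel-side interface between
# the two kernels; those are the first candidates to look at.
diff syms-71.txt syms-220.txt
```

A quick `modinfo -F vermagic drbd.ko` on each module also confirms which kernel tree it was actually built against.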
> As a knee-jerk response based on a hunch -- you've been warned :) --,
> this could be related to the BIO_RW_BARRIER vs. FLUSH/FUA dance that
> the RHEL 6 kernel has been doing between the initial RHEL 6 release
> and more recent updates (when they've been backporting the "let's
> kill barriers" upstream changes from post-2.6.32).
>
> Try configuring your disk section with no-disk-barrier, no-disk-flushes
> and no-md-flushes (in both configurations) and see if your kernel
> module change still makes a difference.
>
> Of course, in production you should only use those options if you have
> no volatile caches involved in the I/O path.
>
> Not sure if this is useful, but I sure hope it is. :)
>
> Cheers,
> Florian

Oh! Please let me know if this works. :)

digimer

--
Digimer
Papers and Projects: https://alteeve.com
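For reference, the options Florian suggests go in the resource's disk section of drbd.conf. A minimal sketch (the resource name `r0` and device paths are examples, not from the thread; as he notes, leave these disabled in production unless every cache in the I/O path is non-volatile):

```
resource r0 {
  disk {
    no-disk-barrier;   # don't use write barriers (BIO_RW_BARRIER / FLUSH+FUA)
    no-disk-flushes;   # don't flush the backing device's write cache
    no-md-flushes;     # don't flush after meta-data updates
  }
  # ... device, disk, address, meta-disk definitions as usual ...
}
```

Apply it on both nodes with `drbdadm adjust r0`, then rerun the benchmark with each drbd.ko build to see whether the gap persists.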