Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
On Wed, Aug 04, 2010 at 08:00:01PM +0200, Lars Ellenberg wrote:
> > With protocol B this can lead to a situation where the secondary node
> > becomes completely unusable. It looks like the secondary sends all IO
> > requests to the LVM layer and LVM can not manage the queue after a
> > certain point.
>
> Too bad.

So you say this is a problem of LVM and nothing DRBD can do about it,
right?

> > I would expect DRBD to use the flush method on LVM containers as
> > default. At least if protocol B is used.
>
> With kernels >= 2.6.24, a "flush" is implemented as "empty barrier",
> so if there is no barrier support, there will be no flush support
> either (except for maybe very few special cases).

Well, since flush works with LVM, this might be one of these cases.

> > In my test setup with 10 DRBD resources the logger loop takes around
> > 50 seconds to finish on the primary. While the primary is working with
> > a load below 1, the secondary load rises to 10 and stays there for a
> > couple of minutes. With only 10 resources the secondary recovers after
> > a while.
> > If you try the same simple test with 30 or more DRBD resources, the
> > secondary will get a load of 40 and won't recover, at least not within
> > an hour.
>
> ;-)
>
> If they are hurting you, disable barriers, then.

We will do that. What I still don't understand is why only the secondary
has this trouble. I thought DRBD would use the same disk settings on the
primary; that would mean the primary DRBD would also use barriers and the
underlying LVM would get into the same troublesome situation. But this is
not the case. What is the difference here?

> > With flush or protocol C it takes a couple of minutes to finish syncing
> > these 400 messages per resource and the secondary remains usable.
> > Why this must take so long is another question...
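
[Note on the "flush is an empty barrier" remark: on kernels of that era the
block layer's flush helper (blkdev_issue_flush) allocates a bio with no data
pages and submits it flagged as a barrier write; a lower device or stacking
driver that rejects barriers therefore has no usable flush either. The sketch
below is a simplified illustration of that mechanism, reconstructed from
memory rather than verbatim kernel code; the function names here
(empty_barrier_done, issue_empty_barrier) are made up for the example.]

    #include <linux/bio.h>
    #include <linux/blkdev.h>
    #include <linux/completion.h>

    /* Completion callback: remember failure and wake up the waiter. */
    static void empty_barrier_done(struct bio *bio, int err)
    {
            if (err)
                    clear_bit(BIO_UPTODATE, &bio->bi_flags);
            complete(bio->bi_private);
    }

    /* "Flush" as an empty barrier: a zero-segment bio submitted as a
     * barrier write.  If the lower device (e.g. an LVM/device-mapper
     * volume) does not support barriers, the bio fails and the caller
     * effectively has no flush either. */
    static int issue_empty_barrier(struct block_device *bdev)
    {
            DECLARE_COMPLETION_ONSTACK(wait);
            struct bio *bio;
            int ret = 0;

            bio = bio_alloc(GFP_KERNEL, 0);   /* no data pages at all */
            if (!bio)
                    return -ENOMEM;

            bio->bi_bdev    = bdev;
            bio->bi_end_io  = empty_barrier_done;
            bio->bi_private = &wait;

            submit_bio(WRITE_BARRIER, bio);   /* empty write with barrier flag */
            wait_for_completion(&wait);

            if (!bio_flagged(bio, BIO_UPTODATE))
                    ret = -EIO;               /* covers "barriers unsupported" */

            bio_put(bio);
            return ret;
    }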
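
[Note on "disable barriers, then": with DRBD 8.3-era drbd.conf syntax this is
done per resource in the disk section. The sketch below is only an
illustration; the resource name, hostnames, device paths and addresses are
placeholders, not taken from this thread.]

    resource r0 {
      protocol B;

      disk {
        no-disk-barrier;   # do not use barriers on the backing device
        no-disk-flushes;   # do not use flushes either; DRBD falls back to draining
      }

      on alpha {
        device    /dev/drbd0;
        disk      /dev/vg0/lv_r0;
        address   192.168.1.1:7789;
        meta-disk internal;
      }
      on bravo {
        device    /dev/drbd0;
        disk      /dev/vg0/lv_r0;
        address   192.168.1.2:7789;
        meta-disk internal;
      }
    }

Whether to also set no-md-flushes for the metadata, and whether turning
barriers/flushes off is safe at all on the given hardware (volatile write
caches without battery backup are the risky case), is a separate decision.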