Note: "permalinks" may not be as permanent as we would like;
direct links to old sources may well be a few messages off.
2017-10-30 11:53 GMT+01:00 Robert Altnoeder <robert.altnoeder at linbit.com>:
> Drbdmanage just does not compensate for LVM's shortcomings.
> (…)
> This is also the reason that the codebase of the product that will
> replace drbdmanage in the future is already multiple times the size of
> the current drbdmanage, although it is still in an early stage of its
> development and just barely starting to do anything useful. One cause
> for this increase in size is that even in the experimental version that
> we have right now, the 55 lines of code that attempt LVM volume creation
> are backed by about 2000 lines of error detection, error correction and
> error reporting code.

To be honest, the fact that you're writing a system that will try to deal
with LVM on its own sounds very encouraging to me.

> We could try to provide a product that deals with as many of the
> potential problems as anyone can think of, but since someone obviously
> has to do all the work, the question is: How much would you be willing
> to pay for it?
>
> (…)
>
>> (Btw: are there any other 5.4.1s a new user should be aware of?)
> Thousands probably, depending on the exact configuration.
>
> - Thin provisioning might lock up your system if you run out of space.
> - DRBD meta data may need to clean up slots if you have used them all
>   and then replace a node with another node that has a different node ID.
> - LVM may become extremely slow if it is not configured to ignore DRBD
>   devices and there are lots of DRBD devices that cannot be opened, e.g.
>   because there is another Primary.
> - etc. ...
>
> Apparently, most people have not ever hit the problem you describe. I
> did not ever see it come up in my test environment. Some others have hit
> other problems that you did not encounter.

I understand that writing tested software takes time and money. Having a
not-so-complicated version as the first attempt is a reasonable choice.
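For what it's worth, the "LVM scanning DRBD devices" pitfall from the list
above is usually addressed with a device filter in lvm.conf. A minimal
sketch (the exact reject pattern depends on your device naming, and
whether global_filter is honored depends on your LVM version):

```
# /etc/lvm/lvm.conf (sketch, adjust to your setup)
devices {
    # Reject DRBD devices so LVM scans never try to open them;
    # accept everything else.
    filter = [ "r|^/dev/drbd.*|", "a|.*|" ]
    # On newer LVM versions, global_filter also covers lvmetad/udev scans.
    global_filter = [ "r|^/dev/drbd.*|", "a|.*|" ]
}
```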
However, I don't think I can explain to you how frustrating it was to find
out that something I'd spent so much time dealing with was a well-known
issue, just not documented properly.

The way I see it is this: running lvcreate via drbdmanage will lock up a
system under known conditions. They aren't abnormal conditions. Simply
having the bad luck of running lvcreate against disk space that was
previously utilized (not an uncommon occurrence on a sufficiently
large/old system) will kill said system. I was lucky enough to hit the
problem on 'drbdmanage init'. There will be people less lucky than me who
get this after two years in production. Avoiding this very low probability
Russian roulette is very easy, as long as an admin is made aware of it.

May I suggest putting all this stuff under a single section of the
documentation and then:

- mentioning really early in "5. Common administrative tasks - DRBD
  Manage" that if you don't take a look at this section and configure
  your system accordingly, you're asking for trouble
- putting a link at the top of "8. Troubleshooting and error recovery"
- having 'drbdmanage init' mention it. It's quite chatty already, no
  reason not to have it nag the admin to go through a system config
  checklist.

Not a single line of code needs to be written (well, almost) and a lot of
admin hours might get saved. Hell, I'll do it myself if the docs have
sources somewhere public and you confirm that's an acceptable change for
you.
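As an aside, and assuming the failure mode really is stale signatures on
reused extents (my reading of the thread, not an official LINBIT
recommendation), one cheap precaution is to have LVM wipe new volumes at
creation time. A sketch, with example VG/LV names, assuming LVM2 recent
enough to support --wipesignatures:

```
# Zero the start of the new LV and wipe any old signatures found on it:
lvcreate --zero y --wipesignatures y -L 10G -n r0_00 drbdpool

# Or clear a device's old signatures manually before reuse:
wipefs --all /dev/drbdpool/r0_00
```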