[DRBD-user] write performance issues

Mon Jul 11 23:07:43 CEST 2011

On 7/11/11 4:21 PM, Phil Stoneman wrote:
> On 11/07/11 15:13, Mark Dokter wrote:
>> On 07/10/2011 07:30 PM, Phil Stoneman wrote:
>>> I've seen a similar thing with small writes, and for me, using
>>> no-disk-barrier and no-disk-flushes solved the small write performance
>>> issue. Hope that helps! :-)
>>
>> From [1]:
>> "Unfortunately device mapper (LVM) might not support barriers."
>>
>> However, cat /proc/drbd shows:
>> 3: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r----
>>      ns:0 nr:46983440 dw:46983440 dr:0 al:0 bm:2863 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
> 
> I think the LVM that comes with recent kernels supports barriers, which
> is why you're seeing this.
> 

How recent? I'm using the 2.6.32.41 xen-stable git tree from
git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git
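
For reference, the wo:b in the /proc/drbd output quoted above is the
write-ordering method DRBD is currently using. As far as I know, DRBD 8.3
reports b (barrier), f (flush), d (drain) or n (none) in that field. Below
is a minimal Python sketch that extracts it for one device, assuming the
8.3 /proc/drbd layout (a "<minor>: cs:..." status line followed by
indented counter lines); the function name write_ordering is just
illustrative, not part of any DRBD tool.

```python
import re

# Write-ordering codes for the "wo:" field of /proc/drbd, as documented
# for DRBD 8.3 (assumption: other DRBD versions may differ).
WRITE_ORDERING = {
    "b": "barrier",
    "f": "flush",
    "d": "drain",
    "n": "none",
}


def write_ordering(proc_drbd_text, minor):
    """Return the write-ordering method of DRBD device `minor`, or None.

    Assumes a status line like " 3: cs:Connected ..." followed by
    indented counter lines, one of which contains "wo:<code>".
    """
    current = None
    for line in proc_drbd_text.splitlines():
        # A status line starts a new device block; remember its minor.
        header = re.match(r"\s*(\d+):\s+cs:", line)
        if header:
            current = int(header.group(1))
            continue
        # Only look for "wo:" inside the block of the requested device.
        if current == minor:
            match = re.search(r"\bwo:(\w)", line)
            if match:
                return WRITE_ORDERING.get(match.group(1), "unknown")
    return None


if __name__ == "__main__":
    # Sample taken from the /proc/drbd output quoted above.
    sample = (
        " 3: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r----\n"
        "    ns:0 nr:46983440 dw:46983440 dr:0 al:0 bm:2863 lo:0 pe:0"
        " ua:0 ap:0 ep:1 wo:b oos:0\n"
    )
    print(write_ordering(sample, 3))  # prints: barrier
```

On a live node you would pass it the contents of /proc/drbd instead of the
sample. After setting no-disk-barrier and no-disk-flushes, I would expect
the field to change from b to d, but I haven't verified that here.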

> 
>> Furthermore, my 3ware RAID controllers do not have a BBU installed, and
>> I have read that using no-disk-barrier and no-disk-flushes is not
>> recommended in that case. The servers are connected to quite a large
>> UPS, though. Is that sufficient?
> 
> From what I understand, the main risk of not using barriers is that when
> DRBD thinks something has been written to disk, it might not actually
> be. That's possibly not great, but to me it doesn't seem any worse than
> using a plain SATA disk with its internal write cache enabled.
> Anyway, if you have a UPS which is configured to gracefully shut your
> machines down on power loss, it should have approximately the same
> effect as having a BBU.
> 

Unfortunately, the UPS powers the whole facility, and there is no
communication mechanism to trigger a graceful shutdown.
Furthermore, my two Xen servers (or at least the one where I observed
this behaviour) *sometimes* reboot when communication with the other
server is lost completely. I've already read some forum posts and mails
from people who have the same problem; there seems to be some weird bug
involving Xen and DRBD. I haven't had time to investigate that issue yet.

> 
> Or, to put it another way: I have good and regularly tested backups; the
> small risk of losing a bit of data on writes is more than offset by the
> massive speed benefit that no-disk-barrier and no-disk-flushes give me.
> 

We have regular tape backups, but they don't cover the system
partitions of the virtual machines. So if a partition gets corrupted,
the issue is the downtime, not the loss of data :(

> 
> You can see my previous discussion on this here:
> 
> http://article.gmane.org/gmane.comp.linux.drbd/21997
> http://article.gmane.org/gmane.comp.linux.drbd/22056
> 
> Phil

Mark