<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#ffffff">
On 09/08/2010 03:49 PM, Lars Ellenberg wrote:
<blockquote cite="mid:20100908134954.GE4111@barkeeper1-xen.linbit"
type="cite">
<pre wrap="">On Thu, Sep 02, 2010 at 03:22:25PM +0200, Robert Verspuy wrote:
</pre>
<blockquote type="cite"><br>
<pre wrap="">On the database server we're using PostgreSQL.
PostgreSQL is ACID-compliant, so the data on disk should not be corrupt.
It could be possible that we lost some database insert/updates,
but that's a risk I'm willing to accept, looking at the small chance
that all power is lost.
</pre>
</blockquote>
<pre wrap="">
Excuse me, but WHAT?
PostgreSQL is ACID compliant, IF AND ONLY IF the fsync/fdatasync and
similar it issues are behaving as expected, i.e. data is on stable
storage when PostgreSQL thinks it is.
</pre>
</blockquote>
Hmm. Yes, you are right. I was too quick to assume that
everything would be fine.<br>
I thought that no-disk-flushes would only stop drbd from adding its own
flushes after every IO,<br>
but that it would still accept and pass through the flushes that came
from the layer above the drbd device.<br>
<br>
But, as I understand it now, drbd does not issue any flushes when
no-disk-flushes is set: neither its own flushes nor the flush
requests it gets from the layer above.<br>
<blockquote cite="mid:20100908134954.GE4111@barkeeper1-xen.linbit"
type="cite">
<pre wrap="">If data only reaches stable storage at some point after PostgreSQL
thinks it already was there, and most likely even in some random order,
then no, ACID compliance is not met.
</pre>
</blockquote>
Ok, together with your other mail, I think I understand it now.<br>
<br>
So, I think there are two risks when using volatile caches with
no-disk-barrier and no-disk-flushes and protocol C.<br>
<br>
First -> a single node failure: the two nodes can differ in what is
actually on disk.<br>
After recovery, let the crashed node be the secondary and run a
verify as soon as possible.<br>
If the now-primary node crashes before the verify is done, you
must restore the database from a backup.<br>
<br>
Second -> both nodes crash or lose power. In that case,
both nodes may have corrupt data.<br>
Solution: restore a backup of the database.<br>
<br>
So in any case (just like when running postgresql on one server),
the worst case is losing everything written since your last regular
backup of the database.<br>
<br>
The reason I am testing with no-disk-barrier and no-disk-flushes
is the high latency (25 ms instead of the expected 1 or 2
ms) when writing small blocks of data.<br>
(See also my e-mail from last week, asking where to start
looking for the cause of the latency.)<br>
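<br>
A rough way to measure this kind of small-write latency is sketched
below in Python. It is an illustration only, under assumptions: the
function name, the 8 kB block size (PostgreSQL's default page size) and
the sample count are made up for the example, and the directory passed
in has to live on the device under test.<br>
<pre wrap="">
```python
import os
import statistics
import tempfile
import time


def measure_sync_write_latency(directory, block_size=8192, count=50):
    """Return per-write latencies in milliseconds for small synced writes."""
    payload = b"\0" * block_size
    latencies_ms = []
    # Create the scratch file on the filesystem under test.
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(count):
            start = time.perf_counter()
            os.write(fd, payload)
            # fsync asks the kernel to push the data to stable storage.
            # Whether it really gets there depends on every layer below
            # (drbd, controller, disk cache) honoring the flush.
            os.fsync(fd)
            latencies_ms.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
        os.unlink(path)
    return latencies_ms


if __name__ == "__main__":
    # Assumption: replace the temp dir with a directory on the drbd device.
    samples = measure_sync_write_latency(tempfile.gettempdir())
    print("median ms:", round(statistics.median(samples), 3))
    print("max ms:", round(max(samples), 3))
```
</pre>
Running it against a directory on the drbd device, once with and once
without no-disk-flushes, should give an idea of how much of the latency
comes from the flushes. A low number only means something if the layers
below really honor the flush.<br>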
<blockquote cite="mid:20100908134954.GE4111@barkeeper1-xen.linbit"
type="cite">
<pre wrap="">
So no, if you run PostgreSQL on disks with volatile caches,
and you unplug the power hard, you can expect data loss
and possibly data corruption.
Which is completely independent of DRBD.
</pre>
</blockquote>
True.<br>
<br>
So when comparing:<br>
<br>
postgresql on one server, with its own disk flushes and volatile
caches<br>
against<br>
postgresql on two nodes with drbd, with no-disk-barrier,
no-disk-flushes and volatile caches,<br>
<br>
then (looking at data loss / corruption) it's safer to run
postgresql on one server, because of the disk flushes.<br>
<br>
That is, unless we find the cause of, and maybe a solution for, the huge latency;<br>
then I can remove the no-disk-barrier and no-disk-flushes parameters.<br>
<br>
With kind regards,<br>
Robert Verspuy<br>
<br>
<div class="moz-signature">-- <br>
<b>Exa-Omicron</b><br>
Patroonsweg 10<br>
3892 DB Zeewolde<br>
Tel.: 088-OMICRON (66 427 66)<br>
<a class="moz-txt-link-freetext" href="http://www.exa-omicron.nl">http://www.exa-omicron.nl</a></div>
</body>
</html>