[DRBD-user] Snapshot with DRBD

Fri Feb 15 14:22:09 CET 2008

Ben Clewett wrote:
> I don't know whether I dare add to this, but I do understand MySQL and
> the way it writes data.  If it helps the argument....
> 
> Using the InnoDB engine, it does write ahead, but only into a log file.
>  The actual writing of the indexes and row data follows later.  (The
> data is held in memory, so no urgency to write to disk.)  After a crash
> similar to a machine having power removed, a crash recovery takes place
> to write the missing data from the logs.  This may be a slow process.  I
> know from painful experience that this is not 100% guaranteed; you have
> the real risk of corrupting a table.
> 
> If you use MyISAM tables, things are worse.  These have no crash
> recovery and you stand more of a chance of corrupting data, e.g. a row
> which has been written, but whose index has not.  Crash recovery is
> planned in version 5.2 I believe.
> 
> However, you can lock MySQL for writing, request that it flush all
> data, and then take a snapshot.  This is better than a full stop of the
> system; if handled well, users will hardly notice...

For crash recovery to work, be it MySQL InnoDB, a serious DBMS, or just
a journaling file system, your system setup and hardware must be "fsync
clean". That means that your system must guarantee that once fsync
returns, the data has made it to the actual discs or at least to
nonvolatile memory like the cache of a battery-backed RAID controller.
There are many things that can go wrong, and the details (drives can
cache, LVM does not support write barriers, etc.) have been discussed
countless times, so I won't repeat them. There are some tools floating
around to test your system's behaviour on power failures, such as:
http://www.faemalia.net/mysqlUtils/diskTest.pl
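
The idea behind such a test can be sketched in a few lines. This is my
own minimal illustration, not what the script above does in detail; the
function names are made up, and `report` is assumed to send each
acknowledged number somewhere that survives the power cut (another
machine, a serial console). It also ignores fsync of the parent
directory, which a newly created file would need as well.

```python
import os


def durable_append(path, line):
    """Append one line and return only after fsync reports success."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.write(fd, (line + "\n").encode("ascii"))
        os.fsync(fd)  # on an "fsync clean" system the data is now durable
    finally:
        os.close(fd)


def write_sequence(path, count, report=print):
    """Write 0..count-1, reporting each number once it is acknowledged."""
    for n in range(count):
        durable_append(path, str(n))
        report(n)  # everything reported here must survive a power loss


def last_surviving(path):
    """Return the last complete number found in the file, or None."""
    with open(path, "rb") as f:
        lines = f.read().split(b"\n")
    # The final element is either empty or a torn, incomplete line.
    complete = [l for l in lines[:-1] if l.isdigit()]
    return int(complete[-1]) if complete else None
```

Pull the plug while write_sequence() is running, reboot, and compare
last_surviving() with the last number that was reported. If the file is
behind, something between fsync and the platters is lying.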

A file system snapshot, a power failure, or the failover to a standby
node using DRBD from a crashed machine basically present the
application with the same state and require that it is able to recover
from that. File systems use journals for their meta data (some for data
as well); database systems use write-ahead logs, which is basically the
same thing. Both use special commands (fsync/FUA/barriers) after writing
to their journal / WAL to instruct the system to flush that data to
nonvolatile storage. If your system loses data that it confirmed as
written to nonvolatile storage, recovery will break; that is expected.
That problem is way wider than file system snapshots. Another example:

DRBD on top of a single SATA drive with write cache enabled:

This is not safe, as DRBD, like LVM, does not support barriers yet, so
the file system has no way to instruct the underlying block device (the
SATA drive) to flush its cache when needed. The journaling file system
can/will fail to recover. Additionally, DRBD itself might become
inconsistent, as its meta data might no longer match the data on the
underlying disc.
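
To make the failure mode above concrete, here is a toy model of a disc
with a volatile write cache. It is purely illustrative and assumes
nothing about DRBD's or any file system's real code; `Disk`, `commit`
and `recover` are names I made up. The point is only that a flush which
is silently dropped turns an acknowledged commit into lost data.

```python
class Disk:
    """Toy block device with a volatile write cache."""

    def __init__(self, honors_flush):
        self.honors_flush = honors_flush
        self.platter = {}  # survives power loss
        self.cache = {}    # lost on power loss

    def write(self, key, value):
        self.cache[key] = value

    def flush(self):
        # A stack that drops flush requests still reports success.
        if self.honors_flush:
            self.platter.update(self.cache)
            self.cache.clear()

    def power_loss(self):
        self.cache.clear()


def commit(disk, txid, payload):
    """Write-ahead: log the change, flush, then acknowledge."""
    disk.write(("wal", txid), payload)
    disk.flush()   # the log record must be durable before we say "done"
    return True    # acknowledged to the client


def recover(disk):
    """Return the transaction ids found in the log after a crash."""
    return sorted(k[1] for k in disk.platter if k[0] == "wal")
```

With Disk(honors_flush=True), a commit followed by power_loss() is still
found by recover(). With Disk(honors_flush=False), the same sequence
leaves recover() with an empty log although the commit was acknowledged.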

If you don't trust file system snapshots, you can't trust DRBD, as it
requires the same from your applications.

Test and fix your system to behave as expected; in the case of DRBD that
means never run without a BBU controller, and disable the write cache of
all your drives.

-- 
Best regards,
H.D.