Ben Clewett wrote:
> I don't know whether I dare add to this, but I do understand MySql and
> the way it writes data. If it helps the argument....
>
> Using the InnoDB engine, it does write ahead, but only into a log file.
> The actual writing of the index's and row data follows later. (The
> data is held in memory, so no urgency to write to disk.) After a crash
> similar to a machine having power removed, a crash recovery takes place
> to write the missing data from the logs. This may be a slow process. I
> know from painful experience that this is not 100% guaranteed, you have
> the real risk of corrupting a table.
>
> If you use the MyIsam table, things are worse. This has no crash
> recovery and you stand more of a chance of corrupting data, eg, a row
> which has been written, but it's index which has not. Crash recovery is
> planned in version 5.2 I belive.
>
> However, you can lock MySql for writing, request it flushes all data,
> and then take a snapshot. Which is better than a full stop of the
> system, if handled well users will hardly notice...

For crash recovery to work, be it MySQL InnoDB, a serious DBMS, or just
a journaling file system, your system setup and hardware must be "fsync
clean". That means your system must guarantee that once fsync returns,
the data has made it to the actual discs, or at least to nonvolatile
memory such as the cache of a battery-backed RAID controller. There are
many things that can go wrong, and the details (drives can cache, LVM
does not support write barriers, etc.) have been discussed countless
times, so I won't repeat them.

There are some tools floating around to test your system's behaviour on
power failures, such as:
http://www.faemalia.net/mysqlUtils/diskTest.pl

A file system snapshot, a power failure, or the failover to a standby
node using DRBD from a crashed machine all present the application with
basically the same state and require that it is able to recover from it.

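
To make the "fsync clean" requirement concrete, here is a minimal sketch
in Python of a log writer that only reports success after fsync returns,
plus a recovery pass that replays intact records and stops at the first
torn one. The names append_record and recover, and the length-plus-CRC32
frame layout, are made up for illustration; this is not how InnoDB or any
particular file system lays out its log. Note that os.fsync only asks the
operating system to flush. If a drive cache or a layer without barrier
support acknowledges the flush early, the recovery below can still find
confirmed records missing.

```python
import os
import struct
import zlib

# Hypothetical frame layout: payload length, then CRC32 of the payload.
HEADER = struct.Struct("<II")


def append_record(log_path, payload):
    """Append one record; return only after fsync has reported success."""
    frame = HEADER.pack(len(payload), zlib.crc32(payload)) + payload
    fd = os.open(log_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        view = memoryview(frame)
        while view:                      # os.write may write only part
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)                     # ask the OS to flush to storage
    finally:
        os.close(fd)


def recover(log_path):
    """Return all intact records, stopping at the first torn or corrupt one."""
    records = []
    try:
        with open(log_path, "rb") as f:
            data = f.read()
    except FileNotFoundError:
        return records
    offset = 0
    while offset + HEADER.size <= len(data):
        length, crc = HEADER.unpack_from(data, offset)
        start = offset + HEADER.size
        payload = data[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break                        # incomplete tail left by a crash
        records.append(payload)
        offset = start + length
    return records
```

A caller would confirm a transaction to its client only after
append_record has returned, and would run recover once at startup. The
checksum lets recovery discard a half-written tail, but it cannot bring
back a record that was confirmed and then lost in a volatile cache.
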
File systems use journals for their metadata (some for data as well);
database systems use write-ahead logs, which is basically the same
thing. Both use special commands (fsync/FUA/barriers) after writing to
their journal / WAL to instruct the system to flush that data to
nonvolatile storage. If your system loses data that it confirmed as
written to nonvolatile storage, recovery will break; that is expected.

That problem is much wider than file system snapshots. Another example:
DRBD on top of a single SATA drive with write cache enabled. This is not
safe, because DRBD, like LVM, does not support barriers yet, so the file
system has no way to instruct the underlying block device (the SATA
drive) to flush its cache when needed. The journaling file system can,
and will, fail to recover. Additionally, DRBD itself might become
inconsistent, as its metadata might no longer match the data on the
underlying disc.

If you don't trust file system snapshots, you can't trust DRBD either,
as it requires the same from your applications. Test and fix your system
to behave as expected; in the case of DRBD that means never run without
a BBU controller, and disable the write cache of all your drives.

-- 
Best regards,
H.D.