[DRBD-user] Disk Corruption = DRBD Failure?

Florian Haas florian at hastexo.com
Wed Oct 12 09:04:40 CEST 2011



On 2011-10-11 17:09, Charles Kozler wrote:
> Hi,
> 
> I have been reading the docs and am still unclear on a few things:
> 
> Assume I have a two-node setup with DRBD in Primary/Primary, with Xen
> writing to /dev/drbd0 on node1. I use Primary/Primary for live migration,
> and in my Xen DomU configuration file I use the phy: handler, not the drbd: handler.
> 
> Now, what happens if the disk on node1 begins to fail and the blocks
> where /dev/drbd0 resides become corrupted while we continue to write to
> it? Will these bad/corrupted blocks be replicated to node2?

If the underlying _disk_ fails in weird ways and that is why you get
corruption, then the corruption occurs below the DRBD level and there's
no corruption for DRBD to replicate.

If however you have one of your Xen domUs writing garbage to that device
(so the corruption occurs in a layer above DRBD), then of course DRBD
will happily replicate that corruption.

> Example aside, in short, I am wondering if a failing disk on a node will
> result in DRBD replicating bad block data to the secondary node. I know
> there's a place in the docs describing an integrity checker using the kernel's
> crypto algorithms (like md5), so maybe that's an option to prevent it?

Nope, that will only catch corruption that occurs *within* DRBD's
replication path, e.g. due to a fishy network layer, bit flips on your
PCI bus, or broken checksum offloading on your NICs.
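To make the distinction concrete, here is a toy model in Python of what a replication-path digest such as data-integrity-alg buys you. It is an illustrative sketch under stated assumptions, not DRBD's wire protocol: the hypothetical send_block()/receive_block() helpers simply attach a digest on the sending node and recompute it on the receiving node. A bit flipped in transit is caught, but garbage handed down by the layer above (a misbehaving domU) gets a perfectly valid digest and sails through.

```python
import hashlib

# Toy model, NOT DRBD's wire protocol. The helpers below are hypothetical:
# the sender attaches a digest to each block, the receiver recomputes it.
ALG = "sha256"  # any hashlib digest works for the illustration


def send_block(payload: bytes) -> tuple[bytes, bytes]:
    """Sending node: digest the block exactly as the layer above handed it down."""
    return payload, hashlib.new(ALG, payload).digest()


def receive_block(payload: bytes, digest: bytes) -> bool:
    """Receiving node: accept the block only if the recomputed digest matches."""
    return hashlib.new(ALG, payload).digest() == digest


def flip_bit(data: bytes, bit: int) -> bytes:
    """Simulate a single bit flip somewhere between the two nodes."""
    buf = bytearray(data)
    buf[bit // 8] ^= 1 << (bit % 8)
    return bytes(buf)


if __name__ == "__main__":
    block = bytes(4096)  # a clean 4 KiB block of zeros

    payload, digest = send_block(block)
    print(receive_block(payload, digest))                   # True: clean transfer
    print(receive_block(flip_bit(payload, 12345), digest))  # False: in-transit flip caught

    garbage, g_digest = send_block(b"garbage written by a domU")
    print(receive_block(garbage, g_digest))                 # True: upper-layer garbage passes
```

The digest is computed on whatever arrives from above, which is exactly why it cannot tell good data from bad data that was already wrong before DRBD saw it.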

For preventing corruption in the disk I/O layer, DRBD would have to
support T10 DIF/DIX (end-to-end data integrity protection), which it
currently doesn't (very few applications do).
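For reference, this is roughly what DIF protection looks like at the sector level. In the T10 Type 1 layout, each 512-byte sector carries 8 bytes of protection information: a 16-bit guard tag (a CRC over the sector data), a 16-bit application tag, and a 32-bit reference tag (the low 32 bits of the target LBA). DIX lets the host side generate that tuple and pass it down the stack, so the layers handling the I/O, down to the disk, can check it. The Python below is a sketch of the tag computation only; the function names are hypothetical and it does not talk to any real block layer.

```python
import struct

SECTOR_SIZE = 512


def crc16_t10dif(data: bytes) -> int:
    """CRC-16/T10-DIF guard tag: poly 0x8BB7, init 0, MSB-first, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x8BB7) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def make_pi(sector: bytes, lba: int, app_tag: int = 0) -> bytes:
    """Build the 8-byte Type 1 tuple: guard, app tag, ref tag (big-endian)."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("sector must be exactly 512 bytes")
    return struct.pack(">HHI", crc16_t10dif(sector), app_tag, lba & 0xFFFFFFFF)


def check_pi(sector: bytes, lba: int, pi: bytes) -> bool:
    """Re-derive the tuple and compare: catches bit rot and misdirected writes."""
    _, app_tag, _ = struct.unpack(">HHI", pi)
    return make_pi(sector, lba, app_tag) == pi
```

The guard tag catches data that changed on its way to or from the platter, and the reference tag catches data that landed at the wrong LBA: both are failure modes a digest on the replication link never sees.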

> In either case, is there any way to prevent bad block data from node 1
> being replicated to node 2?

Corruption rooted in the network stack, yes -- use data-integrity-alg.

Corruption rooted in your Xen domU, nope.

For corruption rooted in the I/O layer, you can't prevent the
replication from happening, but you can detect the corruption after the
fact -- set verify-alg and run online device verification (drbdadm verify).
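A rough model of what online verification does: one node digests every block of its backing device, ships only the digests to the peer, and the peer compares them against digests of its own copy; mismatching blocks get flagged as out of sync. The Python below is a toy under that assumption (the block size, the helper names and the in-memory "devices" are all hypothetical). In a real setup, verify-alg goes in the resource configuration and a run is started with drbdadm verify <resource>.

```python
import hashlib

BLOCK_SIZE = 4096  # toy block size, chosen for illustration only


def block_digests(device: bytes, alg: str = "sha256") -> list[bytes]:
    """Digest each fixed-size block of an in-memory 'device'."""
    return [
        hashlib.new(alg, device[off:off + BLOCK_SIZE]).digest()
        for off in range(0, len(device), BLOCK_SIZE)
    ]


def verify(source: bytes, target: bytes, alg: str = "sha256") -> list[int]:
    """Return indices of blocks whose digests differ between the two replicas.

    Only the digests would need to cross the network; the data stays put.
    """
    if len(source) != len(target):
        raise ValueError("replicas must be the same size")
    src = block_digests(source, alg)
    dst = block_digests(target, alg)
    return [i for i, (a, b) in enumerate(zip(src, dst)) if a != b]


if __name__ == "__main__":
    node1 = bytearray(BLOCK_SIZE * 4)    # four zeroed blocks
    node2 = bytearray(node1)
    node1[2 * BLOCK_SIZE + 10] ^= 0xFF   # simulate bit rot in block 2 on node1's disk
    print(verify(bytes(node1), bytes(node2)))  # [2]
```

The comparison only tells you that the replicas diverge, not which copy is the good one. Deciding that, and triggering the resync (per the DRBD User's Guide, by disconnecting and reconnecting the resource after the verify run), is still up to you.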

Hope this helps,
Florian

-- 
Need help with DRBD?
http://www.hastexo.com/now


