[DRBD-user] Disk Corruption = DRBD Failure?

Charles Kozler charles at fixflyer.com
Wed Oct 12 20:00:30 CEST 2011


This was spot on, 100% the answer I was looking for - thanks, guys!

Also, do any white papers exist on how DRBD works internally? From what
you told me, it looks like it's:

Write to DRBD block device -> write to TCP buffer -> write to host disks

I thought it was:

Write to DRBD block device -> write to disk -> write to TCP buffer ->
write to host disks (almost like a push method)

Which is why I asked about disk corruption, but it sounds like I should
be more concerned about corruption in the network stack, right?

Chuck Kozler
/Lead Infrastructure & Systems Administrator/
*Office*: 1-646-290-6267 | *Mobile*: 1-646-385-3684
FIX Flyer

Notice to Recipient: This e-mail is meant only for the intended 
recipient(s) of the transmission, and contains confidential information 
which is proprietary
to FIX Flyer LLC. Any unauthorized use, copying, distribution, or 
dissemination is strictly prohibited. All rights to this information are 
reserved by FIX Flyer LLC.
If you are not the intended recipient, please contact the sender by 
reply e-mail, delete this e-mail from your system, and destroy 
any copies.

On 10/12/2011 3:04 AM, Florian Haas wrote:
> On 2011-10-11 17:09, Charles Kozler wrote:
>> Hi,
>> I have been reading the docs and still seem unclear on some things.
>> Assume I have a two node setup with DRBD in Primary/Primary with Xen
>> writing to /dev/drbd0 on node1. I use Primary/Primary for live migration
>> and in my Xen DomU configuration file I use phy: and not drbd: handler.
>> Now, what happens if the disk on node1 begins to fail and the blocks
>> where /dev/drbd0 resides are corrupted while we continue to write to
>> it -- will these bad/corrupted blocks be replicated to node2?
> If the underlying _disk_ fails in weird ways and that is why you get
> corruption, then the corruption occurs below the DRBD level and there's
> no corruption for DRBD to replicate.
> If however you have one of your Xen domUs writing garbage to that device
> (so the corruption occurs in a layer above DRBD), then of course DRBD
> will happily replicate that corruption.
>> Example aside, in short, I am wondering whether a failing disk on a node
>> will result in DRBD replicating bad block data to the secondary node. I
>> know there is a place in the docs describing an integrity checker that
>> uses the kernel's crypto algorithms (like md5), so maybe that's an
>> option to prevent it?
> Nope, that will only prevent corruption that may occur *within* DRBD due
> to a fishy network layer, or bit flips on your PCI bus, or broken
> checksum offloading on your NICs.
> For preventing corruption in the disk I/O layer, DRBD would have to
> support DIF/DIX, which it currently doesn't do (very few applications do).
>> In either case, is there any way to prevent bad block data from node 1
>> being replicated to node 2?
> Corruption rooted in the network stack, yes -- use data-integrity-alg.
> Corruption rooted in your Xen domU, nope.
> For corruption rooted in the I/O layer, you can't prevent the
> replication from happening but you can detect the corruption after the
> fact -- use verify-alg and run device verification.
> Hope this helps,
> Florian
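For reference, both knobs Florian mentions go in the resource's net section; a sketch (the resource name r0 and the choice of sha1 are my assumptions, not from this thread):

```
resource r0 {
  net {
    data-integrity-alg sha1;  # checksum every replication packet in transit
    verify-alg sha1;          # algorithm used by online device verification
  }
}
```

Online verification is then started by hand (or from cron) with `drbdadm verify r0`; blocks found to differ are marked out-of-sync rather than repaired automatically.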
