[DRBD-user] Constant replication problems with DRBD 8.3.10

Lars Ellenberg lars.ellenberg at linbit.com
Mon Jul 1 10:40:36 CEST 2013

See below.

On Fri, Jun 28, 2013 at 08:34:50PM -0700, cesar wrote:
> Hi guys
> 
> Can anybody help me?
> I would be very grateful for any help.
> 
> *Mr. Lars Ellenberg told me that:*
> "With special purpose built fencing handlers,
> we may be able to fix your setup so it will freeze IO during the
> disconnected period, reconnect, and replay pending buffers,
> without any reset."
> 
> Note: I believe the problem is my Realtek NIC; as soon as I can, I will
> replace it with an Intel one.


On Fri, Jun 28, 2013 at 09:41:20PM -0700, cesar wrote:
> Hi Kaloyan Kovachev
> 
> I would humbly ask for your help.
> I would be very grateful if you could help me.
> 
> I run Proxmox VE for HA of VMs (based on KVM) + LVM2 + DRBD 8.4.2 (NIC to
> NIC), and I have 1 VM in HA that uses 2 DRBD resources, i.e. the VM has 2
> virtual disks.
> 
> My problem is that DRBD lost its connection:
> 
> version: 8.4.2 (api:1/proto:86-101)
> GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by root at kvm5, 2013-06-16 13:44:51
>  0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----
>     ns:12443007 nr:0 dw:46472892 dr:1719821020 al:3096 bm:1408 lo:1 pe:0 ua:0 ap:1 ep:1 wo:f oos:3319816
>  1: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----
>     ns:4764396 nr:0 dw:3791840 dr:1057033682 al:1164 bm:336 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:1783620
> 
> I think there may be 2 causes:
> Cause 1: The NIC is a Realtek; as soon as I can, I will replace it with an Intel one.
> Cause 2: I use the directive net data-integrity-alg md5; (I would not want to
> remove it), which may have problems with my PC (ASUS P8H77-M PRO). Also, I don't
> use RAID.
> 
> *And Mr. Lars Ellenberg told me that:*
> "With special purpose built fencing handlers,
> we may be able to fix your setup so it will freeze IO during the
> disconnected period, reconnect, and replay pending buffers,
> without any reset."


If you quote me in some other thread out of context,
at least give a pointer to the original thread:

http://www.gossamer-threads.com/lists/drbd/users/25466#25466

You'll notice that the sentence you chose to quote
was an *anti* recommendation.

What I told you was that

 * it is likely NOT hardware,
   but upper layers (users of DRBD) modifying buffers
   while they are in flight.

   (given your exact symptoms, if it were bad hardware,
   it would be bad RAM, not a bad NIC)

 * you have to fix these upper layers

 * you have to implement fencing

 * you have to upgrade your DRBD

 * you likely should not use dual-primary

 * you should understand what you are doing.
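To illustrate the first point, here is a minimal Python sketch. It is a simplified model, not DRBD code, and the function names are hypothetical. It shows why a buffer that is modified while in flight trips a checksum such as data-integrity-alg md5 even on perfectly good hardware. The sender computes the digest, an upper layer then changes the buffer before it is transmitted, and the receiver's recomputed digest no longer matches.

```python
import hashlib


def send_with_digest(buf: bytearray, modify_in_flight: bool) -> tuple[bytes, bytes]:
    """Hypothetical sender: digest first, then transmit (simplified model)."""
    # The digest is computed over the buffer as it looks right now.
    digest = hashlib.md5(buf).digest()
    if modify_in_flight:
        # Simulate an upper layer writing into the same buffer after the
        # digest was computed but before the data went onto the wire.
        buf[0] ^= 0xFF
    payload = bytes(buf)  # what actually gets transmitted
    return digest, payload


def receiver_accepts(digest: bytes, payload: bytes) -> bool:
    """Hypothetical receiver: recompute the digest over what arrived."""
    return hashlib.md5(payload).digest() == digest


ok = receiver_accepts(*send_with_digest(bytearray(b"block data"), False))
bad = receiver_accepts(*send_with_digest(bytearray(b"block data"), True))
print(ok, bad)  # True False
```

In this model the mismatch comes purely from the timing of the write, not from corruption on the wire, which is why swapping the NIC alone would not be expected to help.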

Back to the sentence that you chose to quote,
you would have to write (program) those "special purpose built fencing
handlers" first, and they would only make any sense *after*
you implemented fencing (and upgraded your DRBD).
And even then, it would be a hack, a band-aid, only to paper over
the real problem.  So again: this is NOT what I recommend you do.

Thank you,

	Lars