[DRBD-user] use drbd on wan

Mia Lueng xiaozunvlg at gmail.com
Wed Sep 21 21:05:02 CEST 2011


Hi Sausin:

Thanks for your advice.

Now we use a script to create an LVM snapshot before each
synchronization. If the sync fails (the primary goes down or the network
link breaks), we immediately recover the secondary node from that
snapshot.
But LVM snapshot merging is only supported in RHEL6 (we use SLES11
SP1). Is there another way to merge the snapshot?
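A minimal Python sketch of the snapshot-before-sync cycle described above. It only builds the LVM command lines (each could be run with subprocess.run(cmd, check=True)); the volume group, LV and snapshot names and the 10G copy-on-write size are hypothetical placeholders. The rollback branch uses lvconvert --merge, which is exactly the snapshot-merge feature missing on SLES11 SP1, so on that platform the failure branch remains the open question.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SnapshotPlan:
    vg: str             # hypothetical volume group holding the DRBD backing LV
    lv: str             # hypothetical backing LV of the DRBD device
    snap: str           # hypothetical snapshot name
    size: str = "10G"   # COW space; must hold every block the resync rewrites

    def before_sync(self) -> List[str]:
        # Freeze the last consistent state of the secondary before reconnecting.
        return ["lvcreate", "--snapshot", "--name", self.snap,
                "--size", self.size, f"/dev/{self.vg}/{self.lv}"]

    def after_success(self) -> List[str]:
        # Resync completed: the secondary is UpToDate again, drop the snapshot.
        return ["lvremove", "-f", f"{self.vg}/{self.snap}"]

    def after_failure(self) -> List[str]:
        # Resync interrupted: roll the origin back to the snapshot.
        # Needs LVM snapshot-merge support (RHEL6 has it, SLES11 SP1 does not).
        return ["lvconvert", "--merge", f"{self.vg}/{self.snap}"]


def commands_for(plan: SnapshotPlan, sync_ok: bool) -> List[List[str]]:
    """Command lines for one snapshot, sync, cleanup-or-rollback cycle."""
    finish = plan.after_success() if sync_ok else plan.after_failure()
    return [plan.before_sync(), finish]
```

Note that lvconvert only starts a merge immediately when neither the origin nor the snapshot is open, so in practice the DRBD resource would have to be taken down on the secondary before rolling back.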

2011/9/21 Lionel Sausin <ls at numerigraphe.com>:
> [Sorry, seems like my first post didn't make it to the list: resending]
> Dear Mia Lueng,
> We've been in this very situation for 6 months so I think I can answer
> most of your questions:
>> (...) we have considered the following solution: On the secondary node,
>> run the disconnect and connect commands in sequence. Disconnect drbd0,
>> wait 5 minutes (or longer), connect drbd0 again, and after the sync
>> completes, disconnect drbd0 again.
> Good idea I think. We tried that and it proved much harder to get
> working reliably than I expected. Please let me tell you the story.
> We went as far as writing a script to watch the level of the
> kernel's write cache and automatically disconnect the resource when too
> many writes are waiting. It works and we can share it if you want, but
> read on.
> Eventually, we gave up this idea because the data in write cache was
> lost each time the primary crashed, which was unfortunately often.
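The write-cache watcher mentioned above could look roughly like this Python sketch (not Lionel's actual script). It assumes that "write cache level" means the Dirty plus Writeback counters from /proc/meminfo; the threshold is a hypothetical value you would tune. As the text notes, this approach does nothing for data still sitting in the cache when the primary crashes.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Map each /proc/meminfo field name to its numeric value (kB for most fields)."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields:
            values[name.strip()] = int(fields[0])
    return values


def pending_writes_kb(meminfo: Dict[str, int]) -> int:
    # Pages not yet on disk: dirty pages plus pages currently under writeback.
    return meminfo.get("Dirty", 0) + meminfo.get("Writeback", 0)


def should_disconnect(meminfo_text: str, limit_kb: int) -> bool:
    # True when more than limit_kb of writes are waiting to be flushed.
    return pending_writes_kb(parse_meminfo(meminfo_text)) > limit_kb
```

On the node, a loop or cron job would pass it the contents of /proc/meminfo and run drbdadm disconnect on the resource (r0 here is a hypothetical name) whenever should_disconnect returns True.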
> So we upgraded to 8.3.11, which has a congestion management feature to
> do even better: the secondary node falls behind when the link is too
> slow, and catches up when writes calm down. But there were problems too,
> like frequent full syncs forced on us. Our setup is complex and it
> probably hit a race condition or something.
> So by all means, if you have the money consider using drbd-proxy. It's a
> bit expensive but it's probably the best solution. It will give you a
> big RAM buffer to handle short periods of heavy writes, and should work
> great with the congestion management. You can ask for a free trial.
> If you stick to your initial idea, I'd advise you to disconnect most of the
> time, and connect during hours where write activity is lowest. This way
> you won't slow your primary down too much and you have essentially no
> risk of forcing a full-sync.
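Lionel's advice to stay disconnected most of the time and connect only during the quietest hours can be sketched in Python as follows. It assumes you already collect hourly write counts from somewhere (iostat or /proc/diskstats, for example); the window length is hypothetical, and the returned labels are meant to map onto drbdadm connect and drbdadm disconnect.

```python
from typing import Sequence


def quietest_window(writes_per_hour: Sequence[float], length: int) -> int:
    """Return the start hour of the `length`-hour window with the fewest writes.

    The window may wrap past midnight (e.g. 23:00 to 02:00).
    """
    hours = len(writes_per_hour)
    if not 0 < length <= hours:
        raise ValueError("window length must be between 1 and the number of samples")
    best_start, best_total = 0, float("inf")
    for start in range(hours):
        total = sum(writes_per_hour[(start + i) % hours] for i in range(length))
        if total < best_total:
            best_start, best_total = start, total
    return best_start


def action_for_hour(hour: int, start: int, length: int, hours: int = 24) -> str:
    # "connect" inside the chosen window, "disconnect" everywhere else.
    inside = (hour - start) % hours < length
    return "connect" if inside else "disconnect"
```

An hourly cron job on the secondary could then call drbdadm with the returned action. Choosing the window from measured load, rather than a fixed hour, keeps the resync away from the primary's busy periods, which is the point of the advice.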
>> But we are wondering about the following
>> issues:
>> 1. How can we confirm that each synchronization after connecting is a
>> quick sync and not a full sync? And how can we tune it to be that?
> Quick sync, no tuning needed. drbd is very smartly done.
> However be warned that the connect/disconnect code probably still has
> race conditions and I suppose you will trigger them eventually if you
> disconnect/reconnect every 5 minutes. We had some full syncs and several
> complete lockups.
> In our experience, the congestion management had full sync problems too
> when used without DRBD-proxy, but at least the nodes didn't hang.
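One way to check question 1 in practice, sketched in Python under the assumption of the DRBD 8.3 /proc/drbd layout: the oos: counter on a device's second line is the amount of out-of-sync data in KiB, i.e. what the next resync has to transfer. After a quick sync cycle it should only reflect the blocks changed while disconnected; a value close to the whole device size means a full sync. The 90% cut-off and the device-size argument are hypothetical.

```python
import re
from typing import Optional

_MINOR = re.compile(r"^\s*(\d+):\s")   # start of a device block, e.g. " 0: cs:..."
_OOS = re.compile(r"\boos:(\d+)")      # out-of-sync counter, in KiB


def out_of_sync_kib(proc_drbd: str, minor: int = 0) -> Optional[int]:
    """Return the oos: value (KiB) of one DRBD minor, or None if absent."""
    current = None
    for line in proc_drbd.splitlines():
        head = _MINOR.match(line)
        if head:
            current = int(head.group(1))
            continue
        if current == minor:
            found = _OOS.search(line)
            if found:
                return int(found.group(1))
    return None


def looks_like_full_sync(oos_kib: int, device_kib: int, fraction: float = 0.9) -> bool:
    # A resync that must transfer (almost) every block is effectively a full sync.
    return oos_kib >= fraction * device_kib
```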
>> 2. The secondary node disconnects drbd0 only when the cstate is
>> Connected and the dstate is UpToDate. Can the integrity of the DB data
>> be guaranteed? In other words, if the primary node crashes at this
>> moment, can the Oracle DB be started on the secondary node? And in a
>> quick sync, is the data written on the secondary node in the same order
>> as it was written on the primary node?
> No! During syncs, the secondary node *will be inconsistent* by design.
> It's less of a problem if you make a snapshot of the secondary before the
> sync. DRBD is distributed with an example script to do that, which works
> great in v8.3.10+. The recovery may be complex but it can be done.
> Document the procedures and you'll be fine.
> And no, the blocks will *not be written in the same order*.
> If you really need that you must stay connected, with the right protocol
> and the right barriers and flushes configured, and it will be very slow.
> But really, few people need it.
> On a side note, DRBD lets you run primary with an inconsistent local disk
> if the secondary is reachable (reads will be fetched over the network).
> It's amazing and precious when you have system maintenance to do.
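Following up on question 2 and the answer above, a minimal Python sketch of the decision the secondary's script has to make, assuming the DRBD 8.3 /proc/drbd status-line format. The only safe point to disconnect is cs:Connected with ds:UpToDate/UpToDate; an Inconsistent local disk on a node that has lost its peer mid-resync is the case where the pre-sync snapshot has to be restored. The returned action names are hypothetical labels, not DRBD commands.

```python
import re
from typing import Optional, Tuple

# e.g. " 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----"
_STATUS = re.compile(r"^\s*(\d+):\s+cs:(\S+)\s+ro:\S+\s+ds:(\w+)/(\w+)")

# Connection states in which no peer is attached, so no resync can be running.
_DISCONNECTED = {"StandAlone", "Unconnected", "WFConnection"}


def states(proc_drbd: str, minor: int = 0) -> Optional[Tuple[str, str, str]]:
    """(connection state, local disk state, peer disk state) for one minor."""
    for line in proc_drbd.splitlines():
        m = _STATUS.match(line)
        if m and int(m.group(1)) == minor:
            return m.group(2), m.group(3), m.group(4)
    return None


def next_step(proc_drbd: str, minor: int = 0) -> str:
    st = states(proc_drbd, minor)
    if st is None:
        return "unknown"
    cstate, local, peer = st
    if cstate == "Connected" and local == "UpToDate" and peer == "UpToDate":
        return "disconnect"        # both copies identical: safe point to cut the link
    if local == "Inconsistent" and cstate in _DISCONNECTED:
        return "restore-snapshot"  # resync was cut off: this copy is not usable as-is
    return "wait"                  # e.g. SyncTarget still running
```

Even in the "disconnect" case the secondary only holds a block-level copy of the primary at that instant, so the database on it would still go through its own crash recovery on startup.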
>> 3. is the activity log size tuning helpful for this situation?
> I had no time to try, but I guess it will. Why not build a prototype and
> experiment?
> Last piece of advice: DRBD needs the available bandwidth to be reliable.
> You will have weird problems if you ask for 2MB/s and suddenly only
> 0.5MB/s is available because the line is busy with something else.
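The bandwidth point is simple arithmetic, sketched here in Python: resync time is the out-of-sync volume divided by what the link really delivers, not by the configured rate. With a hypothetical 600 MB backlog, 2 MB/s needs 300 s but 0.5 MB/s needs 1200 s, four times longer than planned. The fits_window helper is hypothetical, meant for the disconnect/reconnect scheme discussed above.

```python
def resync_seconds(out_of_sync_mb: float, link_mb_s: float) -> float:
    """Seconds needed to ship the out-of-sync data at the bandwidth really available."""
    if link_mb_s <= 0:
        raise ValueError("available bandwidth must be positive")
    return out_of_sync_mb / link_mb_s


def fits_window(out_of_sync_mb: float, link_mb_s: float, window_s: float) -> bool:
    # Hypothetical helper: does the resync finish inside the planned connect window?
    return resync_seconds(out_of_sync_mb, link_mb_s) <= window_s


planned = resync_seconds(600, 2.0)   # 300.0 s at the requested 2 MB/s
actual = resync_seconds(600, 0.5)    # 1200.0 s when only 0.5 MB/s is left
```

Since DRBD tracks out-of-sync blocks in a bitmap, a block rewritten many times while disconnected is only resynced once; the total volume written during that period is therefore an upper bound on what the resync transfers.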
> Yours,
> Lionel Sausin.
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
