[DRBD-user] Consistent device to primary fences remote node

Lars Ellenberg lars.ellenberg at linbit.com
Fri Nov 28 12:05:00 CET 2008


On Thu, Nov 27, 2008 at 06:03:56PM +0100, Federico Simoncelli wrote:
> On Thu, Nov 27, 2008 at 5:38 PM, Lars Ellenberg
> <lars.ellenberg at linbit.com> wrote:
> >> 1) both servers: cs:Connected st:Primary/Primary ds:UpToDate/UpToDate
> >> 2) server 2 is correctly shut down
> >> 3) server 1: cs:WFConnection st:Primary/Unknown ds:UpToDate/Outdated
> >> 4) booting server 2 in StandAlone mode is impossible since it has Outdated data
> >
> > careful. you use your own outdate-peer handler.
> > so, server-1 "knows".
> > but does server-2 know that it is outdated?
> > who outdated it?
> > when?
> 
> When you correctly shut down the server 2 the drbd service is cleanly
> stopped. Running "drbdadm show-gi" shows that "Data was/is currently
> up-to-date" is set to 0. This means that the resource is automatically
> Outdated.

is that so.
hm.
/me looks up the state handling code...
right you are.

if we are Connected, and a node voluntarily leaves the cluster,
and a drbd fencing policy is configured, and the other node is Primary,
then the leaving node is requested to outdate itself prior to disconnect.
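
The rule above can be sketched as a single predicate. This is only an
illustration in Python, not DRBD's state handling code; LeaveContext and
must_outdate_before_leaving are hypothetical names, and "fencing
configured" is assumed to mean any fencing policy other than the
no-fencing default.

```python
from dataclasses import dataclass


@dataclass
class LeaveContext:
    """Hypothetical snapshot of the cluster at the moment a node leaves."""
    connected: bool           # cs:Connected just before the disconnect
    voluntary: bool           # clean service stop, not a crash or power reset
    fencing_configured: bool  # assumption: any policy other than "no fencing"
    peer_is_primary: bool     # the node that stays behind is Primary


def must_outdate_before_leaving(ctx: LeaveContext) -> bool:
    """True when the leaving node is asked to outdate itself before disconnecting."""
    return (ctx.connected and ctx.voluntary
            and ctx.fencing_configured and ctx.peer_is_primary)


# Step 2 of the scenario: server 2 is cleanly shut down while server 1 is Primary.
clean_shutdown = LeaveContext(connected=True, voluntary=True,
                              fencing_configured=True, peer_is_primary=True)
print(must_outdate_before_leaving(clean_shutdown))  # True

# A power reset is not a voluntary leave, so nobody gets to outdate the node.
power_reset = LeaveContext(connected=True, voluntary=False,
                           fencing_configured=True, peer_is_primary=True)
print(must_outdate_before_leaving(power_reset))     # False
```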

> If then I shut down also the server 1 the drbd service is cleanly
> stopped and running "drbdadm show-gi" shows that "Data was/is
> currently up-to-date" is set to 1. So the server 1 is left to
> UpToDate.
> This should also answer your following question:
> 
> >> Basically the Consistent status will always be turned into Outdated
> >> because I have no way to check if the remote peer is primary. I assume
> >> that a peer with "Consistent" status was incorrectly shut down and
> >> can't become primary without manual intervention.
> >
> > you can now no longer reboot a single primary,
> > whether cleanly or by power reset.
> 
> Yes, I can. The first node that is cleanly shut down is Outdated, the
> last one is UpToDate (since it holds the last updated data).
> I can boot as StandAlone the one with the UpToDate resource.

ok. at the same time as the leaving node above was outdated,
the still running Primary stored the "peer-is-outdated" flag,
which allows it to skip the outdate-peer step on re-attach
and go automatically from Consistent to UpToDate.
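
A minimal sketch of that re-attach shortcut, in Python for illustration
only. disk_state_on_attach and the outdate_peer callback are
hypothetical; the callback stands in for whatever outdate-peer handler
is configured and is assumed to return True once the peer is known to be
outdated or fenced. The behaviour when the handler fails (staying
Consistent) is likewise an assumption of this sketch.

```python
from typing import Callable


def disk_state_on_attach(peer_is_outdated_flag: bool,
                         outdate_peer: Callable[[], bool]) -> str:
    """Disk state a node with consistent data ends up in after re-attach."""
    if peer_is_outdated_flag:
        # Flag stored while the peer left cleanly: no handler call needed.
        return "UpToDate"
    if outdate_peer():
        # Handler confirmed the peer cannot have newer data.
        return "UpToDate"
    # Nothing known about the peer: data stays merely Consistent.
    return "Consistent"


calls = []


def handler() -> bool:
    """Stand-in handler that records it was called and reports failure."""
    calls.append("called")
    return False


print(disk_state_on_attach(True, handler), calls)   # UpToDate []
print(disk_state_on_attach(False, handler), calls)  # Consistent ['called']
```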

> I just can't boot a single primary if both nodes were incorrectly shut
> down.

or if you had a network hiccup first.
but, right, you get a fencing race on network hiccup.
nice.

> Manual intervention is needed to decide which one is the most
> updated.
> This is pretty reasonable.

ok. might work.
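
The boot policy Federico describes can be summarised in a few lines.
This is a sketch under the assumptions stated in the thread, not his
actual handler; can_boot_standalone_primary is a hypothetical name, and
the disk states are passed in as plain strings.

```python
def can_boot_standalone_primary(disk_state: str) -> bool:
    """Whether a node may come up as a single Primary without its peer."""
    if disk_state == "Consistent":
        # The policy treats Consistent as "incorrectly shut down" and
        # turns it into Outdated, since the peer's role cannot be checked.
        disk_state = "Outdated"
    return disk_state == "UpToDate"


# Clean shutdown of both nodes: first one down is Outdated, last one UpToDate.
print(can_boot_standalone_primary("Outdated"))    # False
print(can_boot_standalone_primary("UpToDate"))    # True

# Both nodes incorrectly shut down: both are Consistent, so neither may
# boot alone and manual intervention is needed.
print(can_boot_standalone_primary("Consistent"))  # False
```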

> >> I don't like the idea of a server waiting for a couple of days in the
> >> boot sequence as a general rule and in this particular situation even
> >> more since I moved the drbd script early at the beginning before
> >> clvmd.
> >
> > of course network and sshd have to be up first.
> 
> Network is obviously up. I don't need ssh to be up since my
> outdate-peer handler doesn't need it.
>
> It just needs cman in case it needs to fence the remote peer. I need
> clvmd to start after drbd to detect LVM.
> Basically I'm trying to boot drbd as soon as an iscsi/aoe device would do.

_you_ need the SSHD to remotely administer the box.
or a network kvm. or some such.

we recommend getting the means to remotely administer the box up
as soon as possible, before any services.
we usually have at least a serial console hooked up,
and a getty running on that
(I think on some boxes that getty is actually a memlocked, realtime,
ulimit-ed, static busybox, to be able to get certain fork-bomb-like
database client behaviour under control ;->)

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed


