On Thu, Nov 27, 2008 at 06:03:56PM +0100, Federico Simoncelli wrote:
> On Thu, Nov 27, 2008 at 5:38 PM, Lars Ellenberg
> <lars.ellenberg at linbit.com> wrote:
> >> 1) both servers: cs:Connected st:Primary/Primary ds:UpToDate/UpToDate
> >> 2) server 2 is correctly shut down
> >> 3) server 1: cs:WFConnection st:Primary/Unknown ds:UpToDate/Outdated
> >> 4) booting server 2 in StandAlone mode is impossible since it has Outdated data
> >
> > careful. you use your own outdate-peer handler.
> > so, server-1 "knows".
> > but does server-2 know that it is outdated?
> > who outdated it?
> > when?
>
> When you correctly shut down the server 2 the drbd service is cleanly
> stopped. Running "drbdadm show-gi" shows that "Data was/is currently
> up-to-date" is set to 0. This means that the resource is automatically
> Outdated.

is that so.
hm.
/me looks up the state handling code...
right you are.

if we are Connected, and a node voluntarily leaves the cluster,
and a drbd fencing policy is configured, and the other node is Primary,
then the leaving node is requested to outdate itself prior to disconnect.

> If then I shut down also the server 1 the drbd service is cleanly
> stopped and running "drbdadm show-gi" shows that "Data was/is
> currently up-to-date" is set to 1. So the server 1 is left to
> UpToDate.
> This should also answer your following question:
>
> >> Basically the Consistent status will always be turned into Outdated
> >> because I have no way to check if the remote peer is primary. I assume
> >> that a peer with "Consistent" status was incorrectly shut down and
> >> can't become primary without manual intervention.
> >
> > you can now no longer reboot a single primary,
> > whether cleanly or by power reset.
>
> Yes, I can. The first node that is cleanly shut down is Outdated, the
> last one is UpToDate (since it holds the last updated data).
> I can boot as StandAlone the one with the UpToDate resource.

ok.
at the same time the leaving node above was outdated,
the still running Primary stored the "peer-is-outdated" flag.
which allows it to skip the outdate-peer step on re-attach,
and go automatically from Consistent to UpToDate.

> I just can't boot a single primary if both nodes were incorrectly shut
> down.

or if you had a network hiccup first.
but, right, you get a fencing race on network hiccup.
nice.

> Manual intervention is needed to decide which one is the most
> updated.
> This is pretty reasonable.

ok. might work.

> >> I don't like the idea of a server waiting for a couple of days in the
> >> boot sequence as a general rule and in this particular situation even
> >> more since I moved the drbd script early at the beginning before
> >> clvmd.
> >
> > of course network and sshd have to be up first.
>
> Network is obviously up. I don't need ssh to be up since my
> outdate-peer handler doesn't need it.
>
> It just needs cman in case it needs to fence the remote peer. I need
> clvmd to start after drbd to detect LVM.
> Basically I'm trying to boot drbd as soon as an iscsi/aoe device would do.

_you_ need the SSHD to remotely administer the box.
or a network kvm. or some such.
we recommend to get the means to remotely administer the box
up as soon as possible, before any services.

we usually have at least a serial console hooked up,
and a getty running on that
(I think on some boxes that getty is actually a memlocked, realtime,
unlimited (ulimit), static busybox, to be able to get certain fork-bomb like
database client behaviour under control ;->)

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
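
For readers of the archive, the disk-state rules discussed in this thread can be modelled roughly as below. This is only a toy sketch in Python, not DRBD's actual state handling code: the `Node` class and the functions `leave_cluster` and `attach_alone` are hypothetical names invented for illustration, and treating "dont-care" as "no fencing policy configured" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical toy model of one DRBD node (not a real DRBD structure)."""
    role: str                       # "Primary" or "Secondary"
    disk: str                       # "UpToDate", "Outdated", "Consistent"
    peer_is_outdated: bool = False  # flag stored by the surviving Primary


def leave_cluster(leaving: Node, staying: Node,
                  connected: bool, fencing: str) -> None:
    """Model a voluntary, clean disconnect of `leaving`.

    Assumption: fencing == "dont-care" means no fencing policy is set.
    """
    if connected and fencing != "dont-care" and staying.role == "Primary":
        # The leaving node is asked to outdate itself before disconnecting,
        # and the still-running Primary remembers that its peer is outdated.
        leaving.disk = "Outdated"
        staying.peer_is_outdated = True


def attach_alone(node: Node) -> str:
    """Disk state after re-attaching with no peer reachable."""
    if node.disk == "Outdated":
        return "Outdated"    # may not be promoted without intervention
    if node.peer_is_outdated:
        return "UpToDate"    # outdate-peer step can be skipped
    return "Consistent"      # unknown peer state: handler or admin decides


# Clean shutdown of server 2 first, then server 1 reboots alone.
s1 = Node("Primary", "UpToDate")
s2 = Node("Primary", "UpToDate")
leave_cluster(s2, s1, connected=True, fencing="resource-only")
print(attach_alone(s1), attach_alone(s2))
```

In this model the node that leaves first comes back as Outdated, the last one running comes back as UpToDate, and a node that never recorded the flag (unclean shutdown of both, or a network hiccup first) stays Consistent until something resolves it.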