On 1-2-2012 16:56, Martin Gerhard Loschwitz wrote:
>> Martin,
>>
>> I'm actually running a dual-active Samba server with a shared GFS2
>> file-system (block-replicated by DRBD). I'm also (ab)using it for an
>> Apache/Tomcat installation with session replication in Tomcat through a
>> shared file-system.
>>
>> Of course, all this is balanced using RR-DNS, and when one node fails,
>> the cluster resource (IP address) is taken over by the surviving node to
>> re-establish service (at somewhat lower performance).
>>
>> Did a kitten just die?
>>
>> Robert Campbell
>
> Robert,
>
> what sort of STONITH do you use for this setup? How did the system react
> the last time where the interconnect between the two nodes was broken
> but they were still up and running? And when did you test the fail-over
> capabilities of your cluster for the last time?
>
> What happens if one of your nodes fails and GFS thinks that it will have
> to fence them until that fencing process is actually done?
>
> I'll get a shovel while waiting for the answer. ;-)
>
> Best regards
> Martin

Martin,

STONITH is power-off through IPMI on HP iLO, so that should be sorted. The
system is still in the testing phase, but I'll test the loss of an iLO
connection, and then a failure of a node (will IPMI fencing still claim a
successful fence?).

The last time (still during the initial build-up test) the surviving node
STONITHed the node that lost its network connection. (It's difficult to test
a failure scenario other than a network failure. How do you make a kernel
panic on purpose?) RHEL clustering announced a successful fence and
everything continued working happily on the surviving node.

The connection used for the STONITH is redundantly made to a pair of
switches. Unfortunately the connection to the iLO is not redundant, but this
would be a failure on top of a failure (so not really a concern for us).

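
To make the fencing behaviour discussed above concrete, here is a minimal toy model in Python. It is only a sketch: `SurvivingNode`, `FenceResult` and `ipmi_power_off` are hypothetical names, not part of RHEL clustering or any fence agent, and it assumes that the agent reports a failure when the iLO cannot be reached, which is exactly the case that still has to be tested.

```python
from dataclasses import dataclass, field
from enum import Enum


class FenceResult(Enum):
    SUCCESS = "success"
    FAILED = "failed"


def ipmi_power_off(bmc_reachable: bool) -> FenceResult:
    """Hypothetical stand-in for an IPMI power-off fence call.

    Assumption: an unreachable iLO/BMC yields FAILED, not SUCCESS.
    """
    return FenceResult.SUCCESS if bmc_reachable else FenceResult.FAILED


@dataclass
class SurvivingNode:
    """Toy model of the surviving node; not the real cluster API."""
    io_blocked: bool = False
    log: list = field(default_factory=list)

    def peer_lost(self) -> None:
        # Shared file-system I/O is frozen while the peer's state is unknown.
        self.io_blocked = True
        self.log.append("peer lost, I/O blocked")

    def fence_finished(self, result: FenceResult) -> bool:
        # I/O resumes only once the fence is confirmed successful.
        if result is FenceResult.SUCCESS:
            self.io_blocked = False
            self.log.append("fence confirmed, I/O resumed")
        else:
            self.log.append("fence failed, I/O stays blocked")
        return not self.io_blocked


node = SurvivingNode()
node.peer_lost()
node.fence_finished(ipmi_power_off(bmc_reachable=False))  # I/O stays blocked
node.fence_finished(ipmi_power_off(bmc_reachable=True))   # I/O resumes
print("\n".join(node.log))
```

The point of the model is only that the surviving node must not touch the shared file-system until the fence is confirmed; what a real fence agent reports in each failure case has to be verified on the actual hardware.
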
During the build-up we also noticed that if there is no successful fence
(fencing was not configured), the surviving node will get a file-system
time-out. Rather unfortunate. We're thinking of implementing a manual fence
mechanism in addition to the automatic one, so that I/O can continue as soon
as an administrator has dialed in to the cluster after having received the
e-mail about the failure. But again, this would require a failure on top of
a failure.

I feel confident the shovel can be put back into the cupboard.

Robert