Hi Gordan and Everyone,

Thank you for your tips.

Now that you mention it, this is the error I get when the process fails:

Apr 14 03:44:16 tweety1 kernel: drbd1: receiver terminated
Apr 14 03:44:16 tweety1 kernel: drbd1: receiver (re)started
Apr 14 03:44:16 tweety1 kernel: drbd1: conn( Unconnected -> WFConnection )
Apr 14 03:44:16 tweety1 kernel: drbd1: Handshake successful: Agreed network protocol version 88
Apr 14 03:44:16 tweety1 kernel: drbd1: Peer authenticated using 20 bytes of 'sha1' HMAC
Apr 14 03:44:16 tweety1 kernel: drbd1: conn( WFConnection -> WFReportParams )
Apr 14 03:44:16 tweety1 kernel: drbd1: Starting asender thread (from drbd1_receiver [2631])
Apr 14 03:44:16 tweety1 kernel: drbd1: data-integrity-alg: <not-used>
Apr 14 03:44:16 tweety1 kernel: drbd1: Split-Brain detected, 2 primaries, automatically solved. Sync from peer node
Apr 14 03:44:16 tweety1 kernel: drbd1: helper command: /sbin/drbdadm pri-lost
Apr 14 03:44:16 tweety1 kernel: drbd1: I shall become SyncTarget, but I am primary!

Any ideas on how to go through it?

Extract from my config:

handlers {
    pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
    pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
    local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
    outdate-peer "/usr/lib/drbd/outdate-peer.sh on tweety1 192.168.1.251 10.254.254.253 on tweety2 192.168.1.252 10.254.254.254";
    outdate-peer "/sbin/obliterate";
    pri-lost "echo pri-lost. Have a look at the log files. | mail -s 'DRBD Alert' root";
    split-brain "echo split-brain. drbdadm -- --discard-my-data connect $DRBD_RESOURCE ? | mail -s 'DRBD Alert' root";
}

Thank you.

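For anyone who wants to spot these events without reading the whole log, here is a minimal Python sketch that picks the two relevant messages out of kernel log lines in the format quoted above. The function name `scan_drbd_log` and the event labels are made up for illustration and are not part of DRBD. The line pattern is an assumption based only on the sample lines in this message, so other syslog layouts may need a different expression.

```python
import re

# Message fragments copied from the kernel log quoted in this thread.
SPLIT_BRAIN = "Split-Brain detected"
TARGET_WHILE_PRIMARY = "I shall become SyncTarget, but I am primary!"

# Assumed layout: "Apr 14 03:44:16 tweety1 kernel: drbd1: <message>".
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\skernel:\s"
    r"(?P<dev>drbd\d+):\s(?P<msg>.*)$"
)


def scan_drbd_log(lines):
    """Return (timestamp, host, device, kind) for each split-brain related line."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # not a DRBD kernel message in the assumed layout
        msg = match.group("msg")
        if SPLIT_BRAIN in msg:
            kind = "split-brain"
        elif TARGET_WHILE_PRIMARY in msg:
            kind = "sync-target-while-primary"
        else:
            continue
        events.append(
            (match.group("stamp"), match.group("host"), match.group("dev"), kind)
        )
    return events


if __name__ == "__main__":
    sample = [
        "Apr 14 03:44:16 tweety1 kernel: drbd1: receiver terminated",
        "Apr 14 03:44:16 tweety1 kernel: drbd1: Split-Brain detected, 2 primaries, "
        "automatically solved. Sync from peer node",
        "Apr 14 03:44:16 tweety1 kernel: drbd1: I shall become SyncTarget, but I am primary!",
    ]
    for event in scan_drbd_log(sample):
        print(event)
```

Feeding it the lines of the system log file (wherever kernel messages are written on your distribution) should list each occurrence with its timestamp and device.
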
-----Original Message-----
From: drbd at bobich.net [mailto:drbd at bobich.net]
Sent: Monday, April 14, 2008 3:23 PM
To: drbd-user at lists.linbit.com
Subject: Re: [DRBD-user] Question for Split Brain

On Mon, 14 Apr 2008, Theophanis Kontogiannis wrote:

> If I have two nodes and all the resources are run Primary/Primary, when Split Brain is
> detected, and based on the algorithms, one of them will become the sync target.
>
> Let us assume that for some reason (it happens to me once per 2-4 days), the worker fails, so
> the systems operate in split brain condition for a long time.

Sounds like you need to fix your networking problem. Reliable communication
between the nodes is a pretty fundamental requirement.

> In that case some files have been written on A side and some on B side.
>
> Also let us assume that node A is the SyncSource and node B is the SyncTarget.
>
> This means that all files changed on node A during the SB, will be updated on node B but the
> files changed in node B will not be updated on node A?

Yes. Node A and node B will both end up with the volume image from node A.
All changes on node B will be lost. If this is a problem (and I can't
imagine it not being a problem), you should implement fencing that will
forcefully shut down one of the nodes and fail over the resources (only
the IP addresses if you are running two primaries) to the remaining
primary.

Gordan
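
The outcome described in the reply above can be illustrated with a toy Python model. This is only a sketch of the end result, not of how DRBD performs a resync: the block maps, the `resync` function and the sample contents are hypothetical and exist just to show that the SyncTarget's divergent writes disappear.

```python
def resync(source_image, target_image):
    """Toy resync: the target becomes an exact copy of the source.

    Returns the new target image and the target blocks whose contents
    were discarded because they differed from the source.
    """
    lost = {
        block: data
        for block, data in target_image.items()
        if source_image.get(block) != data
    }
    return dict(source_image), lost


# State shared by both nodes before the split brain (hypothetical contents).
common = {0: "base0", 1: "base1"}

# During the split brain each primary writes independently.
node_a = dict(common)
node_a[2] = "written on A during split brain"
node_b = dict(common)
node_b[3] = "written on B during split brain"

# Node A is the SyncSource, node B the SyncTarget.
node_b, lost_on_b = resync(node_a, node_b)

print(node_b == node_a)  # both nodes now hold A's image
print(lost_on_b)         # the writes made on B that were thrown away
```

In this model nothing written on node B during the split survives, which is why the reply recommends fencing so that only one primary keeps accepting writes.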