Note: "permalinks" may not be as permanent as we would like;
direct links to old sources may well be a few messages off.
> -----Original Message-----
> From: drbd-user-bounces at lists.linbit.com
> [mailto:drbd-user-bounces at lists.linbit.com] On Behalf Of
> Clinton Rosander
> Sent: Tuesday, January 09, 2007 7:06 PM
> To: drbd-user at lists.linbit.com
> Subject: [DRBD-user] Split Brain Mode?
>
> Hi,
>
> I currently have a DRBD/Heartbeat setup consisting of
> two servers (db01 and db02), of which the primary
> node's (db01) role will be to run an instance of MySQL.
> While doing some maintenance, I powered down the
> primary node (db01), resulting in the secondary
> node (db02) taking over the role of primary (as it
> should).
>
> The problem arose when I powered db01 back up. It came
> back up as a secondary node, but does not recognize
> that the other node indeed exists. The primary (now
> db02) does not seem to see db01 either.
>
> Can anyone please give me some insight on why this
> happened or what the solution is? I'm new to this
> failover solution and it seems like there should be
> something simple I'm missing here. The physical link
> between the servers is unchanged and the interfaces
> sharing that link can see each other. The following is
> the output that I currently see:
>
> db02:
>
> cat /proc/drbd
> version: 0.7.21 (api:79/proto:74)
> SVN Revision: 2326 build by
> buildcentos at x8664-build.centos.org, 2006-10-07
> 05:47:44
> 0: cs:StandAlone st:Primary/Unknown ld:Consistent
        ^^^^^^^^^^
Maybe you set the resource to go standalone when the peer disconnects?
If you use drbdadm wait_connect all on the primary, it should switch to
the WFConnection state and allow the peer to connect.

> ns:320 nr:1096624 dw:1098160 dr:29056 al:16 bm:867
> lo:0 pe:0 ua:0 ap:0
> 1: cs:StandAlone st:Primary/Unknown ld:Consistent
> ns:128 nr:5192 dw:857945 dr:4180 al:0 bm:235 lo:0
> pe:0 ua:0 ap:0
>
> db01:
>
> cat /proc/drbd
> version: 0.7.21 (api:79/proto:74)
> SVN Revision: 2326 build by
> buildcentos at x8664-build.centos.org, 2006-10-07
> 05:47:44
> 0: cs:WFConnection st:Secondary/Unknown ld:Consistent
> ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
> 1: cs:WFConnection st:Secondary/Unknown ld:Consistent
> ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
>
>
> The following are logs from db01 during startup:
>
> Starting DRBD resources: Jan 9 15:41:06 sdcdb01
> kernel: drbd: initialised. Version: 0.7.21
> (api:79/proto:74)
> Jan 9 15:41:06 sdcdb01 kernel: drbd: SVN Revision:
> 2326 build by buildcentos at x8664-build.centos.org,
> 2006-10-07 05:47:44
> Jan 9 15:41:06 sdcdb01 kernel: drbd: registered as
> block device major 147
> [ d0 Jan 9 15:41:12 sdcdb01 kernel: drbd0: resync
> bitmap: bits=71366747 words=1115106
> Jan 9 15:41:12 sdcdb01 kernel: drbd0: size = 272 GB
> (285466986 KB)
> Jan 9 15:41:12 sdcdb01 kernel: klogd 1.4.1,
> ---------- state change ----------
> d1 s0 s1 n0 n1 ].
> Jan 9 15:41:14 sdcdb01 kernel: drbd0: 156 KB marked
> out-of-sync by on disk bit-map.
> Jan 9 15:41:14 sdcdb01 kernel: drbd0: Found 6
> transactions (276 active extents) in activity log.
> Jan 9 15:41:14 sdcdb01 kernel: drbd0: drbdsetup
> [4774]: cstate Unconfigured --> StandAlone
> Jan 9 15:41:14 sdcdb01 kernel: drbd1: resync bitmap:
> bits=15729636 words=245776
> Jan 9 15:41:14 sdcdb01 kernel: drbd1: size = 60 GB
> (62918541 KB)
> Jan 9 15:41:14 sdcdb01 kernel: drbd1: 32 KB marked
> out-of-sync by on disk bit-map.
> Jan 9 15:41:14 sdcdb01 kernel: drbd1: Found 6
> transactions (273 active extents) in activity log.
> Jan 9 15:41:14 sdcdb01 kernel: drbd1: drbdsetup
> [4778]: cstate Unconfigured --> StandAlone
> Jan 9 15:41:14 sdcdb01 kernel: drbd0: drbdsetup
> [4796]: cstate StandAlone --> Unconnected
> Jan 9 15:41:14 sdcdb01 kernel: drbd0: drbd0_receiver
> [4797]: cstate Unconnected --> WFConnection
> Jan 9 15:41:14 sdcdb01 kernel: drbd1: drbdsetup
> [4804]: cstate StandAlone --> Unconnected
> Jan 9 15:41:14 sdcdb01 kernel: drbd1: drbd1_receiver
> [4805]: cstate Unconnected --> WFConnection
> ..........
> ***************************************************************
> DRBD's startup script waits for the peer node(s) to
> appear.
> - In case this node was already a degraded cluster
> before the
> reboot the timeout is 60 seconds.
> [degr-wfc-timeout]
> - If the peer was available before the reboot the
> timeout will
> expire after 0 seconds. [wfc-timeout]
> (These values are for resource 'cdbdata'; 0 sec ->
> wait forever)
> To abort waiting enter 'yes' [ 90]:
> To abort waiting enter 'yes' [ 94]:
> To abort waiting enter 'yes' [ 94]:
> To abort waiting enter 'yes' [ 95]:
> To abort waiting enter 'yes' [ 95]:
> To abort waiting enter 'yes' [ 95]:
> To abort waiting enter 'yes' [ 95]:
> To abort waiting enter 'yes' [ 100]:Jan 9 15:42:58
> sdcdb01 ntpd[3532]: time reset -0.423527 s
> Jan 9 15:42:58 sdcdb01 ntpd[3532]: kernel time sync
> enabled 0001
> [ 105]:
> To abort waiting enter 'yes' [ 109]:
> To abort waiting enter 'yes' [ 109]:
>
>
>
> Any input would be greatly appreciated.
>
> Thanks,
>
> Clint
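
If you would rather spot this condition from a script than by eye, here is
a minimal sketch that parses the 0.7-style /proc/drbd status lines quoted
above and lists the minors whose connection state is StandAlone. The
function names and the sample text are illustrative assumptions, not part
of DRBD; the field layout (cs:, st:, ld:) is taken only from the output in
this thread and may look different in other DRBD versions.

```python
import re

# Matches a device status line as it appears in the quoted 0.7.21 output,
# e.g. " 0: cs:StandAlone st:Primary/Unknown ld:Consistent".
# Assumption: other DRBD versions may use different field names.
DEVICE_LINE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)"
    r"\s+st:(?P<local>[^/\s]+)/(?P<peer>\S+)"
    r"\s+ld:(?P<ld>\S+)"
)


def parse_proc_drbd(text):
    """Return one dict per device status line found in the text."""
    devices = []
    for line in text.splitlines():
        match = DEVICE_LINE.match(line)
        if match:
            device = match.groupdict()
            device["minor"] = int(device["minor"])
            devices.append(device)
    return devices


def standalone_minors(text):
    """Return the minor numbers whose connection state is StandAlone."""
    return [d["minor"] for d in parse_proc_drbd(text) if d["cs"] == "StandAlone"]


# Sample copied from the db02 output in this thread (mail line wraps joined).
SAMPLE_DB02 = """\
version: 0.7.21 (api:79/proto:74)
 0: cs:StandAlone st:Primary/Unknown ld:Consistent
    ns:320 nr:1096624 dw:1098160 dr:29056 al:16 bm:867 lo:0 pe:0 ua:0 ap:0
 1: cs:StandAlone st:Primary/Unknown ld:Consistent
    ns:128 nr:5192 dw:857945 dr:4180 al:0 bm:235 lo:0 pe:0 ua:0 ap:0
"""

if __name__ == "__main__":
    # On a live node you would read /proc/drbd instead of the sample.
    print(standalone_minors(SAMPLE_DB02))
```

On the db02 sample this reports both minors (0 and 1) as StandAlone, which
matches the state marked with the carets above; on the db01 output it would
report none, since both devices there are in WFConnection.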