Hi,

I currently have a DRBD/Heartbeat setup consisting of two servers (db01 and db02). The primary node's (db01) role is to run an instance of MySQL. While doing some maintenance, I powered down the primary node (db01), which resulted in the secondary node (db02) taking over the role of primary (as it should).

The problem arose when I powered db01 back up. It came back up as a secondary node, but it does not recognize that the other node exists. The primary (now db02) does not seem to see db01 either.

Can anyone please give me some insight into why this happened or what the solution is? I'm new to this failover solution, and it seems like there should be something simple I'm missing here. The physical link between the servers is unchanged and the interfaces sharing that link can see each other.

The following is the output that I currently see:

db02: cat /proc/drbd

    version: 0.7.21 (api:79/proto:74)
    SVN Revision: 2326 build by buildcentos at x8664-build.centos.org, 2006-10-07 05:47:44
     0: cs:StandAlone st:Primary/Unknown ld:Consistent
        ns:320 nr:1096624 dw:1098160 dr:29056 al:16 bm:867 lo:0 pe:0 ua:0 ap:0
     1: cs:StandAlone st:Primary/Unknown ld:Consistent
        ns:128 nr:5192 dw:857945 dr:4180 al:0 bm:235 lo:0 pe:0 ua:0 ap:0

db01: cat /proc/drbd

    version: 0.7.21 (api:79/proto:74)
    SVN Revision: 2326 build by buildcentos at x8664-build.centos.org, 2006-10-07 05:47:44
     0: cs:WFConnection st:Secondary/Unknown ld:Consistent
        ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
     1: cs:WFConnection st:Secondary/Unknown ld:Consistent
        ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0

The following are logs from db01 during startup (the init script's console output is interleaved with the kernel messages):

    Starting DRBD resources:
    Jan 9 15:41:06 sdcdb01 kernel: drbd: initialised. Version: 0.7.21 (api:79/proto:74)
    Jan 9 15:41:06 sdcdb01 kernel: drbd: SVN Revision: 2326 build by buildcentos at x8664-build.centos.org, 2006-10-07 05:47:44
    Jan 9 15:41:06 sdcdb01 kernel: drbd: registered as block device major 147
    [ d0
    Jan 9 15:41:12 sdcdb01 kernel: drbd0: resync bitmap: bits=71366747 words=1115106
    Jan 9 15:41:12 sdcdb01 kernel: drbd0: size = 272 GB (285466986 KB)
    Jan 9 15:41:12 sdcdb01 kernel: klogd 1.4.1, ---------- state change ----------
    d1 s0 s1 n0 n1 ].
    Jan 9 15:41:14 sdcdb01 kernel: drbd0: 156 KB marked out-of-sync by on disk bit-map.
    Jan 9 15:41:14 sdcdb01 kernel: drbd0: Found 6 transactions (276 active extents) in activity log.
    Jan 9 15:41:14 sdcdb01 kernel: drbd0: drbdsetup [4774]: cstate Unconfigured --> StandAlone
    Jan 9 15:41:14 sdcdb01 kernel: drbd1: resync bitmap: bits=15729636 words=245776
    Jan 9 15:41:14 sdcdb01 kernel: drbd1: size = 60 GB (62918541 KB)
    Jan 9 15:41:14 sdcdb01 kernel: drbd1: 32 KB marked out-of-sync by on disk bit-map.
    Jan 9 15:41:14 sdcdb01 kernel: drbd1: Found 6 transactions (273 active extents) in activity log.
    Jan 9 15:41:14 sdcdb01 kernel: drbd1: drbdsetup [4778]: cstate Unconfigured --> StandAlone
    Jan 9 15:41:14 sdcdb01 kernel: drbd0: drbdsetup [4796]: cstate StandAlone --> Unconnected
    Jan 9 15:41:14 sdcdb01 kernel: drbd0: drbd0_receiver [4797]: cstate Unconnected --> WFConnection
    Jan 9 15:41:14 sdcdb01 kernel: drbd1: drbdsetup [4804]: cstate StandAlone --> Unconnected
    Jan 9 15:41:14 sdcdb01 kernel: drbd1: drbd1_receiver [4805]: cstate Unconnected --> WFConnection
    ..........
    ***************************************************************
     DRBD's startup script waits for the peer node(s) to appear.
     - In case this node was already a degraded cluster before the
       reboot the timeout is 60 seconds. [degr-wfc-timeout]
     - If the peer was available before the reboot the timeout will
       expire after 0 seconds. [wfc-timeout]
       (These values are for resource 'cdbdata'; 0 sec -> wait forever)
     To abort waiting enter 'yes' [ 90]:
     To abort waiting enter 'yes' [ 94]:
     To abort waiting enter 'yes' [ 94]:
     To abort waiting enter 'yes' [ 95]:
     To abort waiting enter 'yes' [ 95]:
     To abort waiting enter 'yes' [ 95]:
     To abort waiting enter 'yes' [ 95]:
     To abort waiting enter 'yes' [ 100]:
    Jan 9 15:42:58 sdcdb01 ntpd[3532]: time reset -0.423527 s
    Jan 9 15:42:58 sdcdb01 ntpd[3532]: kernel time sync enabled 0001
    [ 105]:
     To abort waiting enter 'yes' [ 109]:
     To abort waiting enter 'yes' [ 109]:

Any input would be greatly appreciated.

Thanks,
Clint
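
To compare the two nodes' states at a glance, the device lines from /proc/drbd quoted above can be pulled apart with a short script. This is only a minimal sketch, assuming the DRBD 0.7-style line format shown in this message (`N: cs:... st:local/peer ld:...`). The names `parse_drbd_status`, `unconnected_devices` and `NOT_TALKING` are hypothetical helpers for illustration, not part of DRBD, and the sample strings are trimmed copies of the output above.

```python
import re

# One device line in the 0.7-style /proc/drbd format quoted in this message,
# e.g. " 0: cs:StandAlone st:Primary/Unknown ld:Consistent"
DEVICE_LINE = re.compile(
    r"\b(?P<minor>\d+): cs:(?P<cs>\w+) "
    r"st:(?P<local>\w+)/(?P<peer>\w+) ld:(?P<ld>\w+)"
)

# Connection states seen in the output and logs above in which the node
# has no established link to its peer (an assumption based on those names).
NOT_TALKING = {"StandAlone", "Unconnected", "WFConnection"}


def parse_drbd_status(text):
    """Return one dict per device line found in text."""
    return [m.groupdict() for m in DEVICE_LINE.finditer(text)]


def unconnected_devices(text):
    """Return the minor numbers of devices whose cs: state is in NOT_TALKING."""
    return [d["minor"] for d in parse_drbd_status(text) if d["cs"] in NOT_TALKING]


# Trimmed copies of the /proc/drbd device lines from db02 and db01.
DB02 = """\
 0: cs:StandAlone st:Primary/Unknown ld:Consistent
 1: cs:StandAlone st:Primary/Unknown ld:Consistent
"""
DB01 = """\
 0: cs:WFConnection st:Secondary/Unknown ld:Consistent
 1: cs:WFConnection st:Secondary/Unknown ld:Consistent
"""

for name, text in (("db02", DB02), ("db01", DB01)):
    for dev in parse_drbd_status(text):
        print(name, "drbd" + dev["minor"], dev["cs"], dev["local"], dev["peer"])
```

Run against the output above, this lists both devices on both nodes as unconnected: db02 sits in StandAlone while db01 sits in WFConnection, and each side reports its peer's role as Unknown.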