> Message: 1
> Date: Thu, 5 Jan 2012 23:01:24 +0100
> From: Florian Haas <florian at hastexo.com>
> Subject: Re: [DRBD-user] DRBD+Pacemaker: Won't promote with only one node
> To: drbd-user at lists.linbit.com
> Message-ID: <CAPUexz9jgpq0V49eZmwDH3cThrGEJ7VopXCh1_27yjX-6JC+fQ at mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> On Thu, Jan 5, 2012 at 6:36 PM, William Seligman
> <seligman at nevis.columbia.edu> wrote:
>> Sure. I didn't do this before, since the configuration is complex. I also don't
>> know which would be more comprehensible, so I've attached both cib.xml and the
>> result of "crm configure show". I should mention that I'm a lazy bum, so I use
>> crm-gui to configure corosync; that's why these files look more baroque than usual.
>
> In the CIB you posted, both nodes are in the "UNCLEAN (offline)"
> state. In that state, nothing gets promoted, nothing gets started. Are
> you sure you posted the right CIB dump?

That caused me to panic! I rushed to check that my cluster was running. It's OK;
an occasional panic is good once in a while.

I assume you're talking about the following lines:

<nodes>
  <node id="orestes.nevis.columbia.edu" uname="orestes.nevis.columbia.edu" type="normal">
    <instance_attributes id="nodes-orestes.nevis.columbia.edu">
      <nvpair id="nodes-orestes.nevis.columbia.edu-standby" name="standby" value="off"/>
    </instance_attributes>
  </node>
  <node id="hypatia.nevis.columbia.edu" uname="hypatia.nevis.columbia.edu" type="normal">
    <instance_attributes id="nodes-hypatia.nevis.columbia.edu">
      <nvpair id="nodes-hypatia.nevis.columbia.edu-standby" name="standby" value="off"/>
    </instance_attributes>
  </node>
</nodes>

For both nodes, the attribute "standby" is set to "off", which evidently means
it's not in standby mode, so it's online.
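As a quick way to repeat the check above, here is a minimal Python sketch that reads the "standby" attribute for each node out of the <nodes> fragment quoted from cib.xml. The function name standby_by_node and the NODES_XML constant are made up for this illustration. The sketch only looks at the configuration attribute; it does not read the CIB status section, so it cannot by itself tell whether a node is currently online or "UNCLEAN (offline)".

```python
import xml.etree.ElementTree as ET

# The <nodes> fragment quoted from cib.xml in this message.
NODES_XML = """
<nodes>
  <node id="orestes.nevis.columbia.edu" uname="orestes.nevis.columbia.edu" type="normal">
    <instance_attributes id="nodes-orestes.nevis.columbia.edu">
      <nvpair id="nodes-orestes.nevis.columbia.edu-standby" name="standby" value="off"/>
    </instance_attributes>
  </node>
  <node id="hypatia.nevis.columbia.edu" uname="hypatia.nevis.columbia.edu" type="normal">
    <instance_attributes id="nodes-hypatia.nevis.columbia.edu">
      <nvpair id="nodes-hypatia.nevis.columbia.edu-standby" name="standby" value="off"/>
    </instance_attributes>
  </node>
</nodes>
"""


def standby_by_node(xml_text):
    """Map each node's uname to its "standby" value (None if the attribute is absent).

    Hypothetical helper for illustration; not part of Pacemaker or crm.
    """
    root = ET.fromstring(xml_text)
    result = {}
    for node in root.iter("node"):
        value = None
        # Look through every nvpair under this node for name="standby".
        for nvpair in node.iter("nvpair"):
            if nvpair.get("name") == "standby":
                value = nvpair.get("value")
        result[node.get("uname")] = value
    return result


if __name__ == "__main__":
    for uname, standby in standby_by_node(NODES_XML).items():
        print(f"{uname}: standby={standby}")
```

For the fragment above, both nodes come back with standby set to "off", matching the reading given in the text.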
According to a web page I found at home last night, but can't find at work today,
"standby:off" gets added to a node's attributes if it was ever marked
"UNCLEAN (offline)" and brought online again; this has indeed happened with both
my nodes.

To confirm, the first few lines from running "crm status" are:

============
Last updated: Fri Jan 6 11:29:23 2012
Stack: openais
Current DC: hypatia.nevis.columbia.edu - partition with quorum
Version: 1.0.12-unknown
2 Nodes configured, 2 expected votes
25 Resources configured.
============

Online: [ orestes.nevis.columbia.edu hypatia.nevis.columbia.edu ]

 Master/Slave Set: Admin
     Masters: [ hypatia.nevis.columbia.edu ]
     Slaves: [ orestes.nevis.columbia.edu ]

Getting back to my question, I hunted through my logs and saw that during the
time of "one-node; no promotion", crm reported the above status as:

 Master/Slave Set: Admin
     Slaves: [ hypatia.nevis.columbia.edu ]
     Stopped: [ AdminDrbd:1 ]

... and it simply stayed like that. Since all the other cluster resources depend
on Admin:promote, nothing else would happen.

The relevant drbd messages in the log file appear to be:

Dec 30 19:24:26 hypatia kernel: events: mcg drbd: 1
Dec 30 19:24:26 hypatia kernel: drbd: initialized. Version: 8.4.1 (api:1/proto:86-100)
Dec 30 19:24:26 hypatia kernel: drbd: GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by root at hypatia.nevis.columbia.edu, 2011-12-21 13:42:51
Dec 30 19:24:26 hypatia kernel: drbd: registered as block device major 147
Dec 30 19:24:26 hypatia lrmd: [5800]: info: RA output: (AdminDrbd:0:start:stdout)
Dec 30 19:24:26 hypatia kernel: igb: eth1 NIC Link is Down
Dec 30 19:24:27 hypatia kernel: d-con admin: Starting worker thread (from drbdsetup [7213])
Dec 30 19:24:27 hypatia kernel: block drbd1: disk( Diskless -> Attaching )
Dec 30 19:24:27 hypatia kernel: d-con admin: Method to ensure write ordering: barrier
Dec 30 19:24:27 hypatia kernel: block drbd1: max BIO size = 130560
Dec 30 19:24:27 hypatia kernel: block drbd1: drbd_bm_resize called with capacity == 1953460176
Dec 30 19:24:27 hypatia kernel: block drbd1: resync bitmap: bits=244182522 words=3815352 pages=7452
Dec 30 19:24:27 hypatia kernel: block drbd1: size = 931 GB (976730088 KB)
Dec 30 19:24:27 hypatia lrmd: [5800]: info: RA output: (AdminDrbd:0:start:stderr) Marked additional 1028 MB as out-of-sync based on AL.
Dec 30 19:24:27 hypatia crmd: [5803]: info: process_lrm_event: LRM operation WorkDrbd:0_start_0 (call=49, rc=0, cib-update=56, confirmed=true) ok
Dec 30 19:24:28 hypatia kernel: block drbd1: bitmap READ of 7452 pages took 131 jiffies
Dec 30 19:24:28 hypatia kernel: block drbd1: recounting of set bits took additional 21 jiffies
Dec 30 19:24:28 hypatia kernel: block drbd1: 1028 MB (263168 bits) marked out-of-sync by on disk bit-map.
Dec 30 19:24:28 hypatia kernel: block drbd1: disk( Attaching -> Consistent )
Dec 30 19:24:28 hypatia kernel: block drbd1: attached to UUIDs A82875A514F576EB:0000000000000000:5283A3879DE4DED1:5282A3879DE4DED1
Dec 30 19:24:28 hypatia lrmd: [5800]: info: RA output: (AdminDrbd:0:start:stdout)
Dec 30 19:24:28 hypatia kernel: d-con admin: conn( StandAlone -> Unconnected )
Dec 30 19:24:28 hypatia kernel: d-con admin: Starting receiver thread (from drbd_w_admin [7214])
Dec 30 19:24:28 hypatia kernel: d-con admin: receiver (re)started
Dec 30 19:24:28 hypatia kernel: d-con admin: conn( Unconnected -> WFConnection )
Dec 30 19:24:28 hypatia lrmd: [5800]: info: RA output: (AdminDrbd:0:start:stdout)

I noticed today a post on Linux-HA that appears to be the same problem as mine:

<http://www.gossamer-threads.com/lists/drbd/users/19943>

Unfortunately no resolution to the problem was posted then. Any thoughts?

-- 
Bill Seligman             | Phone: (914) 591-2823
Nevis Labs, Columbia Univ | mailto://seligman@nevis.columbia.edu
PO Box 137                |
Irvington NY 10533 USA    | http://www.nevis.columbia.edu/~seligman/