<html><head><style type='text/css'>p { margin: 0; }</style></head><body><div style='font-family: Arial; font-size: 10pt; color: #000000'>Mike,<br><br>One issue in your CIB (though it may not be the cause of this problem) is the order statement with promote:<br><pre>order ordDRBDDLM inf: msDRBD:promote cloneDLM</pre>If you explicitly specify an action (promote) for the first resource, that same action is applied to every resource in the statement unless you explicitly specify a different one. As written, the constraint therefore orders a promote of cloneDLM rather than a start. It should be:<br><pre>order ordDRBDDLM inf: msDRBD:promote cloneDLM:start</pre>Have you tried simply rebooting the offending node? I know that's not the greatest answer, but the node isn't serving anything right now anyway.<br><br>Also, could you attach the logs from both nodes covering the time of the disconnect?<br><br>Jake<br><br><hr id="zwchr"><blockquote style="border-left: 2px solid rgb(16, 16, 255); margin-left: 5px; padding-left: 5px; color: rgb(0, 0, 0); font-weight: normal; font-style: normal; text-decoration: none; font-family: Helvetica,Arial,sans-serif; font-size: 12pt;"><b>From: </b>"Mike Reid" &lt;MBReid@thepei.com&gt;<br><b>To: </b>drbd-user@lists.linbit.com<br><b>Sent: </b>Thursday, September 15, 2011 4:50:44 PM<br><b>Subject: </b>[DRBD-user] Trouble getting node to re-join two node cluster (OCFS2/DRBD Primary/Primary)<br><br>






<p><font size="2">Hello all,<br>

<br>

** I have also posted this to the OCFS2/Pacemaker list, but one response<br>

suggested the problem may be more specific to DRBD. **<br>

<br>

We have a two-node cluster still in development that has been running fine<br>

for weeks (little to no traffic). I made some updates to our CIB recently,<br>

and everything seemed just fine.<br>

<br>

Yesterday I untarred ~1.5GB onto the OCFS2/DRBD volume, and by the time it<br>

finished, one of the nodes had become completely disconnected; I haven't<br>

been able to reconnect it since.<br>

<br>

DRBD itself is working fine: everything is UpToDate and I can get both nodes<br>

into Primary/Primary. But when it comes to starting OCFS2 and mounting the<br>

volume, I'm left with:<br>

<br>

&gt; resFS:0_start_0 (node=node1, call=21, rc=1, status=complete): unknown error<br>

<br>

I am using "pcmk" as the cluster_stack, and letting Pacemaker control<br>

everything...<br>

<br>

The last time this happened, the only way I was able to resolve it was to<br>

reformat the device (via mkfs.ocfs2 -F). I don't think I should have to do<br>

this: the underlying blocks seem fine, and one of the nodes is running just<br>

fine. As far as DRBD is concerned, the (currently) unmounted node is staying<br>

in sync.<br>

<br>

Here are some details that will hopefully help. Please let me know if there's<br>

anything else I can provide that would help determine the best way to get this<br>

node back "online":<br>

<br>

<br>

Ubuntu 10.10 / Kernel 2.6.35<br>

<br>

Pacemaker 1.0.9.1<br>

Corosync 1.2.1<br>

Cluster Agents 1.0.3 (Heartbeat)<br>

Cluster Glue 1.0.6<br>

OpenAIS 1.1.2<br>

<br>

DRBD 8.3.10<br>

OCFS2 1.5.0<br>

<br>

cat /sys/fs/ocfs2/cluster_stack = pcmk<br>

<br>

node1: mounted.ocfs2 -d<br>

<br>

Device&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; FS&nbsp;&nbsp;&nbsp;&nbsp; UUID&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Label<br>

/dev/sda3&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ocfs2&nbsp; fe4273e1-f866-4541-bbcf-66c5dfd496d6<br>

<br>

node2: mounted.ocfs2 -d<br>

<br>

Device&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; FS&nbsp;&nbsp;&nbsp;&nbsp; UUID&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Label<br>

/dev/sda3&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ocfs2&nbsp; d6f7cc6d-21d1-46d3-9792-bc650736a5ef<br>

/dev/drbd0&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ocfs2&nbsp; d6f7cc6d-21d1-46d3-9792-bc650736a5ef<br>

<br>

* NOTES:<br>

- Both nodes are identical; in fact, one node is a direct mirror (HDD clone) of the other<br>

- I have attached the CIB (crm configure edit contents) and a mount trace<br>

<br>

------ End of Forwarded Message<br>

<br>

</font>

</p>
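<p>The two mounted.ocfs2 -d listings above can be compared mechanically with a minimal Python sketch. It assumes the column layout shown (Device, FS, UUID, then an optional Label) and that each node's output is pasted in as a string; the function names parse_mounted_ocfs2 and compare_nodes are hypothetical and not part of ocfs2-tools:</p><pre>
def parse_mounted_ocfs2(listing: str) -> dict[str, str]:
    """Map each device in `mounted.ocfs2 -d` output to its OCFS2 UUID.

    Assumes the layout shown above: Device, FS, UUID, then an optional
    Label. The header row and blank lines are skipped.
    """
    uuids = {}
    for line in listing.splitlines():
        fields = line.split()
        # Data rows start with a device path; the header row does not.
        if len(fields) >= 3 and fields[0].startswith("/dev/"):
            uuids[fields[0]] = fields[2]
    return uuids


def compare_nodes(node1: dict[str, str], node2: dict[str, str]) -> list[str]:
    """List devices whose UUID differs between nodes or appears on only one."""
    report = []
    for dev in sorted(set(node1) | set(node2)):
        uuid1, uuid2 = node1.get(dev), node2.get(dev)
        if uuid1 != uuid2:
            report.append(f"{dev}: node1={uuid1} node2={uuid2}")
    return report


# Output copied from the report above (column spacing simplified).
NODE1 = """\
Device      FS     UUID                                  Label
/dev/sda3   ocfs2  fe4273e1-f866-4541-bbcf-66c5dfd496d6
"""

NODE2 = """\
Device      FS     UUID                                  Label
/dev/sda3   ocfs2  d6f7cc6d-21d1-46d3-9792-bc650736a5ef
/dev/drbd0  ocfs2  d6f7cc6d-21d1-46d3-9792-bc650736a5ef
"""

for entry in compare_nodes(parse_mounted_ocfs2(NODE1), parse_mounted_ocfs2(NODE2)):
    print(entry)
</pre><p>With the listings above, it prints two lines: /dev/drbd0 appears only in node2's output, and /dev/sda3 carries a different UUID on each node.</p>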


<br>_______________________________________________<br>drbd-user mailing list<br>drbd-user@lists.linbit.com<br>http://lists.linbit.com/mailman/listinfo/drbd-user<br></blockquote><br></div></body></html>