<html><head><style type='text/css'>p { margin: 0; }</style></head><body><div style='font-family: Arial; font-size: 10pt; color: #000000'>Mike,<br><br>One issue in your CIB (though it may not be the cause of this problem) is the order statement with promote:<br><span style="font-family: terminal,monaco;"></span><pre>order ordDRBDDLM inf: msDRBD:promote cloneDLM<br></pre>If you explicitly define the action to take (promote), that action is applied to every resource in the statement unless you specify otherwise. As written, the constraint asks for cloneDLM to be promoted, which is not what you want for a clone. So it should be:<br><pre>order ordDRBDDLM inf: msDRBD:promote cloneDLM:start<br><br></pre>Have you tried just rebooting the offending node? I know that's not the greatest answer, but it's not serving anything right now anyway.<br><br>Also, could you attach the logs from both nodes covering the time the disconnect happened?<br><br>Jake<br><br><hr id="zwchr"><blockquote style="border-left: 2px solid rgb(16, 16, 255); margin-left: 5px; padding-left: 5px; color: rgb(0, 0, 0); font-weight: normal; font-style: normal; text-decoration: none; font-family: Helvetica,Arial,sans-serif; font-size: 12pt;"><b>From: </b>"Mike Reid" &lt;MBReid@thepei.com&gt;<br><b>To: </b>drbd-user@lists.linbit.com<br><b>Sent: </b>Thursday, September 15, 2011 4:50:44 PM<br><b>Subject: </b>[DRBD-user] Trouble getting node to re-join two node cluster (OCFS2/DRBD Primary/Primary)<br><br>
<p><font size="2">Hello all,<br>
<br>
** I have also posted this in the OCFS2/pacemaker list, but one response<br>
indicated it may be more specific to DRBD? **<br>
<br>
We have a two-node cluster still in development that has been running fine<br>
for weeks (little to no traffic). I made some updates to our CIB recently,<br>
and everything seemed just fine.<br>
<br>
Yesterday I attempted to untar ~1.5GB to the OCFS2/DRBD volume, and once it<br>
was complete one of the nodes had become completely disconnected and I<br>
haven't been able to reconnect since.<br>
<br>
DRBD is working fine, everything is UpToDate and I can get both nodes in<br>
Primary/Primary, but when it comes down to starting OCFS2 and mounting the<br>
volume, I'm left with:<br>
<br>
> resFS:0_start_0 (node=node1, call=21, rc=1, status=complete): unknown error<br>
<br>
I am using "pcmk" as the cluster_stack, and letting Pacemaker control<br>
everything...<br>
<br>
The last time this happened the only way I was able to resolve it was to<br>
reformat the device (via mkfs.ocfs2 -F). I don't think I should have to do<br>
this, underlying blocks seem fine, and one of the nodes is running just<br>
fine. The (currently) unmounted node is staying in sync as far as DRBD is<br>
concerned.<br>
<br>
Here's some detail that hopefully will help, please let me know if there's<br>
anything else I can provide to help know the best way to get this node back<br>
"online":<br>
<br>
<br>
Ubuntu 10.10 / Kernel 2.6.35<br>
<br>
Pacemaker 1.0.9.1<br>
Corosync 1.2.1<br>
Cluster Agents 1.0.3 (Heartbeat)<br>
Cluster Glue 1.0.6<br>
OpenAIS 1.1.2<br>
<br>
DRBD 8.3.10<br>
OCFS2 1.5.0<br>
<br>
cat /sys/fs/ocfs2/cluster_stack = pcmk<br>
<br>
node1: mounted.ocfs2 -d<br>
<br>
Device FS UUID Label<br>
/dev/sda3 ocfs2 fe4273e1-f866-4541-bbcf-66c5dfd496d6<br>
<br>
node2: mounted.ocfs2 -d<br>
<br>
Device FS UUID Label<br>
/dev/sda3 ocfs2 d6f7cc6d-21d1-46d3-9792-bc650736a5ef<br>
/dev/drbd0 ocfs2 d6f7cc6d-21d1-46d3-9792-bc650736a5ef<br>
<br>
* NOTES:<br>
- Both nodes are identical, in fact one node is a direct mirror (hdd clone)<br>
- I have attached the CIB (crm configure edit contents) and mount trace<br>
<br>
------ End of Forwarded Message<br>
<br>
</font>
</p>
<br>_______________________________________________<br>drbd-user mailing list<br>drbd-user@lists.linbit.com<br>http://lists.linbit.com/mailman/listinfo/drbd-user<br></blockquote><br></div></body></html>