<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<META NAME="Generator" CONTENT="MS Exchange Server version 6.5.7654.12">
<TITLE>Trouble getting node to re-join two node cluster (OCFS2/DRBD Primary/Primary)</TITLE>
</HEAD>
<BODY>
<P><FONT SIZE=2>Hello all,<BR>
<BR>
** I have also posted this to the OCFS2/Pacemaker list, but one response<BR>
suggested the problem may be more specific to DRBD. **<BR>
<BR>
We have a two-node cluster still in development that has been running fine<BR>
for weeks (little to no traffic). I made some updates to our CIB recently,<BR>
and everything seemed just fine.<BR>
<BR>
Yesterday I attempted to untar ~1.5GB to the OCFS2/DRBD volume. Once it<BR>
was complete, one of the nodes had become completely disconnected, and I<BR>
haven't been able to reconnect it since.<BR>
<BR>
DRBD is working fine: everything is UpToDate and I can get both nodes into<BR>
Primary/Primary. But when it comes to starting OCFS2 and mounting the<BR>
volume, I'm left with:<BR>
<BR>
> resFS:0_start_0 (node=node1, call=21, rc=1, status=complete): unknown error<BR>
<BR>
I am using "pcmk" as the cluster_stack, and letting Pacemaker control<BR>
everything...<BR>
<BR>
The last time this happened, the only way I was able to resolve it was to<BR>
reformat the device (via mkfs.ocfs2 -F). I don't think I should have to do<BR>
this: the underlying blocks seem fine, and one of the nodes is running just<BR>
fine. The (currently) unmounted node is staying in sync as far as DRBD is<BR>
concerned.<BR>
<BR>
Here's some detail that will hopefully help. Please let me know if there's<BR>
anything else I can provide to help find the best way to get this node back<BR>
"online":<BR>
<BR>
<BR>
Ubuntu 10.10 / Kernel 2.6.35<BR>
<BR>
Pacemaker 1.0.9.1<BR>
Corosync 1.2.1<BR>
Cluster Agents 1.0.3 (Heartbeat)<BR>
Cluster Glue 1.0.6<BR>
OpenAIS 1.1.2<BR>
<BR>
DRBD 8.3.10<BR>
OCFS2 1.5.0<BR>
<BR>
cat /sys/fs/ocfs2/cluster_stack = pcmk<BR>
<BR>
node1: mounted.ocfs2 -d<BR>
<BR>
Device FS UUID Label<BR>
/dev/sda3 ocfs2 fe4273e1-f866-4541-bbcf-66c5dfd496d6<BR>
<BR>
node2: mounted.ocfs2 -d<BR>
<BR>
Device FS UUID Label<BR>
/dev/sda3 ocfs2 d6f7cc6d-21d1-46d3-9792-bc650736a5ef<BR>
/dev/drbd0 ocfs2 d6f7cc6d-21d1-46d3-9792-bc650736a5ef<BR>
<BR>
* NOTES:<BR>
- Both nodes are identical; in fact, one node is a direct mirror (HDD clone) of the other<BR>
- I have attached the CIB (the contents of crm configure edit) and a mount trace<BR>
<BR>
</FONT>
</P>
</BODY>
</HTML>