Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Mike,

One issue in your CIB (though it may not be the cause of this) is the
order statement with promote:

    order ordDRBDDLM inf: msDRBD:promote cloneDLM

If you explicitly define the action to take (promote), then that action
is taken on all resources in that statement unless explicitly defined
otherwise. So it should be:

    order ordDRBDDLM inf: msDRBD:promote cloneDLM:start

(see the sketch after the quoted message below)

Have you tried just rebooting the offending node? I know that's not the
greatest answer, but it's not serving anything right now anyway. Also,
how about attaching the logs from both nodes from when the disconnect
happened?

Jake

----- Original Message -----
> From: "Mike Reid" <MBReid at thepei.com>
> To: drbd-user at lists.linbit.com
> Sent: Thursday, September 15, 2011 4:50:44 PM
> Subject: [DRBD-user] Trouble getting node to re-join two node cluster
> (OCFS2/DRBD Primary/Primary)
>
> Hello all,
>
> ** I have also posted this in the OCFS2/pacemaker list, but one
> response indicated it may be more specific to DRBD? **
>
> We have a two-node cluster, still in development, that has been
> running fine for weeks (little to no traffic). I made some updates to
> our CIB recently, and everything seemed just fine.
>
> Yesterday I attempted to untar ~1.5 GB to the OCFS2/DRBD volume, and
> once it was complete one of the nodes had become completely
> disconnected, and I haven't been able to reconnect it since.
>
> DRBD is working fine: everything is UpToDate and I can get both nodes
> into Primary/Primary, but when it comes to starting OCFS2 and
> mounting the volume, I'm left with:
>
>     resFS:0_start_0 (node=node1, call=21, rc=1, status=complete):
>     unknown error
>
> I am using "pcmk" as the cluster_stack and letting Pacemaker control
> everything...
>
> The last time this happened, the only way I was able to resolve it
> was to reformat the device (via mkfs.ocfs2 -F). I don't think I
> should have to do this; the underlying blocks seem fine, and one of
> the nodes is running just fine. The (currently) unmounted node is
> staying in sync as far as DRBD is concerned.
>
> Here's some detail that will hopefully help. Please let me know if
> there's anything else I can provide to help determine the best way to
> get this node back "online":
>
> Ubuntu 10.10 / Kernel 2.6.35
> Pacemaker 1.0.9.1
> Corosync 1.2.1
> Cluster Agents 1.0.3 (Heartbeat)
> Cluster Glue 1.0.6
> OpenAIS 1.1.2
> DRBD 8.3.10
> OCFS2 1.5.0
>
> cat /sys/fs/ocfs2/cluster_stack = pcmk
>
> node1: mounted.ocfs2 -d
> Device      FS     UUID                                  Label
> /dev/sda3   ocfs2  fe4273e1-f866-4541-bbcf-66c5dfd496d6
>
> node2: mounted.ocfs2 -d
> Device      FS     UUID                                  Label
> /dev/sda3   ocfs2  d6f7cc6d-21d1-46d3-9792-bc650736a5ef
> /dev/drbd0  ocfs2  d6f7cc6d-21d1-46d3-9792-bc650736a5ef
>
> * NOTES:
> - Both nodes are identical; in fact, one node is a direct mirror
>   (HDD clone)
> - I have attached the CIB (crm configure edit contents) and mount
>   trace
>
> ------ End of Forwarded Message
>
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
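For reference, a minimal sketch of the fix Jake describes, assuming the
crm shell that ships with Pacemaker 1.0.x and the resource names quoted
from Mike's CIB (msDRBD, cloneDLM, resFS); the cleanup step afterwards
is an assumption on my part, not something confirmed in the thread:

    # Make the DLM clone's action explicit, so "promote" applies only
    # to the DRBD master/slave resource:
    crm configure edit
    #   before: order ordDRBDDLM inf: msDRBD:promote cloneDLM
    #   after:  order ordDRBDDLM inf: msDRBD:promote cloneDLM:start

    # Assumed follow-up: clear the failed start on the filesystem
    # resource (name taken from the error above) so Pacemaker retries:
    crm resource cleanup resFS

    # One-shot view of cluster state to verify the mount comes back:
    crm_mon -1

Without the explicit :start, the constraint asks Pacemaker to promote
cloneDLM as well, which is the ambiguity Jake points out.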