Hi List,<br>
<br>
I am running DRBD in dual-primary mode with OCFS2 for my load-balanced
web server cluster. I didn't encounter any errors during setup, and
when I put the web site on the DRBD device on the primary node, it
replicated without any errors. It ran fine during a
week of testing, but this morning, when we updated code located on the
DRBD device, we noticed it was not replicating to the secondary node.
<br>
The DRBD device was mounted on both nodes, but /proc/drbd showed
this:<br>
<br>
<b>version: 8.3.7 (api:88/proto:86-91)<br>
GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by
root@web01.junkmail.co.za, 2012-01-10 09:54:40<br>
0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r----<br>
ns:0 nr:0 dw:5960937 dr:5047235 al:1490 bm:1363 lo:0 pe:0 ua:0
ap:0 ep:1 wo:b oos:8840028</b><br>
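<br>
In hindsight, a small check along these lines might have flagged the StandAlone state before we noticed it by hand. It is only a sketch: the function names are my own, and treating anything other than cs:Connected with ds:UpToDate/UpToDate as "needs attention" is my assumption (a resync in progress would be flagged too).<br>

```python
import re

# Matches a device status line such as:
#   0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r----
STATUS_RE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)\s+ro:(?P<ro>\S+)\s+ds:(?P<ds>\S+)"
)


def parse_drbd_status(text):
    """Return {minor: {"cs": ..., "ro": ..., "ds": ...}} for each device line."""
    devices = {}
    for line in text.splitlines():
        m = STATUS_RE.match(line)
        if m:
            devices[int(m.group("minor"))] = {
                "cs": m.group("cs"),
                "ro": m.group("ro"),
                "ds": m.group("ds"),
            }
    return devices


def unhealthy(devices):
    """List minors that are not Connected or not UpToDate on both sides.

    Assumption: anything else (StandAlone, WFConnection, a running
    resync, ...) is worth a look by a human.
    """
    return sorted(
        minor
        for minor, st in devices.items()
        if st["cs"] != "Connected" or st["ds"] != "UpToDate/UpToDate"
    )
```

The idea would be to feed it the contents of /proc/drbd from cron on each node and send an alert whenever unhealthy() returns a non-empty list.<br>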
<br>
<br>
I restarted DRBD and OCFS2, but the result was the same. Next I
rebooted the misbehaving node and noticed, when it came back up, that
the DRBD device was no longer mounted. <br>
<br>
Trying to mount the device manually returns this error:<br>
<b>mount /dev/drbd0<br>
mount.ocfs2: I/O error on channel while opening device /dev/drbd0</b><br>
<br>
<br>
A tail of the log file shows nothing, but an earlier entry shows
this:<br>
<br>
<b>Feb 17 10:47:54 web02 kernel: [ 13.531600] block drbd0: disk(
Attaching -> UpToDate ) <br>
Feb 17 10:47:54 web02 kernel: [ 13.535865] block drbd0: conn(
StandAlone -> Unconnected ) <br>
Feb 17 10:47:54 web02 kernel: [ 13.535889] block drbd0: Starting
receiver thread (from drbd0_worker [1484])<br>
Feb 17 10:47:54 web02 kernel: [ 13.535998] block drbd0: receiver
(re)started<br>
Feb 17 10:47:54 web02 kernel: [ 13.536006] block drbd0: conn(
Unconnected -> WFConnection )<br>
<br>
<br>
</b>This is my r1.res file:<br>
<br>
<b>===============================================================<br>
resource r1 { <br>
meta-disk internal; <br>
device /dev/drbd0; <br>
disk /dev/vol01/docroot; <br>
<br>
syncer { rate 1000M; } <br>
net { <br>
allow-two-primaries; <br>
after-sb-0pri discard-zero-changes; <br>
after-sb-1pri discard-secondary; <br>
after-sb-2pri disconnect; <br>
} <br>
startup { become-primary-on both; } <br>
<br>
on web01.junkmail.co.za { address 10.0.0.111:7789; } <br>
on web02.junkmail.co.za { address 10.0.0.112:7789; } <br>
}</b><br>
<b>===============================================================</b><br>
<br>
<br>
<br>
Here is /etc/ocfs2/cluster.conf:<br>
<br>
===============================================================<br>
<b>cluster:<br>
node_count = 2<br>
name = jbm_web<br>
<br>
node:<br>
ip_port = 7777<br>
ip_address = 10.0.0.111<br>
number = 1<br>
name = web01<br>
cluster = jbm_web<br>
<br>
node:<br>
ip_port = 7777<br>
ip_address = 10.0.0.112<br>
number = 2<br>
name = web02<br>
cluster = jbm_web<br>
================================================================<br>
<br>
<br>
<br>
</b>Any help or ideas would be much appreciated; the pressure is on here.<br>
<br>
Thanks<br><br>Lawrence<br>