<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-7">
<meta name=Generator content="Microsoft Word 12 (filtered medium)">
<style>
<!--
/* Font Definitions */
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0cm;
        margin-bottom:.0001pt;
        font-size:11.0pt;
        font-family:"Calibri","sans-serif";}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
span.EmailStyle17
        {mso-style-type:personal-compose;
        font-family:"Calibri","sans-serif";
        color:windowtext;}
.MsoChpDefault
        {mso-style-type:export-only;}
@page Section1
        {size:612.0pt 792.0pt;
        margin:72.0pt 90.0pt 72.0pt 90.0pt;}
div.Section1
        {page:Section1;}
-->
</style>
</head>
<body lang=EL link=blue vlink=purple>
<div class=Section1>
<p class=MsoNormal><span lang=EN-US>Dear All,<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p>
<p class=MsoNormal><span lang=EN-US>I have two cluster nodes that use DRBD
8.3 (which I compiled and installed myself as an RPM) as the shared block device.<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p>
<p class=MsoNormal><span lang=EN-US>The two systems are identical; each runs kernel
2.6.18-92.1.22.el5.centos.plus with drbd-8.3.0-3 and drbd-km-2.6.18_92.1.22.el5.centos.plus-8.3.0-3.<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p>
<p class=MsoNormal><span lang=EN-US>On top of DRBD resources, I have LVM
(clustered) and on top of that GFS2.<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p>
<p class=MsoNormal><span lang=EN-US>The issue is the following.<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p>
<p class=MsoNormal><span lang=EN-US>I run both nodes as Primary/Primary. Various
applications run on both nodes and write to the filesystem concurrently
(though not to the same files or even the same directories). The following errors
appear at random but consistently (at least once per day):<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p>
<p class=MsoNormal><span lang=EN-US>On node 1:<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p>
<p class=MsoNormal><span lang=EN-US>……………….<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: susp( 1 -> 0 )<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: [drbd1_worker/4658] sock_sendmsg
time expired, ko = 9<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: [drbd1_worker/4658] sock_sendmsg
time expired, ko = 8<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: [drbd1_worker/4658] sock_sendmsg
time expired, ko = 7<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: [drbd1_worker/4658] sock_sendmsg
time expired, ko = 6<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: [drbd1_worker/4658] sock_sendmsg
time expired, ko = 5<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: [drbd1_worker/4658] sock_sendmsg
time expired, ko = 4<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: [drbd1_worker/4658] sock_sendmsg
time expired, ko = 3<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: [drbd1_worker/4658] sock_sendmsg
time expired, ko = 2<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: [drbd1_worker/4658] sock_sendmsg
time expired, ko = 1<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: peer( Primary -> Unknown ) conn(
WFBitMapS -> Timeout ) pdsk( UpToDate -> DUnknown ) susp( 0 -> 1 )<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: short sent ReportBitMap size=4096
sent=276<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: short read expecting header on sock:
r=-512<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: asender terminated<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: Terminating asender thread<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: Connection closed<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: helper command: /sbin/drbdadm
fence-peer minor-1<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: helper command: /sbin/drbdadm
fence-peer minor-1 exit code 2 (0x200)<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: fence-peer helper broken, returned 2<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: Considering state change from bad
state. Error would be: 'Refusing to be Primary while peer is not outdated'<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: old = { cs:Timeout
ro:Primary/Unknown ds:UpToDate/DUnknown s--- }<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: new = { cs:Unconnected
ro:Primary/Unknown ds:UpToDate/DUnknown s--- }<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: conn( Timeout -> Unconnected )<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: receiver terminated<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: Restarting receiver thread<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: receiver (re)started<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: Considering state change from bad
state. Error would be: 'Refusing to be Primary while peer is not outdated'<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: old = { cs:Unconnected
ro:Primary/Unknown ds:UpToDate/DUnknown s--- }<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: new = { cs:WFConnection ro:Primary/Unknown
ds:UpToDate/DUnknown s--- }<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: conn( Unconnected -> WFConnection
)<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>……………………………………<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p>
<p class=MsoNormal><span lang=EN-US>[root@tweety-1 ~]# drbdadm status <o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>&lt;drbd-status version="8.3.0"
api="88"&gt;<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>&lt;resources
config_file="/etc/drbd.conf"&gt;<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>&lt;resource minor="0"
name="r0" cs="Connected" ro1="Primary"
ro2="Primary" ds1="UpToDate" ds2="UpToDate" /&gt;<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>&lt;resource minor="1"
name="r1" cs="WFConnection" ro1="Primary"
ro2="Unknown" ds1="UpToDate" ds2="DUnknown"
suspended /&gt;<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>&lt;/resources&gt;<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>&lt;/drbd-status&gt;<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p>
<p class=MsoNormal><span lang=EN-US>On node 2:<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p>
<p class=MsoNormal><span lang=EN-US>…………………..<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: sock was reset by peer<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: peer( Primary -> Unknown ) conn(
Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown ) susp( 0 -> 1 )<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: short read expecting header on sock:
r=-104<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: meta connection shut down by peer.<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: asender terminated<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: Terminating asender thread<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: Creating new current UUID<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: Connection closed<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: helper command: /sbin/drbdadm
fence-peer minor-1<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: helper command: /sbin/drbdadm
fence-peer minor-1 exit code 2 (0x200)<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: fence-peer helper broken, returned 2<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: Considering state change from bad
state. Error would be: 'Refusing to be Primary while peer is not outdated'<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: old = { cs:BrokenPipe
ro:Primary/Unknown ds:UpToDate/DUnknown s--- }<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: new = { cs:Unconnected
ro:Primary/Unknown ds:UpToDate/DUnknown s--- }<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: conn( BrokenPipe -> Unconnected )<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: receiver terminated<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: Restarting receiver thread<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: receiver (re)started<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: Considering state change from bad
state. Error would be: 'Refusing to be Primary while peer is not outdated'<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: old = { cs:Unconnected
ro:Primary/Unknown ds:UpToDate/DUnknown s--- }<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: new = { cs:WFConnection
ro:Primary/Unknown ds:UpToDate/DUnknown s--- }<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: conn( Unconnected -> WFConnection
)<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: Handshake successful: Agreed network
protocol version 89<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: Peer authenticated using 20 bytes of
'sha1' HMAC<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: Considering state change from bad
state. Error would be: 'Refusing to be Primary while peer is not outdated'<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: old = { cs:WFConnection
ro:Primary/Unknown ds:UpToDate/DUnknown s--- }<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: new = { cs:WFReportParams
ro:Primary/Unknown ds:UpToDate/DUnknown s--- }<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: conn( WFConnection ->
WFReportParams )<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: Starting asender thread (from
drbd1_receiver [4858])<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: data-integrity-alg: crc32c<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: meta connection shut down by peer.<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: conn( WFReportParams ->
NetworkFailure )<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: asender terminated<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>drbd1: Terminating asender thread<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>………………………<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p>
<p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p>
<p class=MsoNormal><span lang=EN-US>[root@tweety-2 ~]# drbdadm status<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>&lt;drbd-status version="8.3.0"
api="88"&gt;<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>&lt;resources
config_file="/etc/drbd.conf"&gt;<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>&lt;resource minor="0"
name="r0" cs="Connected" ro1="Primary"
ro2="Primary" ds1="UpToDate" ds2="UpToDate" /&gt;<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>&lt;resource minor="1"
name="r1" cs="NetworkFailure" ro1="Primary"
ro2="Unknown" ds1="UpToDate" ds2="DUnknown"
suspended /&gt;<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>&lt;/resources&gt;<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>&lt;/drbd-status&gt;<o:p></o:p></span></p>
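<p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p>
<p class=MsoNormal><span lang=EN-US>To catch this state without reading the
status by eye, the output of drbdadm status shown above could be checked with a
small script. The following is only a sketch: the function names
parse_drbd_status and unhealthy are made up for illustration, and it uses
regular expressions rather than an XML parser because the bare
"suspended" flag in the output above has no value, which a strict XML
parser would presumably reject.<o:p></o:p></span></p>
<pre>
```python
import re


def parse_drbd_status(text):
    """Return one dict per resource element found in `drbdadm status` output."""
    resources = []
    # Each resource element runs from 'resource minor=' up to its closing slash.
    # Assumption: no attribute value of a resource element contains a slash.
    for chunk in re.findall(r'resource\s+minor="[^"]*"[^/]*', text):
        attrs = dict(re.findall(r'(\w+)="([^"]*)"', chunk))
        # 'suspended' shows up as a bare flag without a value.
        attrs["suspended"] = bool(re.search(r'\bsuspended\b', chunk))
        resources.append(attrs)
    return resources


def unhealthy(resources):
    """Resources that are not Connected or that have I/O suspended."""
    return [r for r in resources if r.get("cs") != "Connected" or r["suspended"]]
```
</pre>
<p class=MsoNormal><span lang=EN-US>Fed the status output from either node
above, unhealthy() would return only r1, since r0 is Connected and not
suspended.<o:p></o:p></span></p>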
<p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p>
<p class=MsoNormal><span lang=EN-US>My drbd.conf is the following:<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p>
<p class=MsoNormal><span lang=EN-US>global {<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US> # minor-count 64;<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US> # dialog-refresh 5; # 5 seconds<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US> # disable-ip-verification;<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US> usage-count yes;<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>}<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p>
<p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p>
<p class=MsoNormal><span lang=EN-US>common {<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US> protocol C;<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p>
<p class=MsoNormal><span lang=EN-US> syncer {<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US> rate 100M;<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US> al-extents 257;<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US> }<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p>
<p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p>
<p class=MsoNormal><span lang=EN-US> handlers {<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US> pri-on-incon-degr "echo b >
/proc/sysrq-trigger ; reboot -f";<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US> pri-lost-after-sb "echo b >
/proc/sysrq-trigger ; reboot -f";<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US> local-io-error "echo o >
/proc/sysrq-trigger ; halt -f";<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US> outdate-peer
"/sbin/obliterate";<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US> pri-lost "echo pri-lost. Have a
look at the log files. | mail -s 'DRBD Alert' root; echo b >
/proc/sysrq-trigger ; reboot -f";<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US> split-brain "echo split-brain.
drbdadm -- --discard-my-data connect $DRBD_RESOURCE ? | mail -s 'DRBD Alert'
root";<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US> }<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p>
<p class=MsoNormal><span lang=EN-US> startup {<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US> wfc-timeout 100;<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US> degr-wfc-timeout 60; # 1 minutes.<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US> #wait-after-sb;<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US> become-primary-on both;<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US> }<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p>
<p class=MsoNormal><span lang=EN-US> disk {<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US> #on-io-error pass-on;<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US> fencing resource-and-stonith;<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US> }<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p>
<p class=MsoNormal><span lang=EN-US> net {<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US> sndbuf-size 512k;<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US> timeout 60; # (unit = 0.1
seconds)<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US> connect-int 10; # (unit = 1
second)<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US> ping-int 10; # (unit = 1
second)<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US> ping-timeout 50; # (unit = 0.1
seconds)<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US> max-buffers 2048;<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US> max-epoch-size 2048;<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US> ko-count 10;<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US> allow-two-primaries;<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US> cram-hmac-alg "sha1";<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US> shared-secret "tweety";<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US> after-sb-0pri discard-least-changes;<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US> after-sb-1pri violently-as0p;<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US> after-sb-2pri violently-as0p;<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US> rr-conflict call-pri-lost;<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US> data-integrity-alg "crc32c";<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p>
<p class=MsoNormal><span lang=EN-US> }<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>}<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p>
<p class=MsoNormal><span lang=EN-US>resource r0 {<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US> device /dev/drbd0;<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US> disk /dev/hda4;<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US> meta-disk internal;<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US> on tweety-1 { address
10.254.254.253:7788; }<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US> on tweety-2 { address
10.254.254.254:7788; }<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>}<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p>
<p class=MsoNormal><span lang=EN-US>resource r1 {<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US> device /dev/drbd1;<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US> disk /dev/hdb4;<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US> meta-disk internal;<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US> on tweety-1 { address
10.254.254.253:7789; }<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US> on tweety-2 { address
10.254.254.254:7789; }<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>}<o:p></o:p></span></p>
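<p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p>
<p class=MsoNormal><span lang=EN-US>For reference, the net timeouts above
can be converted to seconds with a few lines of Python. This is a rough sketch,
not an authoritative reading of DRBD's behaviour: the names NET, TENTHS and
summarize are invented here, the units are taken from the comments in the
config, and the "ko window" assumes (as the drbd.conf man page describes
ko-count) that the peer is dropped after ko-count consecutive expiries of
timeout, which would match the ko = 9 ... 1 countdown in the node 1 log.<o:p></o:p></span></p>
<pre>
```python
# Values copied from the net section of the drbd.conf above.
NET = {
    "timeout": 60,
    "connect-int": 10,
    "ping-int": 10,
    "ping-timeout": 50,
    "ko-count": 10,
}

# Options expressed in tenths of a second, per the comments in the config.
TENTHS = {"timeout", "ping-timeout"}


def to_seconds(name, value):
    """Convert a net option to seconds based on its documented unit."""
    return value / 10 if name in TENTHS else float(value)


def summarize(net):
    """Return the effective timeouts in seconds."""
    timeout_s = to_seconds("timeout", net["timeout"])
    return {
        "timeout_s": timeout_s,
        "ping_timeout_s": to_seconds("ping-timeout", net["ping-timeout"]),
        # Assumed: ko-count expiries of `timeout` pass before the peer is dropped.
        "ko_window_s": timeout_s * net["ko-count"],
    }


print(summarize(NET))
```
</pre>
<p class=MsoNormal><span lang=EN-US>With the values above this gives a 6
second timeout, a 5 second ping timeout, and roughly 60 seconds of sock_sendmsg
retries before the connection is declared timed out.<o:p></o:p></span></p>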
<p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p>
<p class=MsoNormal><span lang=EN-US>I have no idea what this is, and googling
did not help.<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p>
<p class=MsoNormal><span lang=EN-US>Obviously this error renders the cluster
useless. <o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US>The processes hang, and since no fencing
is performed (is that related to the errors above?), manual intervention
is needed.<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p>
<p class=MsoNormal><span lang=EN-US>Could someone kindly explain what the
problem is, what might cause it, and how to solve
it?<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p>
<p class=MsoNormal><span lang=EN-US>Thank you all for your time.<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p>
<p class=MsoNormal><span lang=EN-US>Theophanis Kontogiannis<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p>
</div>
</body>
</html>