<html>
<body>
Hi<br><br>
Our team is trying to set up a mail server cluster with Heartbeat and DRBD to
mount /var/spool/mail, but we are having a problem.<br>
<br>
If we pick any cluster member, simply pull its power plug and later turn
it on again, everything works fine. But if the primary is rebooted or
halted using the normal UNIX commands, the secondary becomes primary, and
when the original primary comes back up, the current primary goes to
cs:StandAlone and stays there forever, logging the following
messages:<br><br>
<br>
drbd0: Current Primary shall become sync TARGET! Aborting to prevent data
corruption <br>
drbd0: error receiving ReportParams, l: 72!<br><br>
It does not matter whether auto_failback is on or off.<br>
Is this expected behaviour, and if so, why?<br><br>
<br>
We are using:<br>
Fedora Core 2, kernel 2.6.8-1.521<br>
DRBD 0.7.3 and 0.7.4 (same behaviour)<br>
Heartbeat 1.2.2<br><br>
The logs, with auto_failback off:<br><br>
The original secondary (SLAVE):<br>
===============================<br>
# cat /proc/drbd (while the other is down)<br>
version: 0.7.4 (api:76/proto:74)<br>
SVN Revision: 1537M build by root@MASTER.ele.puc-rio.br, 2004-09-13
16:14:51<br>
0: cs:WFConnection st:Primary/Unknown ld:Consistent<br>
ns:0 nr:811008 dw:811012 dr:17 al:0 bm:55 lo:0 pe:0
ua:0 ap:0<br><br>
# cat /proc/drbd (after both nodes up again)<br>
version: 0.7.4 (api:76/proto:74)<br>
SVN Revision: 1537M build by root@MASTER.ele.puc-rio.br, 2004-09-13
16:14:51<br>
0: cs:StandAlone st:Primary/Unknown ld:Consistent<br>
ns:0 nr:811008 dw:811012 dr:17 al:0 bm:55 lo:0 pe:0
ua:0 ap:0<br><br>
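A minimal sketch of how the status line above could be checked automatically, for example from a monitoring script. It assumes the /proc/drbd layout shown in this post (DRBD 0.7.x); the names parse_drbd_status and stuck_standalone are made up for illustration.<br>

```python
import re

# Matches the per-device status line of /proc/drbd as shown in this post,
# e.g. "0: cs:StandAlone st:Primary/Unknown ld:Consistent".
# Assumption: this layout is taken from the 0.7.4 output above only.
STATUS_RE = re.compile(r"^\s*(\d+):\s+cs:(\S+)\s+st:([^/\s]+)/(\S+)\s+ld:(\S+)")
KEYS = ("minor", "cs", "local", "peer", "ld")


def parse_drbd_status(text):
    """Return one dict per device status line found in /proc/drbd text."""
    devices = []
    for line in text.splitlines():
        match = STATUS_RE.match(line)
        if match:
            devices.append(dict(zip(KEYS, match.groups())))
    return devices


def stuck_standalone(devices):
    """Return the minors of devices that are Primary and in cs:StandAlone."""
    return [d["minor"] for d in devices
            if d["cs"] == "StandAlone" and d["local"] == "Primary"]


# Sample copied from the second /proc/drbd output in this post.
sample = """version: 0.7.4 (api:76/proto:74)
 0: cs:StandAlone st:Primary/Unknown ld:Consistent
    ns:0 nr:811008 dw:811012 dr:17 al:0 bm:55 lo:0 pe:0 ua:0 ap:0
"""

print(stuck_standalone(parse_drbd_status(sample)))
```

On a live node the text would come from reading /proc/drbd instead of the sample string.<br><br>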
# /var/log/messages:<br>
Sep 13 18:12:27 SLAVE kernel: drbd0: drbd0_receiver [12147]: cstate
WFConnection --> WFReportParams<br>
Sep 13 18:12:27 SLAVE kernel: drbd0: Handshake successful: DRBD Network
Protocol version 74<br>
Sep 13 18:12:27 SLAVE kernel: drbd0: Connection established.<br>
Sep 13 18:12:27 SLAVE kernel: drbd0: I am(P):
1:00000002:00000007:00000070:00000013:10<br>
Sep 13 18:12:27 SLAVE kernel: drbd0: Peer(S):
1:00000002:00000007:00000072:00000012:10<br>
Sep 13 18:12:27 SLAVE kernel: drbd0: Current Primary shall become sync
TARGET! Aborting to prevent data corruption.<br>
Sep 13 18:12:27 SLAVE kernel: drbd0: drbd0_receiver [12147]: cstate
WFReportParams --> StandAlone<br>
Sep 13 18:12:27 SLAVE kernel: drbd0: error receiving ReportParams, l:
72!<br>
Sep 13 18:12:27 SLAVE kernel: drbd0: asender terminated<br>
Sep 13 18:12:27 SLAVE kernel: drbd0: worker terminated<br>
Sep 13 18:12:27 SLAVE kernel: drbd0: drbd0_receiver [12147]: cstate
StandAlone --> StandAlone<br>
Sep 13 18:12:27 SLAVE kernel: drbd0: Connection lost.<br>
Sep 13 18:12:27 SLAVE kernel: drbd0: receiver terminated<br>
Sep 13 18:12:30 SLAVE heartbeat[12082]: info: Heartbeat restart on node
MASTER.ele.puc-rio.br<br>
Sep 13 18:12:30 SLAVE heartbeat[12082]: info: Link
MASTER.ele.puc-rio.br:eth1 up.<br>
Sep 13 18:12:30 SLAVE heartbeat[12082]: info: Status update for node
MASTER.ele.puc-rio.br: status up<br>
Sep 13 18:12:30 SLAVE heartbeat[12082]: info: Status update for node
MASTER.ele.puc-rio.br: status active<br>
Sep 13 18:12:30 SLAVE heartbeat[12082]: info: remote resource transition
completed.<br>
Sep 13 18:12:30 SLAVE heartbeat: info: Running
/usr/local/etc/ha.d/rc.d/status status<br><br>
<br>
The original primary (MASTER):<br>
==============================<br>
# cat /proc/drbd<br>
version: 0.7.4 (api:76/proto:74)<br>
SVN Revision: 1537M build by phil@nudl, 2004-09-09 19:53:07<br>
0: cs:WFConnection st:Secondary/Unknown ld:Consistent<br>
ns:0 nr:0 dw:0 dr:0 al:0 bm:198 lo:0 pe:0 ua:0
ap:0<br><br>
# /var/log/messages:<br><br>
Sep 13 18:12:27 MASTER kernel: drbd: initialised. Version: 0.7.4
(api:76/proto:74)<br>
Sep 13 18:12:27 MASTER kernel: drbd: SVN Revision: 1537M build by
phil@nudl, 2004-09-09 19:53:07<br>
Sep 13 18:12:27 MASTER kernel: drbd: registered as block device major
147<br>
Sep 13 18:12:27 MASTER kernel: drbd0: resync bitmap: bits=3113963
words=97312<br>
Sep 13 18:12:27 MASTER kernel: drbd0: size = 11 GB (12455852 KB)<br>
Sep 13 18:12:27 MASTER kernel: drbd0: 0 KB marked out-of-sync by on disk
bit-map.<br>
Sep 13 18:12:27 MASTER kernel: drbd0: Found 6 transactions (324 active
extents) in activity log.<br>
Sep 13 18:12:27 MASTER kernel: drbd0: Marked additional 99 MB as
out-of-sync based on AL.<br>
Sep 13 18:12:27 MASTER kernel: drbd0: drbdsetup [1964]: cstate
Unconfigured --> StandAlone<br>
Sep 13 18:12:27 MASTER kernel: drbd0: drbdsetup [1966]: cstate StandAlone
--> Unconnected<br>
Sep 13 18:12:27 MASTER kernel: drbd0: drbd0_receiver [1967]: cstate
Unconnected --> WFConnection<br>
Sep 13 18:12:27 MASTER drbd: [r0].<br>
Sep 13 18:12:27 MASTER kernel: drbd0: drbd0_receiver [1967]: cstate
WFConnection --> WFReportParams<br>
Sep 13 18:12:27 MASTER kernel: drbd0: Handshake successful: DRBD Network
Protocol version 74<br>
Sep 13 18:12:27 MASTER kernel: drbd0: Connection established.<br>
Sep 13 18:12:27 MASTER kernel: drbd0: I am(S):
1:00000002:00000007:00000072:00000012:10<br>
Sep 13 18:12:27 MASTER kernel: drbd0: Peer(P):
1:00000002:00000007:00000070:00000013:10<br>
Sep 13 18:12:27 MASTER kernel: drbd0: drbd0_receiver [1967]: cstate
WFReportParams --> WFBitMapS<br>
Sep 13 18:12:27 MASTER rc: Starting drbd: succeeded<br>
Sep 13 18:12:27 MASTER kernel: drbd0: sock_sendmsg returned -104<br>
Sep 13 18:12:27 MASTER kernel: drbd0: drbd0_receiver [1967]: cstate
WFBitMapS --> BrokenPipe<br>
Sep 13 18:12:27 MASTER kernel: drbd0: short sent ReportBitMap size=4096
sent=3344<br>
Sep 13 18:12:27 MASTER kernel: drbd0: Secondary/Unknown -->
Secondary/Primary<br>
Sep 13 18:12:27 MASTER kernel: drbd0: meta connection shut down by
peer.<br>
Sep 13 18:12:27 MASTER kernel: drbd0: asender terminated<br>
Sep 13 18:12:27 MASTER kernel: drbd0: sock was shut down by peer<br>
Sep 13 18:12:27 MASTER kernel: drbd0: drbd0_receiver [1967]: cstate
BrokenPipe --> BrokenPipe<br>
Sep 13 18:12:27 MASTER kernel: drbd0: short read expecting header on
sock: r=0<br>
Sep 13 18:12:27 MASTER kernel: drbd0: worker terminated<br>
Sep 13 18:12:27 MASTER kernel: drbd0: drbd0_receiver [1967]: cstate
BrokenPipe --> Unconnected<br>
Sep 13 18:12:27 MASTER kernel: drbd0: Connection lost.<br>
Sep 13 18:12:27 MASTER kernel: drbd0: drbd0_receiver [1967]: cstate
Unconnected --> WFConnection<br><br>
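To see where the two nodes disagree, the "I am" and "Peer" counter strings from the logs above can be compared field by field. This is only a sketch: it does not assume what each field means and reports positions by index, and differing_fields is a hypothetical helper name.<br>

```python
def differing_fields(mine, peer):
    """Return (index, my_value, peer_value) for each colon-separated field
    that differs between two DRBD counter strings.

    The fields are compared as raw strings; no meaning or number base is
    assumed for any of them.
    """
    mine_fields = mine.split(":")
    peer_fields = peer.split(":")
    if len(mine_fields) != len(peer_fields):
        raise ValueError("counter strings have different field counts")
    return [
        (index, a, b)
        for index, (a, b) in enumerate(zip(mine_fields, peer_fields))
        if a != b
    ]


# Values copied from the SLAVE log in this post.
i_am = "1:00000002:00000007:00000070:00000013:10"
peer = "1:00000002:00000007:00000072:00000012:10"

for index, a, b in differing_fields(i_am, peer):
    print("field", index, "local", a, "peer", b)
```

For the values in our logs, only the fields at index 3 and 4 differ.<br><br>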
<br><br>
<br>
<br>
Thanks for any help.<br>
Sorry if this is a repost.<br><br>
Luís<br>
<br>
<br>
<font face="fixedsys">
Luis Fernando V. Gomes<br>
Network Administrator<br>
Dept. Engenharia Eletrica, PUC-Rio<br>
R. Marques de Sao Vicente 225/401L<br>
22453-900 - Rio de Janeiro/RJ, Brasil<br>
Email: lf@ele.puc-rio.br<br>
Voice: +(55) (21) 3114-1220<br>
Fax: +(55) (21) 3114-1232</font>
<br>
</body>
</html>