[DRBD-user] Current Primary shall become sync TARGET!

Andreas Schultz aschultz at tpip.net
Fri Sep 17 10:00:38 CEST 2004


Hi Luis,

First, please send your mails in plain text. Not everyone is willing to
deal with HTML mail.

On Thursday 16 September 2004 17:17, Luis F. V. Gomes wrote:
>  Hi
>
>  Our team is trying to set up a mail server cluster with heartbeat and DRBD to
> mount /var/spool/mail, but we are having a problem.
>  If we choose any of the cluster members, simply pull the power plug out,
> and later turn it on again, everything works fine. But if the primary is
> rebooted or halted using the normal UNIX commands, the secondary becomes
> primary, and when the original primary server is up again, the current
> primary goes to cs:StandAlone permanently, logging the following messages:
>
>
>  drbd0: Current Primary shall become sync TARGET! Aborting to prevent data corruption
>  drbd0: error receiving ReportParams, l: 72!

There was a thread about a similar problem a few days ago. The suggestion then
was that the network was being shut down before DRBD. This leads to a
split-brain situation where both systems have locally consistent data but are
out of sync with each other.
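
The disagreement is visible in the counter strings that both nodes log in the quoted output ("I am(P): ..." and "Peer(S): ..."). As a rough illustration only, the sketch below parses those two strings and lists the fields that differ. Two things here are assumptions, not documented DRBD behaviour: that every colon-separated field can be read as a hexadecimal number, and that the fields are compared left to right with the first difference deciding. The function names are made up for this example.

```python
import re

# Matches the counter lines from the quoted kernel log, e.g.
#   drbd0: I am(P): 1:00000002:00000007:00000070:00000013:10
COUNTER_RE = re.compile(
    r"(I am|Peer)\(([PS])\):\s*([0-9A-Fa-f]+(?::[0-9A-Fa-f]+)*)"
)


def parse_counters(line):
    """Return (who, role, fields) for one 'I am'/'Peer' log line."""
    match = COUNTER_RE.search(line)
    if match is None:
        raise ValueError("no counter string in line: %r" % line)
    who, role, raw = match.groups()
    # Assumption: each colon-separated field is a hexadecimal number.
    return who, role, [int(field, 16) for field in raw.split(":")]


def differing_fields(mine, peer):
    """List (index, mine, peer) for every field where the nodes disagree."""
    return [(i, a, b) for i, (a, b) in enumerate(zip(mine, peer)) if a != b]


def who_is_ahead(mine, peer):
    """Assumption: compare left to right; the first difference decides."""
    for a, b in zip(mine, peer):
        if a != b:
            return "me" if a > b else "peer"
    return "equal"


if __name__ == "__main__":
    # Counter strings copied from the SLAVE log in the quoted mail.
    _, _, mine = parse_counters(
        "drbd0: I am(P): 1:00000002:00000007:00000070:00000013:10")
    _, _, peer = parse_counters(
        "drbd0: Peer(S): 1:00000002:00000007:00000072:00000012:10")
    print(differing_fields(mine, peer))
    print(who_is_ahead(mine, peer))
```

With the values from the log, two fields differ, and each node is larger in one of them, which fits the picture of both sides having changed independently. Under the left-to-right assumption the peer comes out ahead, which would be consistent with the Primary being told to become sync target.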

Andreas

>
>  It does not matter if auto_failback is on or off.
>  Is this supposed to happen? Why?
>
>
>  We are using:
>  Fedora Core 2, kernel 2.6.8-1.521
>  DRBD 0.7.3 and 0.7.4 (same behaviour)
>  Heartbeat 1.2.2
>
>  The logs, with auto_failback off:
>
>  The original secondary (SLAVE):
>  ===============================
>  # cat /proc/drbd (while the other is down)
>  version: 0.7.4 (api:76/proto:74)
>  SVN Revision: 1537M build by root at MASTER.ele.puc-rio.br, 2004-09-13 16:14:51
>   0: cs:WFConnection st:Primary/Unknown ld:Consistent
>      ns:0 nr:811008 dw:811012 dr:17 al:0 bm:55 lo:0 pe:0 ua:0 ap:0
>
>  # cat /proc/drbd (after both nodes up again)
>  version: 0.7.4 (api:76/proto:74)
>  SVN Revision: 1537M build by root at MASTER.ele.puc-rio.br, 2004-09-13 16:14:51
>   0: cs:StandAlone st:Primary/Unknown ld:Consistent
>      ns:0 nr:811008 dw:811012 dr:17 al:0 bm:55 lo:0 pe:0 ua:0 ap:0
>
>  # /var/log/messages:
>  Sep 13 18:12:27 SLAVE kernel: drbd0: drbd0_receiver [12147]: cstate WFConnection --> WFReportParams
>  Sep 13 18:12:27 SLAVE kernel: drbd0: Handshake successful: DRBD Network Protocol version 74
>  Sep 13 18:12:27 SLAVE kernel: drbd0: Connection established.
>  Sep 13 18:12:27 SLAVE kernel: drbd0: I am(P): 1:00000002:00000007:00000070:00000013:10
>  Sep 13 18:12:27 SLAVE kernel: drbd0: Peer(S): 1:00000002:00000007:00000072:00000012:10
>  Sep 13 18:12:27 SLAVE kernel: drbd0: Current Primary shall become sync TARGET! Aborting to prevent data corruption.
>  Sep 13 18:12:27 SLAVE kernel: drbd0: drbd0_receiver [12147]: cstate WFReportParams --> StandAlone
>  Sep 13 18:12:27 SLAVE kernel: drbd0: error receiving ReportParams, l: 72!
>  Sep 13 18:12:27 SLAVE kernel: drbd0: asender terminated
>  Sep 13 18:12:27 SLAVE kernel: drbd0: worker terminated
>  Sep 13 18:12:27 SLAVE kernel: drbd0: drbd0_receiver [12147]: cstate StandAlone --> StandAlone
>  Sep 13 18:12:27 SLAVE kernel: drbd0: Connection lost.
>  Sep 13 18:12:27 SLAVE kernel: drbd0: receiver terminated
>  Sep 13 18:12:30 SLAVE heartbeat[12082]: info: Heartbeat restart on node MASTER.ele.puc-rio.br
>  Sep 13 18:12:30 SLAVE heartbeat[12082]: info: Link MASTER.ele.puc-rio.br:eth1 up.
>  Sep 13 18:12:30 SLAVE heartbeat[12082]: info: Status update for node MASTER.ele.puc-rio.br: status up
>  Sep 13 18:12:30 SLAVE heartbeat[12082]: info: Status update for node MASTER.ele.puc-rio.br: status active
>  Sep 13 18:12:30 SLAVE heartbeat[12082]: info: remote resource transition completed.
>  Sep 13 18:12:30 SLAVE heartbeat: info: Running /usr/local/etc/ha.d/rc.d/status status
>
>
>  The original primary (MASTER):
>  ==============================
>  # cat /proc/drbd
>  version: 0.7.4 (api:76/proto:74)
>  SVN Revision: 1537M build by phil at nudl, 2004-09-09 19:53:07
>   0: cs:WFConnection st:Secondary/Unknown ld:Consistent
>      ns:0 nr:0 dw:0 dr:0 al:0 bm:198 lo:0 pe:0 ua:0 ap:0
>
>  # /var/log/messages:
>
>  Sep 13 18:12:27 MASTER kernel: drbd: initialised. Version: 0.7.4 (api:76/proto:74)
>  Sep 13 18:12:27 MASTER kernel: drbd: SVN Revision: 1537M build by phil at nudl, 2004-09-09 19:53:07
>  Sep 13 18:12:27 MASTER kernel: drbd: registered as block device major 147
>  Sep 13 18:12:27 MASTER kernel: drbd0: resync bitmap: bits=3113963 words=97312
>  Sep 13 18:12:27 MASTER kernel: drbd0: size = 11 GB (12455852 KB)
>  Sep 13 18:12:27 MASTER kernel: drbd0: 0 KB marked out-of-sync by on disk bit-map.
>  Sep 13 18:12:27 MASTER kernel: drbd0: Found 6 transactions (324 active extents) in activity log.
>  Sep 13 18:12:27 MASTER kernel: drbd0: Marked additional 99 MB as out-of-sync based on AL.
>  Sep 13 18:12:27 MASTER kernel: drbd0: drbdsetup [1964]: cstate Unconfigured --> StandAlone
>  Sep 13 18:12:27 MASTER kernel: drbd0: drbdsetup [1966]: cstate StandAlone --> Unconnected
>  Sep 13 18:12:27 MASTER kernel: drbd0: drbd0_receiver [1967]: cstate Unconnected --> WFConnection
>  Sep 13 18:12:27 MASTER drbd: [r0].
>  Sep 13 18:12:27 MASTER kernel: drbd0: drbd0_receiver [1967]: cstate WFConnection --> WFReportParams
>  Sep 13 18:12:27 MASTER kernel: drbd0: Handshake successful: DRBD Network Protocol version 74
>  Sep 13 18:12:27 MASTER kernel: drbd0: Connection established.
>  Sep 13 18:12:27 MASTER kernel: drbd0: I am(S): 1:00000002:00000007:00000072:00000012:10
>  Sep 13 18:12:27 MASTER kernel: drbd0: Peer(P): 1:00000002:00000007:00000070:00000013:10
>  Sep 13 18:12:27 MASTER kernel: drbd0: drbd0_receiver [1967]: cstate WFReportParams --> WFBitMapS
>  Sep 13 18:12:27 MASTER rc: Starting drbd:  succeeded
>  Sep 13 18:12:27 MASTER kernel: drbd0: sock_sendmsg returned -104
>  Sep 13 18:12:27 MASTER kernel: drbd0: drbd0_receiver [1967]: cstate WFBitMapS --> BrokenPipe
>  Sep 13 18:12:27 MASTER kernel: drbd0: short sent ReportBitMap size=4096 sent=3344
>  Sep 13 18:12:27 MASTER kernel: drbd0: Secondary/Unknown --> Secondary/Primary
>  Sep 13 18:12:27 MASTER kernel: drbd0: meta connection shut down by peer.
>  Sep 13 18:12:27 MASTER kernel: drbd0: asender terminated
>  Sep 13 18:12:27 MASTER kernel: drbd0: sock was shut down by peer
>  Sep 13 18:12:27 MASTER kernel: drbd0: drbd0_receiver [1967]: cstate BrokenPipe --> BrokenPipe
>  Sep 13 18:12:27 MASTER kernel: drbd0: short read expecting header on sock: r=0
>  Sep 13 18:12:27 MASTER kernel: drbd0: worker terminated
>  Sep 13 18:12:27 MASTER kernel: drbd0: drbd0_receiver [1967]: cstate BrokenPipe --> Unconnected
>  Sep 13 18:12:27 MASTER kernel: drbd0: Connection lost.
>  Sep 13 18:12:27 MASTER kernel: drbd0: drbd0_receiver [1967]: cstate Unconnected --> WFConnection
>
>
>
>
>   
>  Thanks for any help.
>  Sorry if this is a repost.
>
>  Luís
>   
>   
>  !                          ,,,
>  !                         (@ @)
>  +--------------------oOO-/-(*)-\-OOo---+----------------------------------+
>  | Luis Fernando V. Gomes               |   Email:  lf at ele.puc-rio.br
>  | Network Administrator                |
>  | Dept. Engenharia Eletrica            |
>  | PUC-Rio                              |   Voz: +(55) (21) 3114-1220
>  | R. Marques de Sao Vicente 225/401L   |   Fax: +(55) (21) 3114-1232
>  | 22453-900 - Rio de Janeiro/RJ, Brasil|
>  +--------------------------------------+----------------------------------+
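
For monitoring, the stuck state in the /proc/drbd output quoted above (cs:StandAlone while the node is still Primary) can be detected with a small script. This is only a minimal sketch, assuming the 0.7.x status line format shown in the quote (cs:, st:, ld:); the function names are hypothetical and not part of DRBD.

```python
import re

# Matches the per-device status line from the quoted /proc/drbd output, e.g.
#   0: cs:StandAlone st:Primary/Unknown ld:Consistent
STATUS_RE = re.compile(r"cs:(\S+)\s+st:([^/\s]+)/(\S+)\s+ld:(\S+)")


def parse_status(text):
    """Return the first device status found in /proc/drbd text, or None."""
    match = STATUS_RE.search(text)
    if match is None:
        return None
    cs, local, peer, ld = match.groups()
    return {"cs": cs, "local": local, "peer": peer, "ld": ld}


def is_standalone_primary(status):
    """True if the node is Primary but has given up on its peer."""
    return (
        status is not None
        and status["cs"] == "StandAlone"
        and status["local"] == "Primary"
    )


if __name__ == "__main__":
    # On a real node this text would come from reading /proc/drbd.
    sample = (
        "version: 0.7.4 (api:76/proto:74)\n"
        " 0: cs:StandAlone st:Primary/Unknown ld:Consistent\n"
        "    ns:0 nr:811008 dw:811012 dr:17 al:0 bm:55 lo:0 pe:0 ua:0 ap:0\n"
    )
    print(is_standalone_primary(parse_status(sample)))
```

The check only reports the condition; it does not try to reconnect or resolve the split brain.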

-- 
Andreas Schultz

------------------ instant broadband access ------------------

Travelping GmbH | Leipziger Str. 46 | D-09113 Chemnitz | GERMANY
phone: +49-391-40045984 fax: +49-391-40045299 www.travelping.com
(local UK: 0844 4849180)