Hi Luis,

First, please send your mails in plain text. Not all people are willing
to bother with HTML mails.

On Thursday 16 September 2004 17:17, Luis F. V. Gomes wrote:
> Hi
>
> Our team is trying to set up a mailserver cluster with heartbeat and DRBD to
> mount /var/spool/mail, but we are having a problem.
> If we choose any of the cluster members and simply pull the power plug out
> and later turn it on again, everything works fine. But if the primary is
> rebooted or halted using the normal UNIX commands, the secondary becomes
> primary but when the original primary server is up again, the current
> primary turns to cs:StandAlone forever, logging the following messages:
>
>
> drbd0: Current Primary shall become sync TARGET! Aborting to prevent data corruption
> drbd0: error receiving ReportParams, l: 72!

There was a thread about a similar problem some days ago. The suggestion
then was that the network was shut down before drbd. This leads to a
split-brain situation where both systems have locally consistent data,
but are still out of sync.

Andreas

>
> It does not matter if auto_failback is on or off.
> Is this supposed to happen? Why?
>
>
> We are using:
> Fedora Core 2, kernel 2.6.8-1.521
> DRBD 0.7.3 and 0.7.4 (same behaviour)
> Heartbeat 1.2.2
>
> The logs, with auto_failback off:
>
> The original secondary (SLAVE):
> ===============================
> # cat /proc/drbd (while the other is down)
> version: 0.7.4 (api:76/proto:74)
> SVN Revision: 1537M build by root at MASTER.ele.puc-rio.br, 2004-09-13 16:14:51
> 0: cs:WFConnection st:Primary/Unknown ld:Consistent
> ns:0 nr:811008 dw:811012 dr:17 al:0 bm:55 lo:0 pe:0 ua:0 ap:0
>
> # cat /proc/drbd (after both nodes up again)
> version: 0.7.4 (api:76/proto:74)
> SVN Revision: 1537M build by root at MASTER.ele.puc-rio.br, 2004-09-13 16:14:51
> 0: cs:StandAlone st:Primary/Unknown ld:Consistent
> ns:0 nr:811008 dw:811012 dr:17 al:0 bm:55 lo:0 pe:0 ua:0 ap:0
>
> # /var/log/messages:
> Sep 13 18:12:27 SLAVE kernel: drbd0: drbd0_receiver [12147]: cstate WFConnection --> WFReportParams
> Sep 13 18:12:27 SLAVE kernel: drbd0: Handshake successful: DRBD Network Protocol version 74
> Sep 13 18:12:27 SLAVE kernel: drbd0: Connection established.
> Sep 13 18:12:27 SLAVE kernel: drbd0: I am(P): 1:00000002:00000007:00000070:00000013:10
> Sep 13 18:12:27 SLAVE kernel: drbd0: Peer(S): 1:00000002:00000007:00000072:00000012:10
> Sep 13 18:12:27 SLAVE kernel: drbd0: Current Primary shall become sync TARGET! Aborting to prevent data corruption.
> Sep 13 18:12:27 SLAVE kernel: drbd0: drbd0_receiver [12147]: cstate WFReportParams --> StandAlone
> Sep 13 18:12:27 SLAVE kernel: drbd0: error receiving ReportParams, l: 72!
> Sep 13 18:12:27 SLAVE kernel: drbd0: asender terminated
> Sep 13 18:12:27 SLAVE kernel: drbd0: worker terminated
> Sep 13 18:12:27 SLAVE kernel: drbd0: drbd0_receiver [12147]: cstate StandAlone --> StandAlone
> Sep 13 18:12:27 SLAVE kernel: drbd0: Connection lost.
> Sep 13 18:12:27 SLAVE kernel: drbd0: receiver terminated
> Sep 13 18:12:30 SLAVE heartbeat[12082]: info: Heartbeat restart on node MASTER.ele.puc-rio.br
> Sep 13 18:12:30 SLAVE heartbeat[12082]: info: Link MASTER.ele.puc-rio.br:eth1 up.
> Sep 13 18:12:30 SLAVE heartbeat[12082]: info: Status update for node MASTER.ele.puc-rio.br: status up
> Sep 13 18:12:30 SLAVE heartbeat[12082]: info: Status update for node MASTER.ele.puc-rio.br: status active
> Sep 13 18:12:30 SLAVE heartbeat[12082]: info: remote resource transition completed.
> Sep 13 18:12:30 SLAVE heartbeat: info: Running /usr/local/etc/ha.d/rc.d/status status
>
>
> The original primary (MASTER):
> ==============================
> # cat /proc/drbd
> version: 0.7.4 (api:76/proto:74)
> SVN Revision: 1537M build by phil at nudl, 2004-09-09 19:53:07
> 0: cs:WFConnection st:Secondary/Unknown ld:Consistent
> ns:0 nr:0 dw:0 dr:0 al:0 bm:198 lo:0 pe:0 ua:0 ap:0
>
> # /var/log/messages:
>
> Sep 13 18:12:27 MASTER kernel: drbd: initialised. Version: 0.7.4 (api:76/proto:74)
> Sep 13 18:12:27 MASTER kernel: drbd: SVN Revision: 1537M build by phil at nudl, 2004-09-09 19:53:07
> Sep 13 18:12:27 MASTER kernel: drbd: registered as block device major 147
> Sep 13 18:12:27 MASTER kernel: drbd0: resync bitmap: bits=3113963 words=97312
> Sep 13 18:12:27 MASTER kernel: drbd0: size = 11 GB (12455852 KB)
> Sep 13 18:12:27 MASTER kernel: drbd0: 0 KB marked out-of-sync by on disk bit-map.
> Sep 13 18:12:27 MASTER kernel: drbd0: Found 6 transactions (324 active extents) in activity log.
> Sep 13 18:12:27 MASTER kernel: drbd0: Marked additional 99 MB as out-of-sync based on AL.
> Sep 13 18:12:27 MASTER kernel: drbd0: drbdsetup [1964]: cstate Unconfigured --> StandAlone
> Sep 13 18:12:27 MASTER kernel: drbd0: drbdsetup [1966]: cstate StandAlone --> Unconnected
> Sep 13 18:12:27 MASTER kernel: drbd0: drbd0_receiver [1967]: cstate Unconnected --> WFConnection
> Sep 13 18:12:27 MASTER drbd: [r0].
> Sep 13 18:12:27 MASTER kernel: drbd0: drbd0_receiver [1967]: cstate WFConnection --> WFReportParams
> Sep 13 18:12:27 MASTER kernel: drbd0: Handshake successful: DRBD Network Protocol version 74
> Sep 13 18:12:27 MASTER kernel: drbd0: Connection established.
> Sep 13 18:12:27 MASTER kernel: drbd0: I am(S): 1:00000002:00000007:00000072:00000012:10
> Sep 13 18:12:27 MASTER kernel: drbd0: Peer(P): 1:00000002:00000007:00000070:00000013:10
> Sep 13 18:12:27 MASTER kernel: drbd0: drbd0_receiver [1967]: cstate WFReportParams --> WFBitMapS
> Sep 13 18:12:27 MASTER rc: Starting drbd: succeeded
> Sep 13 18:12:27 MASTER kernel: drbd0: sock_sendmsg returned -104
> Sep 13 18:12:27 MASTER kernel: drbd0: drbd0_receiver [1967]: cstate WFBitMapS --> BrokenPipe
> Sep 13 18:12:27 MASTER kernel: drbd0: short sent ReportBitMap size=4096 sent=3344
> Sep 13 18:12:27 MASTER kernel: drbd0: Secondary/Unknown --> Secondary/Primary
> Sep 13 18:12:27 MASTER kernel: drbd0: meta connection shut down by peer.
> Sep 13 18:12:27 MASTER kernel: drbd0: asender terminated
> Sep 13 18:12:27 MASTER kernel: drbd0: sock was shut down by peer
> Sep 13 18:12:27 MASTER kernel: drbd0: drbd0_receiver [1967]: cstate BrokenPipe --> BrokenPipe
> Sep 13 18:12:27 MASTER kernel: drbd0: short read expecting header on sock: r=0
> Sep 13 18:12:27 MASTER kernel: drbd0: worker terminated
> Sep 13 18:12:27 MASTER kernel: drbd0: drbd0_receiver [1967]: cstate BrokenPipe --> Unconnected
> Sep 13 18:12:27 MASTER kernel: drbd0: Connection lost.
> Sep 13 18:12:27 MASTER kernel: drbd0: drbd0_receiver [1967]: cstate Unconnected --> WFConnection
>
> Thanks for any help.
> Sorry if this is a repost.
>
> Luís
>
> !                          ,,,
> !                         (@ @)
> +--------------------oOO-/-(*)-\-OOo---+----------------------------------+
> | Luis Fernando V. Gomes               | Email: lf at ele.puc-rio.br
> | Network Administrator                |
> | Dept. Engenharia Eletrica            |
> | PUC-Rio                              | Voz: +(55) (21) 3114-1220
> | R. Marques de Sao Vicente 225/401L   | Fax: +(55) (21) 3114-1232
> | 22453-900 - Rio de Janeiro/RJ, Brasil|
> +--------------------------------------+----------------------------------+

--
Andreas Schultz
------------------ instant broadband access ------------------
Travelping GmbH | Leipziger Str. 46 | D-09113 Chemnitz | GERMANY
phone: +49-391-40045984 fax: +49-391-40045299
www.travelping.com (local UK: 0844 4849180)
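
The state reported in this thread (a node that is still Primary but has dropped to cs:StandAlone) can be spotted by parsing the device status line of /proc/drbd. Below is a minimal Python sketch, assuming the 0.7.x line format quoted in the logs above. The function names and the embedded sample text are illustrative only and are not part of DRBD or heartbeat.

```python
import re

# Matches a DRBD 0.7-style device status line as quoted in this thread, e.g.
#   0: cs:StandAlone st:Primary/Unknown ld:Consistent
# Assumption: other DRBD versions may format this line differently.
STATUS_RE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)\s+"
    r"st:(?P<local>[^/\s]+)/(?P<peer>\S+)\s+ld:(?P<ld>\S+)",
    re.MULTILINE,
)


def parse_proc_drbd(text):
    """Return one dict (minor, cs, local, peer, ld) per device status line."""
    return [m.groupdict() for m in STATUS_RE.finditer(text)]


def standalone_primaries(text):
    """Return minor numbers of devices that are Primary but StandAlone."""
    return [
        int(dev["minor"])
        for dev in parse_proc_drbd(text)
        if dev["cs"] == "StandAlone" and dev["local"] == "Primary"
    ]


# Sample taken from the SLAVE output quoted in this thread.
SAMPLE = """version: 0.7.4 (api:76/proto:74)
 0: cs:StandAlone st:Primary/Unknown ld:Consistent
    ns:0 nr:811008 dw:811012 dr:17 al:0 bm:55 lo:0 pe:0 ua:0 ap:0
"""

if __name__ == "__main__":
    # On a real node one would read /proc/drbd instead of SAMPLE.
    print(standalone_primaries(SAMPLE))
```

A check like this only reports the symptom. It does not decide which node's data should survive, so any resynchronisation after a split brain still has to be chosen by the administrator.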