On Wednesday 03 June 2009 12:40:23 Pedro Sousa wrote:
> Hi,
>
> can you help me with this? I can't figure it out why it goes "StandAlone".

If you read the log messages, you'll see the reason is quite obvious:

ERROR: write_child: write failure on bcast eth1.: No such device

You expect drbd to communicate over a device that does not (yet) exist.
It can't, so it outdates itself. Fix your init scripts!

http://www.drbd.org/users-guide/s-resolve-split-brain.html

> Regards,
> Pedro Sousa
>
> On Thu, May 28, 2009 at 6:49 PM, Pedro Sousa <pgsousa at gmail.com> wrote:
> > Can you check it please?
> >
> > May 27 19:38:35 ha2 heartbeat: [2426]: ERROR: glib: Unable to send bcast [-1] packet(len=217): No such device
> > May 27 19:38:35 ha2 heartbeat: [2426]: ERROR: write_child: write failure on bcast eth1.: No such device
> > May 27 19:38:37 ha2 heartbeat: [2426]: ERROR: glib: Unable to send bcast [-1] packet(len=217): No such device
> > May 27 19:38:37 ha2 heartbeat: [2426]: ERROR: write_child: write failure on bcast eth1.: No such device
> > May 27 19:38:38 ha2 kernel: drbd0: PingAck did not arrive in time.
> > May 27 19:38:38 ha2 kernel: drbd0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
> > May 27 19:38:38 ha2 kernel: drbd0: asender terminated
> > May 27 19:38:38 ha2 kernel: drbd0: Terminating asender thread
> > May 27 19:38:38 ha2 kernel: drbd0: short read expecting header on sock: r=-512
> > May 27 19:38:38 ha2 kernel: drbd0: Writing meta data super block now.
> > May 27 19:38:38 ha2 kernel: drbd0: tl_clear()
> > May 27 19:38:38 ha2 kernel: drbd0: Connection closed
> > May 27 19:38:38 ha2 kernel: drbd0: conn( NetworkFailure -> Unconnected )
> > May 27 19:38:38 ha2 kernel: drbd0: receiver terminated
> > May 27 19:38:38 ha2 kernel: drbd0: receiver (re)started
> > May 27 19:38:38 ha2 kernel: drbd0: conn( Unconnected -> WFConnection )
> > May 27 19:38:38 ha2 kernel: drbd0: Unable to bind source sock (-99)
> > May 27 19:38:38 ha2 last message repeated 2 times
> > May 27 19:38:38 ha2 kernel: drbd0: Unable to bind sock2 (-99)
> > May 27 19:38:38 ha2 kernel: drbd0: conn( WFConnection -> Disconnecting )
> > May 27 19:38:38 ha2 kernel: drbd0: Discarding network configuration.
> > May 27 19:38:38 ha2 kernel: drbd0: tl_clear()
> > May 27 19:38:38 ha2 kernel: drbd0: Connection closed
> > May 27 19:38:38 ha2 kernel: drbd0: conn( Disconnecting -> StandAlone )
> > May 27 19:38:38 ha2 kernel: drbd0: receiver terminated
> > May 27 19:38:38 ha2 kernel: drbd0: Terminating receiver thread
> > May 27 19:38:39 ha2 heartbeat: [2426]: ERROR: glib: Unable to send bcast [-1] packet(len=217): No such device
> > May 27 19:38:39 ha2 heartbeat: [2426]: ERROR: write_child: write failure on bcast eth1.: No such device
> > May 27 19:38:40 ha2 kernel: drbd0: disk( UpToDate -> Outdated )
> > May 27 19:38:40 ha2 kernel: drbd0: Writing meta data super block now.
> > May 27 19:38:40 ha2 /usr/lib/heartbeat/dopd: [2513]: info: sending return code: 4, ha2.teste.local -> ha1.teste.local
> > May 27 19:38:41 ha2 heartbeat: [2426]: ERROR: glib: Unable to send bcast [-1] packet(len=310): No such device
> > May 27 19:38:41 ha2 heartbeat: [2426]: ERROR: write_child: write failure on bcast eth1.: No such device
> > May 27 19:38:41 ha2 heartbeat: [2426]: ERROR: glib: Unable to send bcast [-1] packet(len=217): No such device
> > May 27 19:38:41 ha2 heartbeat: [2426]: ERROR: write_child: write failure on bcast eth1.: No such device
> > May 27 19:38:43 ha2 heartbeat: [2426]: ERROR: glib: Unable to send bcast [-1] packet(len=217): No such device
> > May 27 19:38:43 ha2 heartbeat: [2426]: ERROR: write_child: write failure on bcast eth1.: No such device
> > May 27 19:38:45 ha2 heartbeat: [2408]: info: Link ha1.teste.local:eth1 dead.
> > May 27 19:38:45 ha2 ipfail: [2514]: info: Link Status update: Link ha1.teste.local/eth1 now has status dead
> > May 27 19:38:45 ha2 heartbeat: [2426]: ERROR: glib: Unable to send bcast [-1] packet(len=217): No such device
> > May 27 19:38:45 ha2 heartbeat: [2426]: ERROR: write_child: write failure on bcast eth1.: No such device
> > May 27 19:38:46 ha2 ipfail: [2514]: info: Asking other side for ping node count.
> > May 27 19:38:46 ha2 ipfail: [2514]: info: Checking remote count of ping nodes.
> > May 27 19:38:46 ha2 heartbeat: [2426]: ERROR: glib: Unable to send bcast [-1] packet(len=223): No such device
> > May 27 19:38:46 ha2 heartbeat: [2426]: ERROR: write_child: write failure on bcast eth1.: No such device
> > May 27 19:38:46 ha2 heartbeat: [2426]: WARN: Temporarily Suppressing write error messages
> > May 27 19:38:46 ha2 heartbeat: [2426]: WARN: Is a cable unplugged on bcast eth1?
> > May 27 19:38:47 ha2 ipfail: [2514]: info: Ping node count is balanced.
> > May 27 19:38:48 ha2 ipfail: [2514]: info: No giveup timer to abort.
> > May 27 19:39:06 ha2 kernel: eth1: link up
> >
> > Regards,
> > Pedro Sousa
> >
> > On Thu, May 28, 2009 at 4:51 PM, Lars Ellenberg <lars.ellenberg at linbit.com> wrote:
> >>
> >> On Thu, May 28, 2009 at 01:46:43PM +0100, Pedro Sousa wrote:
> >> > Hi,
> >> >
> >> > I'm testing split-brain in a master/slave scenario with dopd and have some
> >> > doubts about the automatic recovery procedure. The steps I took were:
> >> >
> >> > 1º Unplug the crossover cable
> >> >
> >> > Master:
> >> >
> >> > Primary/Unknown ds:UpToDate/Outdated
> >> >
> >> > Slave:
> >> >
> >> > StandAlone ro:Secondary/Unknown ds:Consistent/DUnknown
> >> >
> >> > 2º Plug the cable back on:
> >> >
> >> > Both nodes remain with the same state: Update/Outdated and
> >> > Consistent/Unknown
> >> >
> >> > My question is: shouldn't the slave rejoin/resync to the master
> >> > automatically after I plug the cable?
> >> >
> >> > I have to manually run: "drbdadm adjust all" to recover it.
> >>
> >> once a node reaches "StandAlone",
> >> you have to tell it to try and reconnect, yes.
> >>
> >> so this is how it is supposed to be.
> >>
> >> why it goes to "StandAlone" should be in the logs.
> >>
> >> > My conf (centos 5.3; drbd 8.3.1; heartbeat 2.99)
> >> >
> >> > /etc/drbd.conf
> >>
> >> </snip>
> >>
> >> --
> >> : Lars Ellenberg
> >> : LINBIT | Your Way to High Availability
> >> : DRBD/HA support and consulting http://www.linbit.com
> >>
> >> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
> >> __
> >> please don't Cc me, but send to list -- I'm subscribed
> >> _______________________________________________
> >> drbd-user mailing list
> >> drbd-user at lists.linbit.com
> >> http://lists.linbit.com/mailman/listinfo/drbd-user
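
For anyone reading a log like the one quoted above, it can help to pull out only the DRBD state changes so the path to "StandAlone" is visible at a glance. The following is a minimal sketch, not part of the original thread: the function name drbd_transitions and the SAMPLE list are made up for illustration, and the sample lines are copied from the quoted log with the mail client's line wraps rejoined. It assumes the syslog layout shown in that log (timestamp, host, "kernel:", device name).

```python
import re

# Matches a DRBD state change such as "conn( Unconnected -> WFConnection )".
TRANSITION = re.compile(r"(\w+)\(\s*(\w+)\s*->\s*(\w+)\s*\)")
# Matches the syslog prefix "May 27 19:38:38 ha2 kernel: drbd0: <message>".
DRBD_LINE = re.compile(
    r"^(\w{3}\s+\d+\s+[\d:]+)\s+\S+\s+kernel:\s+(drbd\d+):\s+(.*)$"
)


def drbd_transitions(lines):
    """Yield (timestamp, device, field, old, new) for each DRBD state change."""
    for line in lines:
        match = DRBD_LINE.match(line.strip())
        if not match:
            continue  # not a kernel drbdN line (e.g. heartbeat, ipfail)
        stamp, device, message = match.groups()
        for field, old, new in TRANSITION.findall(message):
            yield stamp, device, field, old, new


# Lines taken from the quoted log (wrapped lines rejoined).
SAMPLE = """\
May 27 19:38:37 ha2 heartbeat: [2426]: ERROR: write_child: write failure on bcast eth1.: No such device
May 27 19:38:38 ha2 kernel: drbd0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
May 27 19:38:38 ha2 kernel: drbd0: conn( NetworkFailure -> Unconnected )
May 27 19:38:38 ha2 kernel: drbd0: conn( Unconnected -> WFConnection )
May 27 19:38:38 ha2 kernel: drbd0: Unable to bind source sock (-99)
May 27 19:38:38 ha2 kernel: drbd0: conn( WFConnection -> Disconnecting )
May 27 19:38:38 ha2 kernel: drbd0: conn( Disconnecting -> StandAlone )
May 27 19:38:40 ha2 kernel: drbd0: disk( UpToDate -> Outdated )
""".splitlines()

if __name__ == "__main__":
    for stamp, device, field, old, new in drbd_transitions(SAMPLE):
        print(f"{stamp} {device} {field}: {old} -> {new}")
```

Run against the full log, the conn changes appear in order, ending in StandAlone right after the "Unable to bind" messages, which matches the explanation above that the interface did not exist when drbd tried to use it.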