On Tue, Jan 08, 2008 at 02:50:42PM +0530, Ashish Shukla wrote:
> Hi,
>
> I'm having a 2-node cluster, running DRBD 8.0.6 on CentOS 5 (amd64)
> with Heartbeat 2.1.2. A gigabit interface is dedicated for DRBD on
> both servers. Both servers' interfaces (used by DRBD) are connected
> via a cross-cable.
>
> Today morning, when I started the servers, the primary server "A" is
> not started. So, the secondary server "B" took over the resources,
> became the DRBD primary node, and started serving clients. After few
> hours, after fixing problem in server "A", when I started server "A"
> again, it is not able to connect to DRBD running on server "B". I
> restarted server "A" again, but same problem. I checked server "B" and
> all my DRBD filesystems were mounted fine, and all of my DRBD
> resources are in "Primary/Unknown" state. I tried to telnet to DRBD
> ports of my server "B", but I get "Connection refused". So, to fix
> this, I unmounted my DRBD filesystems, restarted DRBD and now DRBD
> started listening on ports. And now due to this, split brain occurred,
> and I need to resync server "A" with server "B".
>
> I've pasted the output of "cat /var/log/messages |fgrep kernel" on
> server "B" at "http://pastebin.ca/raw/846436". The DRBD is using
> interface "eth1".
>
> Can anyone figure out from the above stuff, what could've caused DRBD
> to stop listening on its TCP ports, hmm...? Is it due to change in
> status of "eth1" interface, hmm...?

first handshake in that log:

Jan 8 10:01:18 srv1 kernel: drbd0: Handshake successful: DRBD Network Protocol version 86
Jan 8 10:01:18 srv1 kernel: drbd0: Split-Brain detected, dropping connection!

there. "dropping connection".
whatever you did to get into this split brain,
according to that log that is the cause for drbd to stop listening.

Jan 8 10:01:18 srv1 kernel: drbd0: self 12B43BF90A12816E:E9380EB721B56A7A:F5173C8D8570CF38:FF9FA0E2DF43EBCB
Jan 8 10:01:18 srv1 kernel: drbd0: peer 2BE4D108084C8AFC:E9380EB721B56A7B:F5173C8D8570CF38:FF9FA0E2DF43EBCB
Jan 8 10:01:18 srv1 kernel: drbd0: conn( WFReportParams -> Disconnecting )
Jan 8 10:01:18 srv1 kernel: drbd0: error receiving ReportState, l: 4!
Jan 8 10:01:18 srv1 kernel: drbd0: asender terminated
Jan 8 10:01:18 srv1 kernel: drbd0: tl_clear()
Jan 8 10:01:18 srv1 kernel: drbd0: Connection closed
Jan 8 10:01:18 srv1 kernel: drbd0: conn( Disconnecting -> StandAlone )
Jan 8 10:01:18 srv1 kernel: drbd0: receiver terminated

-- 
: Lars Ellenberg                            http://www.linbit.com :
: DRBD/HA support and consulting             sales at linbit.com :
: LINBIT Information Technologies GmbH      Tel +43-1-8178292-0  :
: Vivenotgasse 48, A-1120 Vienna/Europe     Fax +43-1-8178292-82 :
__
please use the "List-Reply" function of your email client.
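
For anyone wanting to check a longer log for the same pattern, here is a rough Python sketch that scans kernel log lines for the "Split-Brain detected" message and follows the `conn( X -> Y )` transitions to the last reported connection state. The function name `summarize` and the `SAMPLE` list are made up for illustration (the sample lines are copied from the excerpt in this message); the script only does text matching and is not a DRBD tool.

```python
import re

# Matches connection-state transitions as they appear in the log excerpt,
# e.g. "conn( WFReportParams -> Disconnecting )".
CONN_RE = re.compile(r"conn\(\s*(\w+)\s*->\s*(\w+)\s*\)")


def summarize(lines):
    """Return (split_brain_seen, final_conn_state) for an iterable of log lines.

    Hypothetical helper: final_conn_state is the target of the last
    conn( A -> B ) transition found, or None if there was none.
    """
    split_brain = False
    final_state = None
    for line in lines:
        if "Split-Brain detected" in line:
            split_brain = True
        match = CONN_RE.search(line)
        if match:
            final_state = match.group(2)
    return split_brain, final_state


# Sample lines taken from the log excerpt quoted in this message.
SAMPLE = [
    "Jan 8 10:01:18 srv1 kernel: drbd0: Handshake successful: DRBD Network Protocol version 86",
    "Jan 8 10:01:18 srv1 kernel: drbd0: Split-Brain detected, dropping connection!",
    "Jan 8 10:01:18 srv1 kernel: drbd0: conn( WFReportParams -> Disconnecting )",
    "Jan 8 10:01:18 srv1 kernel: drbd0: Connection closed",
    "Jan 8 10:01:18 srv1 kernel: drbd0: conn( Disconnecting -> StandAlone )",
]

if __name__ == "__main__":
    print(summarize(SAMPLE))  # prints (True, 'StandAlone')
```

A result of `(True, 'StandAlone')` matches the sequence described above: split brain is detected right after the handshake, the connection is dropped, and the resource ends up in StandAlone.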