Newbie question here: I created a MySQL+DRBD test setup with two nodes (VMs), db1 and db2.

root@db1:~# drbdadm role r0
Primary/Secondary
root@db1:~#

root@db2:~# drbdadm role r0
Secondary/Primary
root@db2:~#

One of the things I would like to see is how it behaves during a failure and recovery of the primary node. So, let's try a minor issue: I pull the ethernet cable on db1:

root@db2:~# tail /var/log/kern.log
Mar 7 18:24:19 db2 kernel: [33722.071609] block drbd0: PingAck did not arrive in time.
Mar 7 18:24:19 db2 kernel: [33722.081223] block drbd0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Mar 7 18:24:19 db2 kernel: [33722.081249] block drbd0: asender terminated
Mar 7 18:24:19 db2 kernel: [33722.081256] block drbd0: Terminating asender thread
Mar 7 18:24:19 db2 kernel: [33722.081492] block drbd0: short read expecting header on sock: r=-512
Mar 7 18:24:19 db2 kernel: [33722.096824] block drbd0: Connection closed
Mar 7 18:24:19 db2 kernel: [33722.096836] block drbd0: conn( NetworkFailure -> Unconnected )
Mar 7 18:24:19 db2 kernel: [33722.096851] block drbd0: receiver terminated
Mar 7 18:24:19 db2 kernel: [33722.096857] block drbd0: Restarting receiver thread
Mar 7 18:24:19 db2 kernel: [33722.096862] block drbd0: receiver (re)started
Mar 7 18:24:19 db2 kernel: [33722.096871] block drbd0: conn( Unconnected -> WFConnection )

As mentioned in http://www.drbd.org/users-guide/s-node-failure.html, db2 did not become primary by itself.

root@db2:~# drbdadm role r0
Secondary/Unknown
root@db2:~# !cat
cat /proc/drbd
version: 8.3.7 (api:88/proto:86-91)
GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by root@db2, 2011-03-07 10:22:02
 0: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r----
    ns:32 nr:79448 dw:364538 dr:270864 al:8 bm:49 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
root@db2:~#

I then reconnect the ethernet cable on db1, which still thinks it is the primary node:
root@db1:~# tail /var/log/kern.log
Mar 7 18:24:19 db1 kernel: [ 744.695406] block drbd0: receiver (re)started
Mar 7 18:24:19 db1 kernel: [ 744.695414] block drbd0: conn( Unconnected -> WFConnection )
Mar 7 18:24:19 db1 kernel: [ 744.696261] block drbd0: bind before connect failed, err = -99
Mar 7 18:24:19 db1 kernel: [ 744.696271] block drbd0: conn( WFConnection -> Disconnecting )
Mar 7 18:24:19 db1 kernel: [ 744.696345] block drbd0: Discarding network configuration.
Mar 7 18:24:19 db1 kernel: [ 744.696702] block drbd0: Connection closed
Mar 7 18:24:19 db1 kernel: [ 744.696719] block drbd0: conn( Disconnecting -> StandAlone )
Mar 7 18:24:19 db1 kernel: [ 744.697261] block drbd0: receiver terminated
Mar 7 18:24:19 db1 kernel: [ 744.697280] block drbd0: Terminating receiver thread
Mar 7 18:31:41 db1 kernel: [ 1186.440928] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
root@db1:~#

Shouldn't the two nodes re-establish connectivity?
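
For anyone reproducing this, here is a rough sketch of how the connection state shown above could be read from a script. It only parses the "cs:" field out of /proc/drbd-style text like the db2 output; the function name drbd_cstate is made up for illustration, and the reconnect step in the comment is an assumption about what might apply to a node in StandAlone, not a confirmed fix for this setup.

```shell
#!/bin/sh
# drbd_cstate (hypothetical helper name): print the connection state (cs:)
# for one DRBD minor, reading /proc/drbd-style text on stdin.
#   $1 = minor number, e.g. 0 for drbd0
drbd_cstate() {
    # Match the status line " <minor>: cs:<State> ..." and keep only <State>.
    sed -n "s/^ *$1: cs:\([A-Za-z]*\).*/\1/p"
}

# Possible usage on a live node (not executed here):
#   state=$(drbd_cstate 0 < /proc/drbd)
#   if [ "$state" = "StandAlone" ]; then
#       # Assumption: the resource is named r0, as in the sessions above.
#       drbdadm connect r0
#   fi
```

On db1 the log shows conn( Disconnecting -> StandAlone ) right after "bind before connect failed, err = -99" (errno 99 is EADDRNOTAVAIL on Linux), so a check like this would presumably report StandAlone there, while db2 reports WFConnection.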