Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hello all,

As I mentioned in my previous mail, my DRBD setup was not working with Heartbeat. After some changes to the configuration files, DRBD now comes up automatically on the primary node under Heartbeat. Thanks to you all for that!

However, when the primary node goes down, the secondary node refuses to become Primary, and as a result the DRBD device is not mounted automatically at its mount point on the secondary. DRBD keeps reporting "State change failed: Refusing to be Primary without at least one UpToDate disk". Below are the logs from /var/log on the secondary node when the primary goes down (I detach the network cables manually for testing).

*/var/log/messages:*

Jul 15 10:55:03 sec kernel: block drbd1: PingAck did not arrive in time.
Jul 15 10:55:03 sec kernel: block drbd1: peer( Primary -> Unknown ) conn( SyncTarget -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Jul 15 10:55:03 sec kernel: block drbd1: asender terminated
Jul 15 10:55:03 sec kernel: block drbd1: short read expecting header on sock: r=-512
Jul 15 10:55:03 sec kernel: block drbd1: Terminating asender thread
Jul 15 10:55:03 sec kernel: block drbd1: Connection closed
Jul 15 10:55:03 sec kernel: block drbd1: conn( NetworkFailure -> Unconnected )
Jul 15 10:55:03 sec kernel: block drbd1: receiver terminated
Jul 15 10:55:03 sec kernel: block drbd1: Restarting receiver thread
Jul 15 10:55:03 sec kernel: block drbd1: receiver (re)started
Jul 15 10:55:03 sec kernel: block drbd1: conn( Unconnected -> WFConnection )
Jul 15 10:55:09 sec kernel: block drbd1: State change failed: Refusing to be Primary without at least one UpToDate disk
Jul 15 10:55:09 sec kernel: block drbd1: state = { cs:WFConnection ro:Secondary/Unknown ds:Inconsistent/DUnknown r--- }
Jul 15 10:55:09 sec kernel: block drbd1: wanted = { cs:WFConnection ro:Primary/Unknown ds:Inconsistent/DUnknown r--- }
Jul 15 10:55:10 sec kernel: block drbd1: State change failed: Refusing to be Primary without at least one UpToDate disk
Jul 15 10:55:10 sec kernel: block drbd1: state = { cs:WFConnection ro:Secondary/Unknown ds:Inconsistent/DUnknown r--- }
Jul 15 10:55:10 sec kernel: block drbd1: wanted = { cs:WFConnection ro:Primary/Unknown ds:Inconsistent/DUnknown r--- }
Jul 15 10:55:11 sec kernel: block drbd1: State change failed: Refusing to be Primary without at least one UpToDate disk
Jul 15 10:55:11 sec kernel: block drbd1: state = { cs:WFConnection ro:Secondary/Unknown ds:Inconsistent/DUnknown r--- }
Jul 15 10:55:11 sec kernel: block drbd1: wanted = { cs:WFConnection ro:Primary/Unknown ds:Inconsistent/DUnknown r--- }
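While the cable is unplugged, this is roughly how I check the DRBD state on the secondary (the resource name drbd1 is from my drbd.conf further below); it should report the same cs:WFConnection ro:Secondary/Unknown ds:Inconsistent/DUnknown state that the kernel log above prints:

    cat /proc/drbd          # overall status: connection state, roles, disk states, sync progress
    drbdadm cstate drbd1    # connection state, e.g. WFConnection
    drbdadm state drbd1     # roles, e.g. Secondary/Unknown
    drbdadm dstate drbd1    # disk states, e.g. Inconsistent/DUnknown

I also notice the connection was still SyncTarget when the link dropped (the "conn( SyncTarget -> NetworkFailure )" line above), so it looks like the resync had not finished at the time of the test.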
*/var/log/ha-debug:*

Jul 15 10:04:30 sec.master heartbeat: [4559]: info: Link test.cluster:eth0 dead.
Jul 15 10:04:32 sec.master heartbeat: [4559]: info: Heartbeat restart on node test.cluster
Jul 15 10:04:32 sec.master heartbeat: [4559]: info: Link test.cluster:eth0 up.
Jul 15 10:04:32 sec.master heartbeat: [4559]: info: Status update for node test.cluster: status init
Jul 15 10:04:32 sec.master heartbeat: [4559]: info: Status update for node test.cluster: status up
Jul 15 10:04:32 sec.master heartbeat: [6281]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
Jul 15 10:04:32 sec.master heartbeat: [4559]: debug: StartNextRemoteRscReq(): child count 1
Jul 15 10:04:32 sec.master heartbeat: [4559]: debug: get_delnodelist: delnodelist=
logd is not running
2010/07/15_10:04:32 info: Running /etc/ha.d/rc.d/status status
harc[6281]: 2010/07/15_10:04:32 info: Running /etc/ha.d/rc.d/status status
Jul 15 10:04:32 sec.master heartbeat: [4559]: info: Status update for node test.cluster: status active
Jul 15 10:04:32 sec.master heartbeat: [4559]: debug: StartNextRemoteRscReq(): child count 1
Jul 15 10:04:32 sec.master heartbeat: [6301]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
logd is not running
2010/07/15_10:04:32 info: Running /etc/ha.d/rc.d/status status
harc[6301]: 2010/07/15_10:04:32 info: Running /etc/ha.d/rc.d/status status
Jul 15 10:04:32 sec.master heartbeat: [6321]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
logd is not running
2010/07/15_10:04:32 info: Running /etc/ha.d/rc.d/status status
harc[6321]: 2010/07/15_10:04:32 info: Running /etc/ha.d/rc.d/status status
Jul 15 10:04:33 sec.master heartbeat: [4559]: info: remote resource transition completed.
Jul 15 10:04:33 sec.master heartbeat: [4559]: info: sec.master wants to go standby [foreign]
Jul 15 10:04:34 sec.master heartbeat: [4559]: info: standby: test.cluster can take our foreign resources
Jul 15 10:04:34 sec.master heartbeat: [6341]: info: give up foreign HA resources (standby).
logd is not running
2010/07/15_10:04:34 info: Releasing resource group: test.cluster IPaddr::192.168.0.50
ResourceManager[6356]: 2010/07/15_10:04:34 info: Releasing resource group: test.cluster IPaddr::192.168.0.50
logd is not running
2010/07/15_10:04:34 info: Running /etc/ha.d/resource.d/IPaddr 192.168.0.50 stop
ResourceManager[6356]: 2010/07/15_10:04:34 info: Running /etc/ha.d/resource.d/IPaddr 192.168.0.50 stop
In IP Stop
SIOCDELRT: No such process
logd is not running
2010/07/15_10:04:34 INFO: ifconfig eth0:0 down
IPaddr[6414]: 2010/07/15_10:04:34 INFO: ifconfig eth0:0 down
logd is not running
2010/07/15_10:04:34 INFO: Success
IPaddr[6393]: 2010/07/15_10:04:34 INFO: Success
INFO: Success
logd is not running
2010/07/15_10:04:34 info: Releasing resource group: test.cluster drbddisk::drbd1 Filesystem::/dev/drbd1::/replicated::ext3
ResourceManager[6445]: 2010/07/15_10:04:34 info: Releasing resource group: test.cluster drbddisk::drbd1 Filesystem::/dev/drbd1::/replicated::ext3
logd is not running
2010/07/15_10:04:34 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd1 /replicated ext3 stop
ResourceManager[6445]: 2010/07/15_10:04:34 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd1 /replicated ext3 stop
logd is not running
2010/07/15_10:04:34 INFO: Running stop for /dev/drbd1 on /replicated
Filesystem[6502]: 2010/07/15_10:04:34 INFO: Running stop for /dev/drbd1 on /replicated
logd is not running
2010/07/15_10:04:34 INFO: Trying to unmount /replicated
Filesystem[6502]: 2010/07/15_10:04:34 INFO: Trying to unmount /replicated
logd is not running
2010/07/15_10:04:34 INFO: unmounted /replicated successfully
Filesystem[6502]: 2010/07/15_10:04:34 INFO: unmounted /replicated successfully
logd is not running
2010/07/15_10:04:34 INFO: Success
Filesystem[6484]: 2010/07/15_10:04:34 INFO: Success
INFO: Success
logd is not running
2010/07/15_10:04:34 info: Running /etc/ha.d/resource.d/drbddisk drbd1 stop
ResourceManager[6445]: 2010/07/15_10:04:34 info: Running /etc/ha.d/resource.d/drbddisk drbd1 stop
Jul 15 10:04:34 sec.master heartbeat: [6341]: info: foreign HA resource release completed (standby).
Jul 15 10:04:34 sec.master heartbeat: [4559]: info: Local standby process completed [foreign].
Jul 15 10:04:35 sec.master heartbeat: [4559]: WARN: 1 lost packet(s) for [test.cluster] [13:15]
Jul 15 10:04:35 sec.master heartbeat: [4559]: info: remote resource transition completed.
Jul 15 10:04:35 sec.master heartbeat: [4559]: info: No pkts missing from test.cluster!
Jul 15 10:04:35 sec.master heartbeat: [4559]: info: Other node completed standby takeover of foreign resources.
Jul 15 10:55:08 sec.master heartbeat: [4559]: WARN: node test.cluster: is dead
Jul 15 10:55:08 sec.master heartbeat: [4559]: WARN: No STONITH device configured.
Jul 15 10:55:08 sec.master heartbeat: [4559]: WARN: Shared disks are not protected.
Jul 15 10:55:08 sec.master heartbeat: [4559]: info: Resources being acquired from test.cluster.
Jul 15 10:55:08 sec.master heartbeat: [7293]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
Jul 15 10:55:08 sec.master heartbeat: [4559]: info: Link test.cluster:eth0 dead.
Jul 15 10:55:08 sec.master heartbeat: [7294]: info: No local resources [/usr/share/heartbeat/ResourceManager listkeys sec.master] to acquire.
Jul 15 10:55:08 sec.master heartbeat: [4559]: debug: StartNextRemoteRscReq(): child count 1
logd is not running
2010/07/15_10:55:08 info: Running /etc/ha.d/rc.d/status status
harc[7293]: 2010/07/15_10:55:08 info: Running /etc/ha.d/rc.d/status status
logd is not running
2010/07/15_10:55:08 info: Taking over resource group IPaddr::192.168.0.50
mach_down[7328]: 2010/07/15_10:55:08 info: Taking over resource group IPaddr::192.168.0.50
logd is not running
2010/07/15_10:55:08 info: Acquiring resource group: test.cluster IPaddr::192.168.0.50
ResourceManager[7358]: 2010/07/15_10:55:08 info: Acquiring resource group: test.cluster IPaddr::192.168.0.50
logd is not running
2010/07/15_10:55:08 INFO: Resource is stopped
IPaddr[7387]: 2010/07/15_10:55:08 INFO: Resource is stopped
logd is not running
2010/07/15_10:55:08 info: Running /etc/ha.d/resource.d/IPaddr 192.168.0.50 start
ResourceManager[7358]: 2010/07/15_10:55:08 info: Running /etc/ha.d/resource.d/IPaddr 192.168.0.50 start
logd is not running
2010/07/15_10:55:09 INFO: Using calculated nic for 192.168.0.50: eth0
IPaddr[7470]: 2010/07/15_10:55:09 INFO: Using calculated nic for 192.168.0.50: eth0
logd is not running
2010/07/15_10:55:09 INFO: Using calculated netmask for 192.168.0.50: 255.255.255.0
IPaddr[7470]: 2010/07/15_10:55:09 INFO: Using calculated netmask for 192.168.0.50: 255.255.255.0
logd is not running
2010/07/15_10:55:09 INFO: eval ifconfig eth0:0 192.168.0.50 netmask 255.255.255.0 broadcast 192.168.0.255
IPaddr[7470]: 2010/07/15_10:55:09 INFO: eval ifconfig eth0:0 192.168.0.50 netmask 255.255.255.0 broadcast 192.168.0.255
logd is not running
2010/07/15_10:55:09 INFO: Success
IPaddr[7449]: 2010/07/15_10:55:09 INFO: Success
INFO: Success
logd is not running
2010/07/15_10:55:09 info: Taking over resource group drbddisk::drbd1
mach_down[7328]: 2010/07/15_10:55:09 info: Taking over resource group drbddisk::drbd1
logd is not running
2010/07/15_10:55:09 info: Acquiring resource group: test.cluster drbddisk::drbd1 Filesystem::/dev/drbd1::/replicated::ext3
ResourceManager[7570]: 2010/07/15_10:55:09 info: Acquiring resource group: test.cluster drbddisk::drbd1 Filesystem::/dev/drbd1::/replicated::ext3
logd is not running
2010/07/15_10:55:09 info: Running /etc/ha.d/resource.d/drbddisk drbd1 start
ResourceManager[7570]: 2010/07/15_10:55:09 info: Running /etc/ha.d/resource.d/drbddisk drbd1 start
1: State change failed: (-2) Refusing to be Primary without at least one UpToDate disk
Command '/sbin/drbdsetup 1 primary' terminated with exit code 17
1: State change failed: (-2) Refusing to be Primary without at least one UpToDate disk
Command '/sbin/drbdsetup 1 primary' terminated with exit code 17
1: State change failed: (-2) Refusing to be Primary without at least one UpToDate disk
Command '/sbin/drbdsetup 1 primary' terminated with exit code 17
1: State change failed: (-2) Refusing to be Primary without at least one UpToDate disk
Command '/sbin/drbdsetup 1 primary' terminated with exit code 17
1: State change failed: (-2) Refusing to be Primary without at least one UpToDate disk
Command '/sbin/drbdsetup 1 primary' terminated with exit code 17
1: State change failed: (-2) Refusing to be Primary without at least one UpToDate disk
Command '/sbin/drbdsetup 1 primary' terminated with exit code 17
logd is not running
2010/07/15_10:55:14 ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
ResourceManager[7570]: 2010/07/15_10:55:14 ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
logd is not running
2010/07/15_10:55:14 CRIT: Giving up resources due to failure of drbddisk::drbd1
ResourceManager[7570]: 2010/07/15_10:55:14 CRIT: Giving up resources due to failure of drbddisk::drbd1
logd is not running
2010/07/15_10:55:14 info: Releasing resource group: test.cluster drbddisk::drbd1 Filesystem::/dev/drbd1::/replicated::ext3
ResourceManager[7570]: 2010/07/15_10:55:14 info: Releasing resource group: test.cluster drbddisk::drbd1 Filesystem::/dev/drbd1::/replicated::ext3
logd is not running
2010/07/15_10:55:14 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd1 /replicated ext3 stop
ResourceManager[7570]: 2010/07/15_10:55:14 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd1 /replicated ext3 stop
logd is not running
2010/07/15_10:55:14 INFO: Running stop for /dev/drbd1 on /replicated
Filesystem[7694]: 2010/07/15_10:55:14 INFO: Running stop for /dev/drbd1 on /replicated
/dev/drbd1: Wrong medium type
logd is not running
2010/07/15_10:55:14 INFO: Success
Filesystem[7679]: 2010/07/15_10:55:14 INFO: Success
INFO: Success
logd is not running
2010/07/15_10:55:14 info: Running /etc/ha.d/resource.d/drbddisk drbd1 stop
ResourceManager[7570]: 2010/07/15_10:55:14 info: Running /etc/ha.d/resource.d/drbddisk drbd1 stop
logd is not running
2010/07/15_10:55:14 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down[7328]: 2010/07/15_10:55:14 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
logd is not running
2010/07/15_10:55:14 info: mach_down takeover complete for node test.cluster.
mach_down[7328]: 2010/07/15_10:55:14 info: mach_down takeover complete for node test.cluster.
Jul 15 10:55:14 sec.master heartbeat: [4559]: info: mach_down takeover complete.
ARPING 192.168.0.50 from 192.168.0.50 eth0
Sent 10 probes (10 broadcast(s))
Received 0 response(s)
logd is not running
2010/07/15_10:55:19 ERROR: Could not send gratuitous arps. rc=1
IPaddr[7470]: 2010/07/15_10:55:19 ERROR: Could not send gratuitous arps. rc=1
logd is not running
2010/07/15_10:55:44 Going standby [foreign].
hb_standby[7820]: 2010/07/15_10:55:44 Going standby [foreign].
Jul 15 10:55:44 sec.master heartbeat: [4559]: info: sec.master wants to go standby [foreign]
Jul 15 10:55:55 sec.master heartbeat: [4559]: WARN: No reply to standby request. Standby request cancelled.
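To rule out Heartbeat itself, I can run the same resource script by hand on the secondary while the primary is unplugged (this is exactly the command Heartbeat runs in the log above), and I expect it to fail with the same error:

    /etc/ha.d/resource.d/drbddisk drbd1 start
    # expected to fail just like in the log:
    #   1: State change failed: (-2) Refusing to be Primary without at least one UpToDate disk
    #   Command '/sbin/drbdsetup 1 primary' terminated with exit code 17

So the refusal seems to come from DRBD (drbdsetup) rather than from Heartbeat or the resource scripts.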
*my /etc/ha.d/haresources file is:*

test.cluster IPaddr::192.168.0.50
test.cluster drbddisk::drbd1 Filesystem::/dev/drbd1::/replicated::ext3

Here test.cluster is my primary node and 192.168.0.50 is the virtual IP managed by heartbeat (a rough manual equivalent of these resources is sketched in the P.S. at the end of this mail).

*my /etc/drbd.conf file is:*

global {
    usage-count yes;
}
common {
    protocol C;
}
resource drbd1 {
    on test.cluster {
        device    /dev/drbd1;
        disk      /dev/sda5;
        address   192.168.0.148:7789;
        meta-disk internal;
    }
    on sec.master {
        device    /dev/drbd1;
        disk      /dev/sda5;
        address   192.168.0.190:7789;
        meta-disk internal;
    }
}

All I want is to run DRBD under Heartbeat, keep /dev/drbd1 replicated, and have some services start on it immediately in the case of a fail-over. So please help me as soon as possible.

Thanks,
Deepak
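P.S. For clarity, my understanding of what the two haresources lines should amount to on the secondary after a fail-over is roughly this manual sequence (the device, mount point and IP are taken from the configs above; this is only a sketch of the intent, not what Heartbeat literally executes):

    # bring up the service IP (IPaddr::192.168.0.50)
    ifconfig eth0:0 192.168.0.50 netmask 255.255.255.0 broadcast 192.168.0.255

    # promote the DRBD resource (drbddisk::drbd1, which calls drbdadm/drbdsetup primary)
    drbdadm primary drbd1

    # mount the replicated filesystem (Filesystem::/dev/drbd1::/replicated::ext3)
    mount -t ext3 /dev/drbd1 /replicated

When I pull the cable, it is the promotion step that fails, as shown in the logs above.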