Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hello All,

As I mentioned in my previous mail, my DRBD was not working with Heartbeat. After making some changes to the configuration files it now starts on the primary node automatically under Heartbeat. Thanks to you guys for that!

But when the primary node goes down, the secondary node refuses to become Primary, and as a result the DRBD device is not mounted automatically at the mount point on the secondary node. The messages logged on the secondary node when the primary goes down (I manually detach the network cables for testing) are:

*/var/log/messages:*

Jul 15 10:55:03 sec kernel: block drbd1: PingAck did not arrive in time.
Jul 15 10:55:03 sec kernel: block drbd1: peer( Primary -> Unknown ) conn( SyncTarget -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Jul 15 10:55:03 sec kernel: block drbd1: asender terminated
Jul 15 10:55:03 sec kernel: block drbd1: short read expecting header on sock: r=-512
Jul 15 10:55:03 sec kernel: block drbd1: Terminating asender thread
Jul 15 10:55:03 sec kernel: block drbd1: Connection closed
Jul 15 10:55:03 sec kernel: block drbd1: conn( NetworkFailure -> Unconnected )
Jul 15 10:55:03 sec kernel: block drbd1: receiver terminated
Jul 15 10:55:03 sec kernel: block drbd1: Restarting receiver thread
Jul 15 10:55:03 sec kernel: block drbd1: receiver (re)started
Jul 15 10:55:03 sec kernel: block drbd1: conn( Unconnected -> WFConnection )
Jul 15 10:55:09 sec kernel: block drbd1: State change failed: Refusing to be Primary without at least one UpToDate disk
Jul 15 10:55:09 sec kernel: block drbd1: state = { cs:WFConnection ro:Secondary/Unknown ds:Inconsistent/DUnknown r--- }
Jul 15 10:55:09 sec kernel: block drbd1: wanted = { cs:WFConnection ro:Primary/Unknown ds:Inconsistent/DUnknown r--- }
Jul 15 10:55:10 sec kernel: block drbd1: State change failed: Refusing to be Primary without at least one UpToDate disk
Jul 15 10:55:10 sec kernel: block drbd1: state = { cs:WFConnection ro:Secondary/Unknown ds:Inconsistent/DUnknown r--- }
Jul 15 10:55:10 sec kernel: block drbd1: wanted = { cs:WFConnection ro:Primary/Unknown ds:Inconsistent/DUnknown r--- }
Jul 15 10:55:11 sec kernel: block drbd1: State change failed: Refusing to be Primary without at least one UpToDate disk
Jul 15 10:55:11 sec kernel: block drbd1: state = { cs:WFConnection ro:Secondary/Unknown ds:Inconsistent/DUnknown r--- }
Jul 15 10:55:11 sec kernel: block drbd1: wanted = { cs:WFConnection ro:Primary/Unknown ds:Inconsistent/DUnknown r--- }
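(For reference, the state behind these messages can be checked directly on the secondary; a minimal check with the standard DRBD 8.x userland tools, using the resource name drbd1 from my drbd.conf further below:)

    cat /proc/drbd          # overview: connection state, roles, disk states, resync progress
    drbdadm cstate drbd1    # connection state (e.g. WFConnection, SyncTarget, Connected)
    drbdadm role drbd1      # roles (e.g. Secondary/Unknown)
    drbdadm dstate drbd1    # disk states; DRBD allows Primary only with at least one UpToDate disk

Here the secondary's disk is still ds:Inconsistent (it was still SyncTarget when the link went down), which is what the "Refusing to be Primary without at least one UpToDate disk" messages above refer to.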
*/var/log/ha-debug file is:*

Jul 15 10:04:30 sec.master heartbeat: [4559]: info: Link test.cluster:eth0 dead.
Jul 15 10:04:32 sec.master heartbeat: [4559]: info: Heartbeat restart on node test.cluster
Jul 15 10:04:32 sec.master heartbeat: [4559]: info: Link test.cluster:eth0 up.
Jul 15 10:04:32 sec.master heartbeat: [4559]: info: Status update for node test.cluster: status init
Jul 15 10:04:32 sec.master heartbeat: [4559]: info: Status update for node test.cluster: status up
Jul 15 10:04:32 sec.master heartbeat: [6281]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
Jul 15 10:04:32 sec.master heartbeat: [4559]: debug: StartNextRemoteRscReq(): child count 1
Jul 15 10:04:32 sec.master heartbeat: [4559]: debug: get_delnodelist: delnodelist=
logd is not running
2010/07/15_10:04:32 info: Running /etc/ha.d/rc.d/status status
harc[6281]: 2010/07/15_10:04:32 info: Running /etc/ha.d/rc.d/status status
Jul 15 10:04:32 sec.master heartbeat: [4559]: info: Status update for node test.cluster: status active
Jul 15 10:04:32 sec.master heartbeat: [4559]: debug: StartNextRemoteRscReq(): child count 1
Jul 15 10:04:32 sec.master heartbeat: [6301]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
logd is not running
2010/07/15_10:04:32 info: Running /etc/ha.d/rc.d/status status
harc[6301]: 2010/07/15_10:04:32 info: Running /etc/ha.d/rc.d/status status
Jul 15 10:04:32 sec.master heartbeat: [6321]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
logd is not running
2010/07/15_10:04:32 info: Running /etc/ha.d/rc.d/status status
harc[6321]: 2010/07/15_10:04:32 info: Running /etc/ha.d/rc.d/status status
Jul 15 10:04:33 sec.master heartbeat: [4559]: info: remote resource transition completed.
Jul 15 10:04:33 sec.master heartbeat: [4559]: info: sec.master wants to go standby [foreign]
Jul 15 10:04:34 sec.master heartbeat: [4559]: info: standby: test.cluster can take our foreign resources
Jul 15 10:04:34 sec.master heartbeat: [6341]: info: give up foreign HA resources (standby).
logd is not running
2010/07/15_10:04:34 info: Releasing resource group: test.cluster IPaddr::192.168.0.50
ResourceManager[6356]: 2010/07/15_10:04:34 info: Releasing resource group: test.cluster IPaddr::192.168.0.50
logd is not running
2010/07/15_10:04:34 info: Running /etc/ha.d/resource.d/IPaddr 192.168.0.50 stop
ResourceManager[6356]: 2010/07/15_10:04:34 info: Running /etc/ha.d/resource.d/IPaddr 192.168.0.50 stop
In IP Stop
SIOCDELRT: No such process
logd is not running
2010/07/15_10:04:34 INFO: ifconfig eth0:0 down
IPaddr[6414]: 2010/07/15_10:04:34 INFO: ifconfig eth0:0 down
logd is not running
2010/07/15_10:04:34 INFO: Success
IPaddr[6393]: 2010/07/15_10:04:34 INFO: Success
INFO: Success
logd is not running
2010/07/15_10:04:34 info: Releasing resource group: test.cluster drbddisk::drbd1 Filesystem::/dev/drbd1::/replicated::ext3
ResourceManager[6445]: 2010/07/15_10:04:34 info: Releasing resource group: test.cluster drbddisk::drbd1 Filesystem::/dev/drbd1::/replicated::ext3
logd is not running
2010/07/15_10:04:34 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd1 /replicated ext3 stop
ResourceManager[6445]: 2010/07/15_10:04:34 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd1 /replicated ext3 stop
logd is not running
2010/07/15_10:04:34 INFO: Running stop for /dev/drbd1 on /replicated
Filesystem[6502]: 2010/07/15_10:04:34 INFO: Running stop for /dev/drbd1 on /replicated
logd is not running
2010/07/15_10:04:34 INFO: Trying to unmount /replicated
Filesystem[6502]: 2010/07/15_10:04:34 INFO: Trying to unmount /replicated
logd is not running
2010/07/15_10:04:34 INFO: unmounted /replicated successfully
Filesystem[6502]: 2010/07/15_10:04:34 INFO: unmounted /replicated successfully
logd is not running
2010/07/15_10:04:34 INFO: Success
Filesystem[6484]: 2010/07/15_10:04:34 INFO: Success
INFO: Success
logd is not running
2010/07/15_10:04:34 info: Running /etc/ha.d/resource.d/drbddisk drbd1 stop
ResourceManager[6445]: 2010/07/15_10:04:34 info: Running /etc/ha.d/resource.d/drbddisk drbd1 stop
Jul 15 10:04:34 sec.master heartbeat: [6341]: info: foreign HA resource release completed (standby).
Jul 15 10:04:34 sec.master heartbeat: [4559]: info: Local standby process completed [foreign].
Jul 15 10:04:35 sec.master heartbeat: [4559]: WARN: 1 lost packet(s) for [test.cluster] [13:15]
Jul 15 10:04:35 sec.master heartbeat: [4559]: info: remote resource transition completed.
Jul 15 10:04:35 sec.master heartbeat: [4559]: info: No pkts missing from test.cluster!
Jul 15 10:04:35 sec.master heartbeat: [4559]: info: Other node completed standby takeover of foreign resources.
Jul 15 10:55:08 sec.master heartbeat: [4559]: WARN: node test.cluster: is dead
Jul 15 10:55:08 sec.master heartbeat: [4559]: WARN: No STONITH device configured.
Jul 15 10:55:08 sec.master heartbeat: [4559]: WARN: Shared disks are not protected.
Jul 15 10:55:08 sec.master heartbeat: [4559]: info: Resources being acquired from test.cluster.
Jul 15 10:55:08 sec.master heartbeat: [7293]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
Jul 15 10:55:08 sec.master heartbeat: [4559]: info: Link test.cluster:eth0 dead.
Jul 15 10:55:08 sec.master heartbeat: [7294]: info: No local resources [/usr/share/heartbeat/ResourceManager listkeys sec.master] to acquire.
Jul 15 10:55:08 sec.master heartbeat: [4559]: debug: StartNextRemoteRscReq(): child count 1
logd is not running
2010/07/15_10:55:08 info: Running /etc/ha.d/rc.d/status status
harc[7293]: 2010/07/15_10:55:08 info: Running /etc/ha.d/rc.d/status status
logd is not running
2010/07/15_10:55:08 info: Taking over resource group IPaddr::192.168.0.50
mach_down[7328]: 2010/07/15_10:55:08 info: Taking over resource group IPaddr::192.168.0.50
logd is not running
2010/07/15_10:55:08 info: Acquiring resource group: test.cluster IPaddr::192.168.0.50
ResourceManager[7358]: 2010/07/15_10:55:08 info: Acquiring resource group: test.cluster IPaddr::192.168.0.50
logd is not running
2010/07/15_10:55:08 INFO: Resource is stopped
IPaddr[7387]: 2010/07/15_10:55:08 INFO: Resource is stopped
logd is not running
2010/07/15_10:55:08 info: Running /etc/ha.d/resource.d/IPaddr 192.168.0.50 start
ResourceManager[7358]: 2010/07/15_10:55:08 info: Running /etc/ha.d/resource.d/IPaddr 192.168.0.50 start
logd is not running
2010/07/15_10:55:09 INFO: Using calculated nic for 192.168.0.50: eth0
IPaddr[7470]: 2010/07/15_10:55:09 INFO: Using calculated nic for 192.168.0.50: eth0
logd is not running
2010/07/15_10:55:09 INFO: Using calculated netmask for 192.168.0.50: 255.255.255.0
IPaddr[7470]: 2010/07/15_10:55:09 INFO: Using calculated netmask for 192.168.0.50: 255.255.255.0
logd is not running
2010/07/15_10:55:09 INFO: eval ifconfig eth0:0 192.168.0.50 netmask 255.255.255.0 broadcast 192.168.0.255
IPaddr[7470]: 2010/07/15_10:55:09 INFO: eval ifconfig eth0:0 192.168.0.50 netmask 255.255.255.0 broadcast 192.168.0.255
logd is not running
2010/07/15_10:55:09 INFO: Success
IPaddr[7449]: 2010/07/15_10:55:09 INFO: Success
INFO: Success
logd is not running
2010/07/15_10:55:09 info: Taking over resource group drbddisk::drbd1
mach_down[7328]: 2010/07/15_10:55:09 info: Taking over resource group drbddisk::drbd1
logd is not running
2010/07/15_10:55:09 info: Acquiring resource group: test.cluster drbddisk::drbd1 Filesystem::/dev/drbd1::/replicated::ext3
ResourceManager[7570]: 2010/07/15_10:55:09 info: Acquiring resource group: test.cluster drbddisk::drbd1 Filesystem::/dev/drbd1::/replicated::ext3
logd is not running
2010/07/15_10:55:09 info: Running /etc/ha.d/resource.d/drbddisk drbd1 start
ResourceManager[7570]: 2010/07/15_10:55:09 info: Running /etc/ha.d/resource.d/drbddisk drbd1 start
1: State change failed: (-2) Refusing to be Primary without at least one UpToDate disk
Command '/sbin/drbdsetup 1 primary' terminated with exit code 17
1: State change failed: (-2) Refusing to be Primary without at least one UpToDate disk
Command '/sbin/drbdsetup 1 primary' terminated with exit code 17
1: State change failed: (-2) Refusing to be Primary without at least one UpToDate disk
Command '/sbin/drbdsetup 1 primary' terminated with exit code 17
1: State change failed: (-2) Refusing to be Primary without at least one UpToDate disk
Command '/sbin/drbdsetup 1 primary' terminated with exit code 17
1: State change failed: (-2) Refusing to be Primary without at least one UpToDate disk
Command '/sbin/drbdsetup 1 primary' terminated with exit code 17
1: State change failed: (-2) Refusing to be Primary without at least one UpToDate disk
Command '/sbin/drbdsetup 1 primary' terminated with exit code 17
logd is not running
2010/07/15_10:55:14 ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
ResourceManager[7570]: 2010/07/15_10:55:14 ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
logd is not running
2010/07/15_10:55:14 CRIT: Giving up resources due to failure of drbddisk::drbd1
ResourceManager[7570]: 2010/07/15_10:55:14 CRIT: Giving up resources due to failure of drbddisk::drbd1
logd is not running
2010/07/15_10:55:14 info: Releasing resource group: test.cluster drbddisk::drbd1 Filesystem::/dev/drbd1::/replicated::ext3
ResourceManager[7570]: 2010/07/15_10:55:14 info: Releasing resource group: test.cluster drbddisk::drbd1 Filesystem::/dev/drbd1::/replicated::ext3
logd is not running
2010/07/15_10:55:14 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd1 /replicated ext3 stop
ResourceManager[7570]: 2010/07/15_10:55:14 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd1 /replicated ext3 stop
logd is not running
2010/07/15_10:55:14 INFO: Running stop for /dev/drbd1 on /replicated
Filesystem[7694]: 2010/07/15_10:55:14 INFO: Running stop for /dev/drbd1 on /replicated
/dev/drbd1: Wrong medium type
logd is not running
2010/07/15_10:55:14 INFO: Success
Filesystem[7679]: 2010/07/15_10:55:14 INFO: Success
INFO: Success
logd is not running
2010/07/15_10:55:14 info: Running /etc/ha.d/resource.d/drbddisk drbd1 stop
ResourceManager[7570]: 2010/07/15_10:55:14 info: Running /etc/ha.d/resource.d/drbddisk drbd1 stop
logd is not running
2010/07/15_10:55:14 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down[7328]: 2010/07/15_10:55:14 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
logd is not running
2010/07/15_10:55:14 info: mach_down takeover complete for node test.cluster.
mach_down[7328]: 2010/07/15_10:55:14 info: mach_down takeover complete for node test.cluster.
Jul 15 10:55:14 sec.master heartbeat: [4559]: info: mach_down takeover complete.
ARPING 192.168.0.50 from 192.168.0.50 eth0
Sent 10 probes (10 broadcast(s))
Received 0 response(s)
logd is not running
2010/07/15_10:55:19 ERROR: Could not send gratuitous arps. rc=1
IPaddr[7470]: 2010/07/15_10:55:19 ERROR: Could not send gratuitous arps. rc=1
logd is not running
2010/07/15_10:55:44 Going standby [foreign].
hb_standby[7820]: 2010/07/15_10:55:44 Going standby [foreign].
Jul 15 10:55:44 sec.master heartbeat: [4559]: info: sec.master wants to go standby [foreign]
Jul 15 10:55:55 sec.master heartbeat: [4559]: WARN: No reply to standby request. Standby request cancelled.

*my /etc/ha.d/haresources file is:*

test.cluster IPaddr::192.168.0.50
test.cluster drbddisk::drbd1 Filesystem::/dev/drbd1::/replicated::ext3

(where test.cluster is my primary node and 192.168.0.50 is the virtual IP managed by Heartbeat)

*my /etc/drbd.conf file is:*

global { usage-count yes; }
common { protocol C; }

resource drbd1 {
  on test.cluster {
    device    /dev/drbd1;
    disk      /dev/sda5;
    address   192.168.0.148:7789;
    meta-disk internal;
  }
  on sec.master {
    device    /dev/drbd1;
    disk      /dev/sda5;
    address   192.168.0.190:7789;
    meta-disk internal;
  }
}

I just want to run DRBD with Heartbeat, keep /dev/drbd1 replicated, and have some service running on it immediately after a fail-over. So please help me as soon as possible.

Thanks,
Deepak
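P.S. To make clearer what I am aiming at: as far as I understand it, in Heartbeat v1 (haresources) style the extra service would simply be appended to the resource group so that it starts after the filesystem on take-over. A rough sketch only; "httpd" is just a placeholder for whatever init script or resource agent the real service would use:

    test.cluster IPaddr::192.168.0.50 drbddisk::drbd1 Filesystem::/dev/drbd1::/replicated::ext3 httpd

Heartbeat acquires the resources on such a line from left to right and releases them from right to left.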