Hi,

I was able to get the RA log. It says:

lrmd[10303]: 2010/07/12_11:57:48 info: RA output: (prmDrPostgreSQLDB:monitor:stderr) unlink: Read-only file system
lrmd[10303]: 2010/07/12_11:57:48 info: RA output: (prmDrPostgreSQLDB:monitor:stderr) open(/var/lock/drbd-147-0): Read-only file system
lrmd[10303]: 2010/07/12_11:57:48 info: RA output: (prmDrPostgreSQLDB:monitor:stderr) Command '
lrmd[10303]: 2010/07/12_11:57:48 info: RA output: (prmDrPostgreSQLDB:monitor:stderr) /sbin/drbdsetup
lrmd[10303]: 2010/07/12_11:57:48 info: RA output: (prmDrPostgreSQLDB:monitor:stderr)
lrmd[10303]: 2010/07/12_11:57:48 info: RA output: (prmDrPostgreSQLDB:monitor:stderr) 0
lrmd[10303]: 2010/07/12_11:57:48 info: RA output: (prmDrPostgreSQLDB:monitor:stderr)
lrmd[10303]: 2010/07/12_11:57:48 info: RA output: (prmDrPostgreSQLDB:monitor:stderr) role
lrmd[10303]: 2010/07/12_11:57:48 info: RA output: (prmDrPostgreSQLDB:monitor:stderr) ' terminated with exit code 20
lrmd[10303]: 2010/07/12_11:57:48 info: RA output: (prmDrPostgreSQLDB:monitor:stderr) drbdadm role r0: exited with code 20
lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr) unlink: Read-only file system
lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr) open(/var/lock/drbd-147-0): Read-only file system
lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr) Command '
lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr) /sbin/drbdsetup
lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr)
lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr) 0
lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr)
lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr) secondary
lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr) ' terminated with exit code 20
lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr) drbdadm secondary r0: exited with code 20
lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr) /sbin/drbdadm secondary r0: exit code 20, mapping to 0

I hope this output is what you wanted to see.

Thanks,
Junko

2010/7/9 Lars Ellenberg <lars.ellenberg at linbit.com>:
> On Fri, Jul 09, 2010 at 07:03:32PM +0900, Junko IKEDA wrote:
>> Hi,
>>
>> > Can you please provide kernel and resource agent (heartbeat) logs
>> > for such an incident.
>>
>> # uname -a
>> Linux dl380g5d 2.6.18-194.el5 #1 SMP Tue Mar 16 21:52:39 EDT 2010
>> x86_64 x86_64 x86_64 GNU/Linux
>>
>> # cat /etc/redhat-release
>> Red Hat Enterprise Linux Server release 5.5 (Tikanga)
>>
>> There is no RA log on ACT node after this failure because its disks
>> has already removed.
>> SBY node's log says like this;
>>
>> info: rsc:prmDrPostgreSQLDB: start
>> info: RA output: (prmDrPostgreSQLDB:start:stderr) 0: State change
>> failed: (-1) Multiple primaries not allowed by config
>>
>> info: RA output: (prmDrPostgreSQLDB:start:stderr) Command '
>> info: RA output: (prmDrPostgreSQLDB:start:stderr) /sbin/drbdsetup
>> info: RA output: (prmDrPostgreSQLDB:start:stderr)
>> info: RA output: (prmDrPostgreSQLDB:start:stderr) 0
>> info: RA output: (prmDrPostgreSQLDB:start:stderr)
>> info: RA output: (prmDrPostgreSQLDB:start:stderr) primary
>> info: RA output: (prmDrPostgreSQLDB:start:stderr) ' terminated with exit code 11
>
> I don't doubt the followon problems.
>
> I'd like to see the _original_ error,
> where the drbdsetup secondary fails,
> so I can better see if your fix is the "right" fix.
>
>> I will try to change the destination of log and get the perfect RA log
>> on ACT node later.
>
> Great.
> Thanks,
>
> --
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com
>
> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
> __
> please don't Cc me, but send to list -- I'm subscribed

-------------- next part --------------
pgsql[11138][11191]: 2010/07/12_11:57:00 INFO: PostgreSQL start command sent.
pgsql[11138][11198]: 2010/07/12_11:57:01 ERROR: PostgreSQL template1 isn't running
pgsql[11138][11207]: 2010/07/12_11:57:02 DEBUG: PostgreSQL still hasn't started yet. Waiting...
pgsql[11138][11217]: 2010/07/12_11:57:03 DEBUG: PostgreSQL still hasn't started yet. Waiting...
crmd[10306]: 2010/07/12_11:57:03 info: process_lrm_event: LRM operation prmApPostgreSQLDB_start_0 (call=14, rc=0) complete
crmd[10306]: 2010/07/12_11:57:04 info: do_lrm_rsc_op: Performing op=prmApPostgreSQLDB_monitor_10000 key=22:4:0:380b305a-7313-4cf2-b63b-52b9476388c2)
crmd[10306]: 2010/07/12_11:57:04 info: process_lrm_event: LRM operation prmApPostgreSQLDB_monitor_10000 (call=15, rc=0) complete
crmd[10306]: 2010/07/12_11:57:05 info: process_lrm_event: LRM operation prmStonith:1_start_0 (call=8, rc=0) complete
lrmd[10303]: 2010/07/12_11:57:05 debug: stonithRA plugin: provider attribute is not needed and will be ignored.
crmd[10306]: 2010/07/12_11:57:06 info: do_lrm_rsc_op: Performing op=prmStonith:1_monitor_3600000 key=30:4:0:380b305a-7313-4cf2-b63b-52b9476388c2)
crmd[10306]: 2010/07/12_11:57:13 info: process_lrm_event: LRM operation prmStonith:1_monitor_3600000 (call=16, rc=0) complete
lrmd[10303]: 2010/07/12_11:57:48 info: RA output: (prmDrPostgreSQLDB:monitor:stderr) unlink: Read-only file system
lrmd[10303]: 2010/07/12_11:57:48 info: RA output: (prmDrPostgreSQLDB:monitor:stderr) open(/var/lock/drbd-147-0): Read-only file system
lrmd[10303]: 2010/07/12_11:57:48 info: RA output: (prmDrPostgreSQLDB:monitor:stderr) Command '
lrmd[10303]: 2010/07/12_11:57:48 info: RA output: (prmDrPostgreSQLDB:monitor:stderr) /sbin/drbdsetup
lrmd[10303]: 2010/07/12_11:57:48 info: RA output: (prmDrPostgreSQLDB:monitor:stderr)
lrmd[10303]: 2010/07/12_11:57:48 info: RA output: (prmDrPostgreSQLDB:monitor:stderr) 0
lrmd[10303]: 2010/07/12_11:57:48 info: RA output: (prmDrPostgreSQLDB:monitor:stderr)
lrmd[10303]: 2010/07/12_11:57:48 info: RA output: (prmDrPostgreSQLDB:monitor:stderr) role
lrmd[10303]: 2010/07/12_11:57:48 info: RA output: (prmDrPostgreSQLDB:monitor:stderr) ' terminated with exit code 20
lrmd[10303]: 2010/07/12_11:57:48 info: RA output: (prmDrPostgreSQLDB:monitor:stderr) drbdadm role r0: exited with code 20
crmd[10306]: 2010/07/12_11:57:48 info: process_lrm_event: LRM operation prmDrPostgreSQLDB_monitor_10000 (call=9, rc=7) complete
crmd[10306]: 2010/07/12_11:57:49 info: do_lrm_rsc_op: Performing op=prmApPostgreSQLDB_stop_0 key=21:5:0:380b305a-7313-4cf2-b63b-52b9476388c2)
lrmd[10303]: 2010/07/12_11:57:49 info: rsc:prmApPostgreSQLDB: stop
crmd[10306]: 2010/07/12_11:57:49 info: process_lrm_event: LRM operation prmApPostgreSQLDB_monitor_10000 (call=15, rc=-2) Cancelled
pgsql[12554][12615]: 2010/07/12_11:57:50 INFO: PostgreSQL is down
pgsql[12554][12677]: 2010/07/12_11:57:51 DEBUG: PostgreSQL still hasn't stopped yet. Waiting...
crmd[10306]: 2010/07/12_11:57:51 info: process_lrm_event: LRM operation prmApPostgreSQLDB_stop_0 (call=17, rc=0) complete
crmd[10306]: 2010/07/12_11:57:52 info: do_lrm_rsc_op: Performing op=prmIpPostgreSQLDB_stop_0 key=18:5:0:380b305a-7313-4cf2-b63b-52b9476388c2)
lrmd[10303]: 2010/07/12_11:57:52 info: rsc:prmIpPostgreSQLDB: stop
crmd[10306]: 2010/07/12_11:57:52 info: process_lrm_event: LRM operation prmIpPostgreSQLDB_monitor_10000 (call=13, rc=-2) Cancelled
lrmd[10303]: 2010/07/12_11:57:52 info: RA output: (prmIpPostgreSQLDB:stop:stdout) In IP Stop
lrmd[10303]: 2010/07/12_11:57:52 info: RA output: (prmIpPostgreSQLDB:stop:stderr) SIOCDELRT: No such process
IPaddr[12679][12694]: 2010/07/12_11:57:52 INFO: ifconfig bond0:0 down
lrmd[10303]: 2010/07/12_11:57:52 info: RA output: (prmIpPostgreSQLDB:stop:stderr) rm: cannot remove `/var/run/heartbeat/rsctmp/IPaddr/bond0:0': Read-only file system
crmd[10306]: 2010/07/12_11:57:52 info: process_lrm_event: LRM operation prmIpPostgreSQLDB_stop_0 (call=18, rc=0) complete
crmd[10306]: 2010/07/12_11:57:53 info: do_lrm_rsc_op: Performing op=prmFsPostgreSQLDB_stop_0 key=15:5:0:380b305a-7313-4cf2-b63b-52b9476388c2)
lrmd[10303]: 2010/07/12_11:57:53 info: rsc:prmFsPostgreSQLDB: stop
crmd[10306]: 2010/07/12_11:57:53 info: process_lrm_event: LRM operation prmFsPostgreSQLDB_monitor_10000 (call=11, rc=-2) Cancelled
Filesystem[12713][12743]: 2010/07/12_11:57:53 INFO: Running stop for /dev/drbd0 on /dbfp
Filesystem[12713][12753]: 2010/07/12_11:57:53 INFO: Trying to unmount /dbfp
Filesystem[12713][12756]: 2010/07/12_11:57:54 INFO: unmounted /dbfp successfully
crmd[10306]: 2010/07/12_11:57:54 info: process_lrm_event: LRM operation prmFsPostgreSQLDB_stop_0 (call=19, rc=0) complete
crmd[10306]: 2010/07/12_11:57:55 info: do_lrm_rsc_op: Performing op=prmDrPostgreSQLDB_stop_0 key=3:5:0:380b305a-7313-4cf2-b63b-52b9476388c2)
lrmd[10303]: 2010/07/12_11:57:55 info: rsc:prmDrPostgreSQLDB: stop
crmd[10306]: 2010/07/12_11:57:55 info: process_lrm_event: LRM operation prmDrPostgreSQLDB_monitor_10000 (call=9, rc=-2) Cancelled
lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr) unlink: Read-only file system
lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr) open(/var/lock/drbd-147-0): Read-only file system
lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr) Command '
lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr) /sbin/drbdsetup
lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr)
lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr) 0
lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr)
lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr) secondary
lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr) ' terminated with exit code 20
lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr) drbdadm secondary r0: exited with code 20
lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr) /sbin/drbdadm secondary r0: exit code 20, mapping to 0
crmd[10306]: 2010/07/12_11:57:55 info: process_lrm_event: LRM operation prmDrPostgreSQLDB_stop_0 (call=20, rc=0) complete
diskd[10310]: 2010/07/12_11:57:56 ERROR: diskcheck: Could not read from device /dev/cciss/c0d0
diskd[10310]: 2010/07/12_11:58:01 ERROR: diskcheck: Could not read from device /dev/cciss/c0d0
diskd[10310]: 2010/07/12_11:58:01 WARN: diskcheck: Error(s) occurred in diskcheck function.
diskd[10310]: 2010/07/12_11:58:01 WARN: check_old_status: disk status is changed, new_status = ERROR
attrd[10305]: 2010/07/12_11:58:01 info: attrd_trigger_update: Sending flush op to all hosts for: diskcheck_os
attrd[10305]: 2010/07/12_11:58:01 info: attrd_perform_update: Sent update 21: diskcheck_os=ERROR
diskd[10310]: 2010/07/12_11:58:06 ERROR: diskcheck: Could not read from device /dev/cciss/c0d0
diskd[10310]: 2010/07/12_11:58:11 ERROR: diskcheck: Could not read from device /dev/cciss/c0d0
diskd[10310]: 2010/07/12_11:58:11 WARN: diskcheck: Error(s) occurred in diskcheck function.
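
A reading aid for the log above: lrmd records each line of the RA's stderr as a separate "RA output" entry, so a single drbdsetup error message is spread over several entries. The following is a minimal Python sketch, not part of the original message, that regroups those entries per timestamp, resource, and operation. The regular expression is based only on the entry shape visible in this log, and the names RA_LINE and group_ra_output are hypothetical.

```python
import re

# Entry shape assumed from the log above:
#   lrmd[PID]: DATE_TIME info: RA output: (RESOURCE:OP:STREAM) TEXT
# RESOURCE may itself contain a colon (e.g. a clone instance such as "name:1").
RA_LINE = re.compile(
    r"^lrmd\[\d+\]: (?P<ts>\S+) info: RA output: "
    r"\((?P<rsc>[^()\s]+):(?P<op>[^:()\s]+):(?P<stream>stdout|stderr)\)"
    r"\s?(?P<text>.*)$"
)


def group_ra_output(lines):
    """Join the fragments of RA output that share timestamp, resource, op and stream."""
    groups = {}
    for line in lines:
        m = RA_LINE.match(line.strip())
        if m is None:
            continue  # not an lrmd "RA output" entry (crmd, pgsql, diskd, ...)
        key = (m.group("ts"), m.group("rsc"), m.group("op"), m.group("stream"))
        groups.setdefault(key, []).append(m.group("text").strip())
    # Drop the empty fragments and join the rest with single spaces.
    return {key: " ".join(p for p in parts if p) for key, parts in groups.items()}


# A few entries copied from the stop operation in the log above.
SAMPLE = """\
crmd[10306]: 2010/07/12_11:57:55 info: process_lrm_event: LRM operation prmDrPostgreSQLDB_monitor_10000 (call=9, rc=-2) Cancelled
lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr) Command '
lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr) /sbin/drbdsetup
lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr)
lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr) 0
lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr)
lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr) secondary
lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr) ' terminated with exit code 20
"""

if __name__ == "__main__":
    for key, text in group_ra_output(SAMPLE.splitlines()).items():
        print(key, "->", text)
```

Grouping by timestamp is only a heuristic: output from one RA invocation that crosses a second boundary would end up in two groups.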