[DRBD-user] small patch for drbddisk

Junko IKEDA tsukishima.ha at gmail.com
Mon Jul 12 10:12:24 CEST 2010


Hi,

I was able to get the RA log.
It says:

lrmd[10303]: 2010/07/12_11:57:48 info: RA output: (prmDrPostgreSQLDB:monitor:stderr) unlink: Read-only file system

lrmd[10303]: 2010/07/12_11:57:48 info: RA output: (prmDrPostgreSQLDB:monitor:stderr) open(/var/lock/drbd-147-0): Read-only file system

lrmd[10303]: 2010/07/12_11:57:48 info: RA output: (prmDrPostgreSQLDB:monitor:stderr) Command '
lrmd[10303]: 2010/07/12_11:57:48 info: RA output: (prmDrPostgreSQLDB:monitor:stderr) /sbin/drbdsetup
lrmd[10303]: 2010/07/12_11:57:48 info: RA output: (prmDrPostgreSQLDB:monitor:stderr)
lrmd[10303]: 2010/07/12_11:57:48 info: RA output: (prmDrPostgreSQLDB:monitor:stderr) 0
lrmd[10303]: 2010/07/12_11:57:48 info: RA output: (prmDrPostgreSQLDB:monitor:stderr)
lrmd[10303]: 2010/07/12_11:57:48 info: RA output: (prmDrPostgreSQLDB:monitor:stderr) role
lrmd[10303]: 2010/07/12_11:57:48 info: RA output: (prmDrPostgreSQLDB:monitor:stderr) ' terminated with exit code 20

lrmd[10303]: 2010/07/12_11:57:48 info: RA output: (prmDrPostgreSQLDB:monitor:stderr) drbdadm role r0: exited with code 20


lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr) unlink: Read-only file system

lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr) open(/var/lock/drbd-147-0): Read-only file system

lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr) Command '
lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr) /sbin/drbdsetup
lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr)
lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr) 0
lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr)
lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr) secondary
lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr) ' terminated with exit code 20

lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr) drbdadm secondary r0: exited with code 20

lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr) /sbin/drbdadm secondary r0: exit code 20, mapping to 0


I hope this output is what you wanted to see.

Thanks,
Junko

2010/7/9 Lars Ellenberg <lars.ellenberg at linbit.com>:
> On Fri, Jul 09, 2010 at 07:03:32PM +0900, Junko IKEDA wrote:
>> Hi,
>>
>> > Can you please provide kernel and resource agent (heartbeat) logs
>> > for such an incident.
>>
>> # uname -a
>> Linux dl380g5d 2.6.18-194.el5 #1 SMP Tue Mar 16 21:52:39 EDT 2010
>> x86_64 x86_64 x86_64 GNU/Linux
>>
>> # cat /etc/redhat-release
>> Red Hat Enterprise Linux Server release 5.5 (Tikanga)
>>
>> There is no RA log on the ACT node after this failure because its disks
>> had already been removed.
>> The SBY node's log says:
>>
>> info: rsc:prmDrPostgreSQLDB: start
>> info: RA output: (prmDrPostgreSQLDB:start:stderr) 0: State change failed: (-1) Multiple primaries not allowed by config
>>
>> info: RA output: (prmDrPostgreSQLDB:start:stderr) Command '
>> info: RA output: (prmDrPostgreSQLDB:start:stderr) /sbin/drbdsetup
>> info: RA output: (prmDrPostgreSQLDB:start:stderr)
>> info: RA output: (prmDrPostgreSQLDB:start:stderr) 0
>> info: RA output: (prmDrPostgreSQLDB:start:stderr)
>> info: RA output: (prmDrPostgreSQLDB:start:stderr) primary
>> info: RA output: (prmDrPostgreSQLDB:start:stderr) ' terminated with exit code 11
>
> I don't doubt the follow-on problems.
>
> I'd like to see the _original_ error,
> where the drbdsetup secondary fails,
> so I can better see if your fix is the "right" fix.
>
>> I will try changing the log destination and get the complete RA log
>> from the ACT node later.
>
> Great.
> Thanks,
>
> --
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com
>
> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
> __
> please don't Cc me, but send to list   --   I'm subscribed
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
>
-------------- next part --------------
pgsql[11138][11191]: 2010/07/12_11:57:00 INFO: PostgreSQL start command sent.
pgsql[11138][11198]: 2010/07/12_11:57:01 ERROR: PostgreSQL template1 isn't running
pgsql[11138][11207]: 2010/07/12_11:57:02 DEBUG: PostgreSQL still hasn't started yet. Waiting...
pgsql[11138][11217]: 2010/07/12_11:57:03 DEBUG: PostgreSQL still hasn't started yet. Waiting...
crmd[10306]: 2010/07/12_11:57:03 info: process_lrm_event: LRM operation prmApPostgreSQLDB_start_0 (call=14, rc=0) complete 
crmd[10306]: 2010/07/12_11:57:04 info: do_lrm_rsc_op: Performing op=prmApPostgreSQLDB_monitor_10000 key=22:4:0:380b305a-7313-4cf2-b63b-52b9476388c2)
crmd[10306]: 2010/07/12_11:57:04 info: process_lrm_event: LRM operation prmApPostgreSQLDB_monitor_10000 (call=15, rc=0) complete 
crmd[10306]: 2010/07/12_11:57:05 info: process_lrm_event: LRM operation prmStonith:1_start_0 (call=8, rc=0) complete 
lrmd[10303]: 2010/07/12_11:57:05 debug: stonithRA plugin: provider attribute is not needed and will be ignored.
crmd[10306]: 2010/07/12_11:57:06 info: do_lrm_rsc_op: Performing op=prmStonith:1_monitor_3600000 key=30:4:0:380b305a-7313-4cf2-b63b-52b9476388c2)
crmd[10306]: 2010/07/12_11:57:13 info: process_lrm_event: LRM operation prmStonith:1_monitor_3600000 (call=16, rc=0) complete 
lrmd[10303]: 2010/07/12_11:57:48 info: RA output: (prmDrPostgreSQLDB:monitor:stderr) unlink: Read-only file system

lrmd[10303]: 2010/07/12_11:57:48 info: RA output: (prmDrPostgreSQLDB:monitor:stderr) open(/var/lock/drbd-147-0): Read-only file system

lrmd[10303]: 2010/07/12_11:57:48 info: RA output: (prmDrPostgreSQLDB:monitor:stderr) Command '
lrmd[10303]: 2010/07/12_11:57:48 info: RA output: (prmDrPostgreSQLDB:monitor:stderr) /sbin/drbdsetup
lrmd[10303]: 2010/07/12_11:57:48 info: RA output: (prmDrPostgreSQLDB:monitor:stderr)  
lrmd[10303]: 2010/07/12_11:57:48 info: RA output: (prmDrPostgreSQLDB:monitor:stderr) 0
lrmd[10303]: 2010/07/12_11:57:48 info: RA output: (prmDrPostgreSQLDB:monitor:stderr)  
lrmd[10303]: 2010/07/12_11:57:48 info: RA output: (prmDrPostgreSQLDB:monitor:stderr) role
lrmd[10303]: 2010/07/12_11:57:48 info: RA output: (prmDrPostgreSQLDB:monitor:stderr) ' terminated with exit code 20

lrmd[10303]: 2010/07/12_11:57:48 info: RA output: (prmDrPostgreSQLDB:monitor:stderr) drbdadm role r0: exited with code 20

crmd[10306]: 2010/07/12_11:57:48 info: process_lrm_event: LRM operation prmDrPostgreSQLDB_monitor_10000 (call=9, rc=7) complete 
crmd[10306]: 2010/07/12_11:57:49 info: do_lrm_rsc_op: Performing op=prmApPostgreSQLDB_stop_0 key=21:5:0:380b305a-7313-4cf2-b63b-52b9476388c2)
lrmd[10303]: 2010/07/12_11:57:49 info: rsc:prmApPostgreSQLDB: stop
crmd[10306]: 2010/07/12_11:57:49 info: process_lrm_event: LRM operation prmApPostgreSQLDB_monitor_10000 (call=15, rc=-2) Cancelled 
pgsql[12554][12615]: 2010/07/12_11:57:50 INFO: PostgreSQL is down
pgsql[12554][12677]: 2010/07/12_11:57:51 DEBUG: PostgreSQL still hasn't stopped yet. Waiting...
crmd[10306]: 2010/07/12_11:57:51 info: process_lrm_event: LRM operation prmApPostgreSQLDB_stop_0 (call=17, rc=0) complete 
crmd[10306]: 2010/07/12_11:57:52 info: do_lrm_rsc_op: Performing op=prmIpPostgreSQLDB_stop_0 key=18:5:0:380b305a-7313-4cf2-b63b-52b9476388c2)
lrmd[10303]: 2010/07/12_11:57:52 info: rsc:prmIpPostgreSQLDB: stop
crmd[10306]: 2010/07/12_11:57:52 info: process_lrm_event: LRM operation prmIpPostgreSQLDB_monitor_10000 (call=13, rc=-2) Cancelled 
lrmd[10303]: 2010/07/12_11:57:52 info: RA output: (prmIpPostgreSQLDB:stop:stdout) In IP Stop

lrmd[10303]: 2010/07/12_11:57:52 info: RA output: (prmIpPostgreSQLDB:stop:stderr) SIOCDELRT: No such process

IPaddr[12679][12694]: 2010/07/12_11:57:52 INFO: ifconfig bond0:0 down
lrmd[10303]: 2010/07/12_11:57:52 info: RA output: (prmIpPostgreSQLDB:stop:stderr) rm: cannot remove `/var/run/heartbeat/rsctmp/IPaddr/bond0:0': Read-only file system

crmd[10306]: 2010/07/12_11:57:52 info: process_lrm_event: LRM operation prmIpPostgreSQLDB_stop_0 (call=18, rc=0) complete 
crmd[10306]: 2010/07/12_11:57:53 info: do_lrm_rsc_op: Performing op=prmFsPostgreSQLDB_stop_0 key=15:5:0:380b305a-7313-4cf2-b63b-52b9476388c2)
lrmd[10303]: 2010/07/12_11:57:53 info: rsc:prmFsPostgreSQLDB: stop
crmd[10306]: 2010/07/12_11:57:53 info: process_lrm_event: LRM operation prmFsPostgreSQLDB_monitor_10000 (call=11, rc=-2) Cancelled 
Filesystem[12713][12743]: 2010/07/12_11:57:53 INFO: Running stop for /dev/drbd0 on /dbfp
Filesystem[12713][12753]: 2010/07/12_11:57:53 INFO: Trying to unmount /dbfp
Filesystem[12713][12756]: 2010/07/12_11:57:54 INFO: unmounted /dbfp successfully
crmd[10306]: 2010/07/12_11:57:54 info: process_lrm_event: LRM operation prmFsPostgreSQLDB_stop_0 (call=19, rc=0) complete 
crmd[10306]: 2010/07/12_11:57:55 info: do_lrm_rsc_op: Performing op=prmDrPostgreSQLDB_stop_0 key=3:5:0:380b305a-7313-4cf2-b63b-52b9476388c2)
lrmd[10303]: 2010/07/12_11:57:55 info: rsc:prmDrPostgreSQLDB: stop
crmd[10306]: 2010/07/12_11:57:55 info: process_lrm_event: LRM operation prmDrPostgreSQLDB_monitor_10000 (call=9, rc=-2) Cancelled 
lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr) unlink: Read-only file system

lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr) open(/var/lock/drbd-147-0): Read-only file system

lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr) Command '
lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr) /sbin/drbdsetup
lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr)  
lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr) 0
lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr)  
lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr) secondary
lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr) ' terminated with exit code 20

lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr) drbdadm secondary r0: exited with code 20

lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr) /sbin/drbdadm secondary r0: exit code 20, mapping to 0

crmd[10306]: 2010/07/12_11:57:55 info: process_lrm_event: LRM operation prmDrPostgreSQLDB_stop_0 (call=20, rc=0) complete 
diskd[10310]: 2010/07/12_11:57:56 ERROR: diskcheck: Could not read from device /dev/cciss/c0d0
diskd[10310]: 2010/07/12_11:58:01 ERROR: diskcheck: Could not read from device /dev/cciss/c0d0
diskd[10310]: 2010/07/12_11:58:01 WARN: diskcheck: Error(s) occurred in diskcheck function.
diskd[10310]: 2010/07/12_11:58:01 WARN: check_old_status: disk status is changed, new_status = ERROR
attrd[10305]: 2010/07/12_11:58:01 info: attrd_trigger_update: Sending flush op to all hosts for: diskcheck_os
attrd[10305]: 2010/07/12_11:58:01 info: attrd_perform_update: Sent update 21: diskcheck_os=ERROR
diskd[10310]: 2010/07/12_11:58:06 ERROR: diskcheck: Could not read from device /dev/cciss/c0d0
diskd[10310]: 2010/07/12_11:58:11 ERROR: diskcheck: Could not read from device /dev/cciss/c0d0
diskd[10310]: 2010/07/12_11:58:11 WARN: diskcheck: Error(s) occurred in diskcheck function.
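lrmd writes each fragment of the failing drbdsetup command line as a separate "RA output" line, which makes the actual command hard to read in the log above. As a reading aid only, here is a minimal Python sketch (not part of drbddisk or heartbeat; the function name `reassemble_commands` is hypothetical) that joins the fragments between `Command '` and `' terminated with exit code N` back into one command. It assumes every line uses the single-line layout `lrmd[pid]: <timestamp> info: RA output: (<rsc>:<op>:<stream>) <text>` seen in this attachment.

```python
import re

# Assumed layout of one lrmd "RA output" line, as in the attached log.
RA_LINE = re.compile(
    r"^lrmd\[\d+\]: (?P<ts>\S+) info: RA output: "
    r"\((?P<rsc>[^:]+):(?P<op>[^:]+):(?P<stream>[^)]+)\) ?(?P<text>.*)$"
)
# Final fragment that closes a split command and carries its exit code.
END_FRAGMENT = re.compile(r"^' terminated with exit code (\d+)$")


def reassemble_commands(lines):
    """Return (op, command, exit_code) tuples for each split command found."""
    results = []
    parts = None  # None means we are not inside a "Command '" ... "'" run
    for line in lines:
        m = RA_LINE.match(line.rstrip("\n"))
        if not m:
            continue  # skip non-lrmd lines (crmd, pgsql, diskd, ...)
        text = m.group("text").strip()
        if text == "Command '":
            parts = []  # start collecting fragments of a new command
        elif parts is not None:
            end = END_FRAGMENT.match(text)
            if end:
                results.append((m.group("op"), " ".join(parts), int(end.group(1))))
                parts = None
            elif text:
                parts.append(text)  # whitespace-only separator lines are dropped
    return results
```

Run over the attached log, it should report `/sbin/drbdsetup 0 role` for the monitor operation and `/sbin/drbdsetup 0 secondary` for the stop operation, both terminated with exit code 20.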

