On Mon, Jul 12, 2010 at 05:12:24PM +0900, Junko IKEDA wrote:
> Hi,
>
> I could got RA log.
> It said;
>
> lrmd[10303]: 2010/07/12_11:57:48 info: RA output: (prmDrPostgreSQLDB:monitor:stderr) unlink: Read-only file system
> lrmd[10303]: 2010/07/12_11:57:48 info: RA output: (prmDrPostgreSQLDB:monitor:stderr) open(/var/lock/drbd-147-0): Read-only file system

Oh. I see.
EROFS.
We should probably catch that one and ignore it, at least for monitor
(role, status, dstate, etc.) and "down" related commands (secondary,
disconnect, detach), or handle it more gracefully in some yet to be
specified way.

> lrmd[10303]: 2010/07/12_11:57:48 info: RA output: (prmDrPostgreSQLDB:monitor:stderr) Command '
> lrmd[10303]: 2010/07/12_11:57:48 info: RA output: (prmDrPostgreSQLDB:monitor:stderr) /sbin/drbdsetup
> lrmd[10303]: 2010/07/12_11:57:48 info: RA output: (prmDrPostgreSQLDB:monitor:stderr)
> lrmd[10303]: 2010/07/12_11:57:48 info: RA output: (prmDrPostgreSQLDB:monitor:stderr) 0
> lrmd[10303]: 2010/07/12_11:57:48 info: RA output: (prmDrPostgreSQLDB:monitor:stderr)
> lrmd[10303]: 2010/07/12_11:57:48 info: RA output: (prmDrPostgreSQLDB:monitor:stderr) role
> lrmd[10303]: 2010/07/12_11:57:48 info: RA output: (prmDrPostgreSQLDB:monitor:stderr) ' terminated with exit code 20
> lrmd[10303]: 2010/07/12_11:57:48 info: RA output: (prmDrPostgreSQLDB:monitor:stderr) drbdadm role r0: exited with code 20
>
> lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr) unlink: Read-only file system
> lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr) open(/var/lock/drbd-147-0): Read-only file system
> lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr) Command '
> lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr) /sbin/drbdsetup
> lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr)
> lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr) 0
> lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr)
> lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr) secondary
> lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr) ' terminated with exit code 20
> lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr) drbdadm secondary r0: exited with code 20
> lrmd[10303]: 2010/07/12_11:57:55 info: RA output: (prmDrPostgreSQLDB:stop:stderr) /sbin/drbdadm secondary r0: exit code 20, mapping to 0
>
> I hope that this output is what you want to see.

Yes, thanks.

FYI, the "ocf" drbd agent does loop on monitor, and should already
return a generic error in that case.

Maybe we should add such a loop to drbddisk as well.
Or somehow set it up as a wrapper around the ocf agent
(though that may not be easily possible).

Yes, your patch is ok.
Still, I'm not taking it as such; I will probably make drbdsetup more
robust in the face of file system problems on /var/lock/, and add a
monitoring loop to drbddisk instead.

Thanks,

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed
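
For illustration only, here is a minimal sketch of the "catch EROFS and ignore it for monitor and down-related commands" idea from the reply above. It is written in Python rather than as a change to drbdsetup itself, and it is not the actual DRBD code. The names `open_lock`, `READ_ONLY_SAFE`, and the `opener` parameter are hypothetical; the command list is simply the one named in the message.

```python
import errno
import os

# Hypothetical grouping: the monitor and "down" commands named in the message.
READ_ONLY_SAFE = {"role", "status", "dstate", "secondary", "disconnect", "detach"}


def open_lock(path, command, opener=os.open):
    """Return a lock file descriptor, or None if locking was skipped.

    EROFS is tolerated only for commands in READ_ONLY_SAFE. Any other
    error, or EROFS for any other command, is re-raised to the caller.
    """
    try:
        return opener(path, os.O_CREAT | os.O_WRONLY, 0o600)
    except OSError as exc:
        if exc.errno == errno.EROFS and command in READ_ONLY_SAFE:
            # Read-only /var/lock: proceed without a lock file.
            return None
        raise
```

With this shape, a `role` or `secondary` call would still run when /var/lock/ is read-only, while any command outside the set would keep failing as before. Whether skipping the lock is actually safe for each of these commands is an assumption that would need checking against the real tool.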