At random times (possibly under high load, although I can't say for
certain), the DRBD partition falls into read-only mode. The software
that accesses the data on the partition then starts reporting errors like:
May 1 21:24:47 [kernel] Aborting journal on device drbd0.
May 1 21:24:47 [lmtpunix] IOERROR: appending index records for user.joe: Input/output error
May 1 21:24:47 [kernel] EXT3-fs error (device drbd0) in add_dirent_to_buf: Journal has aborted
May 2 01:24:47 [postfix/postdrop] warning: mail_queue_enter: create file maildrop/225423.27996: Read-only file system
May 1 21:24:47 [kernel] EXT3-fs error (device drbd0) in start_transaction: Journal has aborted
- Last output repeated 4 times -
May 1 21:24:47 [lmtpunix] IOERROR: creating quota file /export/cyrus/imap/quota/j/user.joe.NEW: Read-only file system
May 1 21:24:47 [kernel] EXT3-fs error (device drbd0) in start_transaction: Journal has aborted
May 1 21:24:47 [lmtpunix] DBERROR: error storing user.joe: cyrusdb error
May 1 21:24:47 [lmtpunix] LOSTQUOTA: unable to record use of 3029 bytes in quota file user.joe
May 1 21:24:47 [lmtpunix] IOERROR: error unlinking file /export/cyrus/spool/imap/stage./27887-1114997080-0: Read-only file system
May 1 21:24:47 [kernel] EXT3-fs error (device drbd0) in start_transaction: Journal has aborted
May 1 21:24:47 [postfix/local] fatal: update queue file active/0/0CED617BC073: Read-only file system
May 1 21:24:47 [kernel] EXT3-fs error (device drbd0) in start_transaction: Journal has aborted
The only solution is to reboot the system and let the other cluster
twin take over.
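
To catch the read-only state before the mail software starts failing,
something like the following could be run periodically. This is only a
minimal sketch: it assumes the kernel reports the remounted filesystem
with the "ro" option in /proc/mounts, and the function name
read_only_mounts and the default device name are hypothetical choices
for illustration, not part of DRBD or any existing tool.

```python
def read_only_mounts(mounts_text, device_suffix="drbd0"):
    """Return mount points of matching devices that are mounted read-only.

    mounts_text is the content of /proc/mounts, where each line has the
    form: device mount_point fstype options dump pass
    """
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, _fstype, options = fields[:4]
        # Compare whole options, so "errors=remount-ro" does not match "ro".
        if device.endswith(device_suffix) and "ro" in options.split(","):
            hits.append(mount_point)
    return hits
```

Feeding it the contents of /proc/mounts (for example
read_only_mounts(open("/proc/mounts").read())) returns an empty list
while the device is still writable, so a cron job could alert or start
the failover as soon as the list is non-empty.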
Does anyone have any idea what is going on? This is causing enormous
file corruption issues for us.
Thanks,
Lee