Hi all,
I have a very strange problem with my Linux cluster. I have two Dell servers
with a shared DRBD partition for the data. The connection state on both
systems seems to be fine, if you can trust /proc/drbd:
default:~ # cat /proc/drbd
version: 0.7.14 (api:77/proto:74)
SVN Revision: 1990 build by root at girgendwas, 2006-05-18 03:03:57
0: cs:Connected st:Primary/Secondary ld:Consistent
ns:928491536 nr:148 dw:907947060 dr:35596009 al:42515 bm:2803 lo:0 pe:0 ua:0 ap:0
default:~ #
And on the other node:
backup:~ # cat /proc/drbd
version: 0.7.14 (api:77/proto:74)
SVN Revision: 1990 build by root at girgendwas, 2006-05-18 03:03:57
0: cs:Connected st:Secondary/Primary ld:Consistent
ns:148 nr:928498956 dw:939440840 dr:80299 al:29 bm:7584 lo:0 pe:0 ua:0 ap:0
backup:~ #
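For anyone who wants to check those status lines mechanically rather than by eye, here is a minimal Python sketch. The function names are made up for illustration and are not part of DRBD. It parses the key:value fields of a DRBD 0.7 device line as shown above and checks that both nodes report cs:Connected, ld:Consistent and mirror-image st: roles. It only checks what DRBD reports about its own state; it cannot tell whether the replicated data itself is intact, which is exactly my doubt.

```python
import re

# Device status lines copied from the two nodes (DRBD 0.7 /proc/drbd layout).
PRIMARY_NODE = "0: cs:Connected st:Primary/Secondary ld:Consistent"
BACKUP_NODE = "0: cs:Connected st:Secondary/Primary ld:Consistent"


def parse_status(line):
    """Return a dict of the key:value fields found on a /proc/drbd device line."""
    return dict(re.findall(r"(\w+):(\S+)", line))


def pair_is_healthy(a, b):
    """True if both nodes are connected, consistent and hold opposite roles."""
    sa, sb = parse_status(a), parse_status(b)
    if sa.get("cs") != "Connected" or sb.get("cs") != "Connected":
        return False
    if sa.get("ld") != "Consistent" or sb.get("ld") != "Consistent":
        return False
    # st: is "local/peer"; each node's view of its peer must match the peer itself.
    local_a, _, peer_a = sa.get("st", "").partition("/")
    local_b, _, peer_b = sb.get("st", "").partition("/")
    return (local_a == peer_b and peer_a == local_b
            and {local_a, local_b} == {"Primary", "Secondary"})


print(pair_is_healthy(PRIMARY_NODE, BACKUP_NODE))  # → True
```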
When I force the systems to switch roles and the backup system mounts
the datadisk partition, everything seems okay at first; there are no error
messages.
But when I look at the files on that partition, I see very strange
effects: some filenames are corrupted, and I get I/O errors when I try to
access some of the directories:
backup:~ # ls /datadisk/
.
..
a2chive
i.coming
quarantine
spama3sassin
Can you see the dot in "incoming"? I get errors when I try to access that
directory (and "quarantine" as well):
backup:~ # l /datadisk/
/bin/ls: /datadisk/MailScanner/i.coming: Input/output error
/bin/ls: /datadisk/MailScanner/quarantine: Input/output error
total 16
drwxr-xr-x 6 root root 4096 Dec 6 2005 ./
drwxr-xr-x 22 root root 4096 Sep 25 18:49 ../
?rwxrwx--- 2 postfix www 4096 Jan 27 2006 a2chive
drwxr-xr-x 2 postfix postfix 4096 Oct 24 2005 spama3sassin/
backup:~ #
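Looking at the garbled names again, each one differs from the name I would expect in exactly one character. Assuming the intended names are "archive", "incoming" and "spamassassin" (only "incoming" is certain; the other two are inferred from the visible letters), this small Python sketch compares them byte by byte and prints the XOR of each differing pair:

```python
# Presumed original names (an assumption, see above) paired with what ls printed.
PAIRS = [
    ("archive", "a2chive"),
    ("incoming", "i.coming"),
    ("spamassassin", "spama3sassin"),
]


def differing_bytes(expected, observed):
    """Yield (position, expected char, observed char, XOR of their byte values)."""
    for pos, (e, o) in enumerate(zip(expected, observed)):
        if e != o:
            yield pos, e, o, ord(e) ^ ord(o)


for expected, observed in PAIRS:
    for pos, e, o, xor in differing_bytes(expected, observed):
        print(f"{expected!r} -> {observed!r}: position {pos}, "
              f"{e!r} (0x{ord(e):02x}) -> {o!r} (0x{ord(o):02x}), xor 0x{xor:02x}")
```

Every line it prints ends in "xor 0x40": in all three names the same single bit (bit 6) has been cleared in one byte, which may hint at a bit error somewhere along the data path rather than ordinary filesystem damage.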
Has anybody seen such an effect before? I have several other clusters with
the same hardware, and this has never happened on any of them.
With kind regards,
Volker Dose