Thanks a lot for this quick and detailed answer!
Lars Ellenberg wrote:
> On Mon, Jun 23, 2008 at 04:32:25PM +0200, Eric Marin wrote:
>> Hello and sorry about the length of this report,
>
> I have a few comments, below.
>
(...)
> this does not really help "after the fact".
> it would be of interest to see whether this is in file "payload data"
> area, or in file system "meta data" area (allocation bitmaps and such).
>
> I don't know off the top of my head how to find out for ext3,
> but I remember it was not too difficult.
I'm sorry, I'm not sure I understand...
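One way the lookup Lars describes could be done on ext3 is with debugfs from e2fsprogs. The sketch below is not taken from this thread: it assumes the out-of-sync position is known as a start sector in 512-byte units (the value 403896 is made up), that the filesystem block size has been read with "tune2fs -l", and that sectors on the DRBD device map directly to filesystem blocks.

```shell
#!/bin/sh
# Convert a start sector (512-byte units) into an ext3 block number.
# Usage: sector_to_fsblock START_SECTOR FS_BLOCK_SIZE_IN_BYTES
sector_to_fsblock() {
    echo $(( $1 * 512 / $2 ))
}

# Hypothetical example values: sector 403896, 4096-byte filesystem blocks.
# The real block size can be read with: tune2fs -l /dev/drbd0 | grep 'Block size'
blk=$(sector_to_fsblock 403896 4096)
echo "filesystem block: $blk"

# Then, as root, on the node where the device is Primary:
#   debugfs -R "icheck $blk" /dev/drbd0       # block -> owning inode, if any
#   debugfs -R "ncheck <inode>" /dev/drbd0    # inode -> path name
# If icheck finds no inode for the block, it is either free space or
# filesystem metadata rather than file payload data.
```

If icheck reports an inode, ncheck turns it into a file name, which shows whether the differing block belongs to a database file or to something else.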
> there are a few basic ways how data can end up being different on the
> two replicas. let us leave out additional modes that could be seen when
> hard node crashes/power cycles/etc are involved, but focus only on what
> could happen during "normal" operation.
> - hardware problems, like broken RAM
> - a bug in drbd
> - a bug in other drivers
> - someone fiddling around with direct access or otherwise bypassing
> drbd
>
A consistency check on the RAID array didn't detect any errors, so I
suppose the disks are clean. Right now, I'm running MemTest86+.
Maybe the crossover Ethernet cable is simply bad (!), but I don't think
that would explain the crashes.
I did this once and got identical checksums:
ldap-a:~# dd if=/dev/urandom of=/tmp/foobar bs=1M count=256
ldap-a:~# md5sum /tmp/foobar
ldap-b:~# netcat -l -p 4711 | md5sum
ldap-a:~# netcat -q0 192.168.0.2 4711 < /tmp/foobar
> maybe first enable the "integrity checking",
> see if that detects something,
> then disable the checksum offloading,
> see if it still occurs,
> if not leave offloading disabled,
> but disable again the "integrity checking" for performance reasons.
OK, I'm going to try that next.
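The sequence suggested above could be scripted roughly as follows. This is only a sketch under several assumptions: the interface name eth1 for the crossover link is hypothetical, the line "data-integrity-alg sha1;" has to be added by hand to the net section of drbd.conf on both nodes beforehand, and DRY_RUN defaults to 1 so that the commands are printed rather than executed.

```shell
#!/bin/sh
# DRY_RUN=1 (the default) only prints each command; set DRY_RUN=0 to execute.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

IFACE="${IFACE:-eth1}"   # assumption: name of the crossover-link interface

# Step 1: after adding "data-integrity-alg sha1;" to the net section of
# /etc/drbd.conf on both nodes, apply the changed configuration.
run drbdadm adjust all

# Step 2: if integrity errors show up in the logs, display the current
# offload settings, then turn checksum offloading and TSO off on the link.
run ethtool -k "$IFACE"
run ethtool -K "$IFACE" rx off tx off tso off
```

Once the errors stop with offloading disabled, the data-integrity-alg line can be removed again, as suggested, to avoid its performance cost.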
>> ldap-a has crashed twice in three weeks,
>
> what exactly does "crashed" mean?
> just "unresponsive"? panic? oops? BUG()?
> any logs? anything from the console?
The screen is completely black and the keyboard unresponsive. Even
Alt+Print+B doesn't reboot the server, I have to press the power button
for a few seconds. I can't check right now, but the first time it
crashed, I couldn't find anything in the logs, except for out-of-sync
warnings and a corrupted FS on a partition mounted by DRBD.
Only ldap-a has crashed for now, but drbdadm verify all was always
executed on ldap-a.
> oh, and in that case (crashed a few times),
> we need to consider more failure modes:
> is there any volatile disk cache involved?
> is no-disk-flushes set?
> is no-md-flushes set?
Do you mean the volatile disk cache on the RAID card?
I enabled no-disk-flushes and no-md-flushes so as not to pollute the logs
with "local disk flush failed" messages; I was getting lots of those. I
think this is safe with a hardware RAID card with a battery-backed cache
(PERC 6i), isn't it? (I hope so!)
The write policy is: write back.
Contents of /etc/drbd.conf :
--------8<----------------------------------------------------------------------
global {
  usage-count no;
}
common {
  handlers {
    outdate-peer "echo 'Cable croise debranche entre ldap-a et ldap-b ?' | /usr/bin/Mail -s 'OUTDATE-PEER SUR LDAP/CAS !' xxxx at utc.fr & /usr/lib/heartbeat/drbd-peer-outdater";
    split-brain "echo 'DRBD a detecte une situation de split-brain. Intervention manuelle necessaire sur ldap-a et ldap-b !' | /usr/bin/Mail -s 'SPLIT-BRAIN SUR LDAP/CAS !' xxxx at utc.fr";
    out-of-sync "echo 'DRBD est desynchronise. Intervention manuelle necessaire sur ldap-a et ldap-b !' | /usr/bin/Mail -s 'OUT-OF-SYNC SUR LDAP/CAS !' xxxx at utc.fr";
  }
  net {
    cram-hmac-alg sha1;
    shared-secret "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";
  }
  syncer {
    rate 33M;
    verify-alg sha1;
    cpu-mask 1;
  }
  disk {
    on-io-error detach;
    fencing resource-only;
    no-disk-flushes;
    no-md-flushes;
  }
  startup {
    degr-wfc-timeout 30;
  }
  protocol C;
}
# LDAP, MySQL, Apache
resource drbd0 {
  device /dev/drbd0;
  disk /dev/sda7;
  meta-disk /dev/sda6[0];
  on ldap-a {
    address 192.168.0.1:7788;
  }
  on ldap-b {
    address 192.168.0.2:7788;
  }
}
# Tomcat, CAS
resource drbd1 {
  device /dev/drbd1;
  disk /dev/sda8;
  meta-disk /dev/sda6[1];
  syncer { after drbd0; }
  on ldap-a {
    address 192.168.0.1:7789;
  }
  on ldap-b {
    address 192.168.0.2:7789;
  }
}
-----------------------------------------------------------------------8<-------
A few other things of interest :
-kernel = 2.6.18-6-686-bigmem #1 SMP Fri Jun 6 23:31:15 UTC 2008 i686
-in /etc/sysctl.conf, I put :
----8<-------------------------------------------------------------------
# In case of a 'kernel panic' or 'oops', I want the node to reboot
# so that the resources are taken by the other node.
kernel.panic_on_oops = 1
kernel.panic = 1
------------------------------------------------------------------->8----
But it didn't reboot either time it crashed. Maybe these settings could
be a source of problems?
-about the RAID card, I get this in Dell OpenManage Server Administrator:
Firmware version 6.0.2-0002
Driver version 00.00.03.01
Minimal required driver version 00.00.03.13
=>I first dismissed this warning; maybe I shouldn't have... though I'm not
sure how to update this driver while keeping the stock Debian kernel.
Maybe I could find a newer module...
-drbd0 contains live databases for OpenLDAP and MySQL, perhaps these
could have the usage patterns you refer to ? They are not often
modified, though : only a few entries a day.
> possibly, but I'd say unlikely, until proven otherwise (by an oops stack
> trace for example).
I'm not used to this. Do I need to recompile the kernel to get this
stack trace ?
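Recompiling is usually not needed to capture an oops trace. One common approach is netconsole, which sends kernel messages over UDP to another machine. The sketch below assumes that the stock kernel ships netconsole as a module, which has not been checked here; the interface name eth1, the ports and the MAC address are placeholders, and only the IP addresses come from the drbd.conf above. A machine that hangs hard without printing anything will of course log nothing this way either.

```shell
#!/bin/sh
# Build the netconsole module parameter, whose format is:
#   netconsole=<src-port>@<src-ip>/<dev>,<tgt-port>@<tgt-ip>/<tgt-mac>
netconsole_param() {
    printf 'netconsole=%s@%s/%s,%s@%s/%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

# Placeholders: eth1 and the MAC address are hypothetical.
param=$(netconsole_param 6665 192.168.0.1 eth1 6666 192.168.0.2 00:11:22:33:44:55)

# Print the two commands instead of running them.
echo "on ldap-a: modprobe netconsole $param"
echo "on ldap-b: netcat -u -l -p 6666 | tee /var/tmp/ldap-a-console.log"
```

The listener on ldap-b has to be started first and left running, so that whatever ldap-a prints before the next crash ends up in the log file.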
> I'd say the additional load of the verify caused
> some unexpected memory pressure/io-load, and your box was unable to
> handle that. serialize your resources, reduce the syncer rate, maybe
> reduce "max-buffers".
By serializing resources, do you mean what I already have for drbd1,
i.e. syncer { after drbd0; }?
Syncer rate is 33M for now, as suggested by the official documentation
for a gigabit connection.
What would you suggest for max-buffers : 32, which is the minimum ?
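For reference, the 33M figure follows the rule of thumb in the DRBD documentation of using about 30% of the available replication bandwidth, taking roughly 110 MB/s for gigabit Ethernet. A small sketch of that arithmetic follows; the 10% value is only an illustrative choice for a gentler setting, not a recommendation made in this thread.

```shell
#!/bin/sh
# syncer_rate BANDWIDTH_MB_PER_S PERCENT -> rate in MB/s (integer division)
syncer_rate() {
    echo $(( $1 * $2 / 100 ))
}

# Documented rule of thumb: about 30% of the replication bandwidth.
echo "rate $(syncer_rate 110 30)M;"
# Hypothetical lower setting to reduce load during verify or resync.
echo "rate $(syncer_rate 110 10)M;"
```

The printed lines have the same form as the "rate" entry in the syncer section of drbd.conf, so a lower value can be pasted in directly for testing.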
Again, thank you for your answer : I'm still worried, but I have a few
things to test now.
If you need any other piece of information, please don't hesitate !
Eric