[DRBD-user] drbdadm verify all seems to produce false positives on ext3 and crash the server

Tue Jun 24 10:42:37 CEST 2008

Thanks a lot for this quick and detailed answer!

Lars Ellenberg wrote:
> On Mon, Jun 23, 2008 at 04:32:25PM +0200, Eric Marin wrote:
>> Hello and sorry about the length of this report,
> 
> I have a few comments, below.
> 
(...)
> this does not really help "after the fact".
> it would be of interest to see whether this is in file "payload data"
> area, or in file system "meta data" area (allocation bitmaps and such).
> 
> I don't know off the top of my head how to find out for ext3,
> but I remember it was not too difficult.
I'm sorry, I'm not sure I understand...

> there are a few basic ways in which data can end up being different on the
> two replicas. let us leave out additional modes that could be seen when
> hard node crashes/power cycles/etc are involved, but focus only on what
> could happen during "normal" operation.
>   - hardware problems, like broken RAM
>   - a bug in drbd
>   - a bug in other drivers
>   - someone fiddling around with direct access or otherwise bypassing
>     drbd
> 
A consistency check on the RAID array didn't detect any errors, so I 
suppose the disks are clean. Right now, I'm running MemTest86+.
Maybe the crossover Ethernet cable is simply bad (!), but I don't think 
this would explain the crashes.
I ran this test once and got identical checksums on both nodes:
ldap-a:~# dd if=/dev/urandom of=/tmp/foobar bs=1M count=256
ldap-a:~# md5sum /tmp/foobar
ldap-b:~# netcat -l -p 4711 | md5sum
ldap-a:~# netcat -q0 192.168.0.2 4711 < /tmp/foobar

> maybe first enable the "integrity checking",
> see if that detects something,
> then disable the checksum offloading,
> see if it still occurs,
> if not leave offloading disabled,
> but disable again the "integrity checking" for performance reasons.
OK, I'm going to try that next.
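Concretely, those two steps might look like the sketch below. It assumes a DRBD version that supports the data-integrity-alg option in the net section, and that the crossover link is eth1 (the interface name is hypothetical; adjust it to the real one). Both steps would have to be done on both nodes.

--------8<----------------------------------------------------------------------
# Step 1: checksum every replicated block (costs CPU). In /etc/drbd.conf:
common {
	net {
		# ... existing cram-hmac-alg and shared-secret lines ...
		data-integrity-alg sha1;
	}
}

# then let drbdadm apply the new configuration:
ldap-a:~# drbdadm adjust all

# Step 2: turn off checksum offloading on the replication NIC
# (eth1 is an assumption), then check the resulting offload settings:
ldap-a:~# ethtool -K eth1 rx off tx off
ldap-a:~# ethtool -k eth1

# Step 3: if the errors are gone, keep offloading off but remove
# data-integrity-alg again and re-run "drbdadm adjust all".
-----------------------------------------------------------------------8<-------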

>> ldap-a has crashed twice in three weeks,
> 
> what exactly does "crashed" mean?
> just "unresponsive"? panic? oops? BUG()?
> any logs? anything from the console?
The screen is completely black and the keyboard is unresponsive. Even 
Alt+Print+B doesn't reboot the server; I have to hold the power button 
for a few seconds. I can't check right now, but the first time it 
crashed, I couldn't find anything in the logs except out-of-sync 
warnings and a corrupted file system on one of the DRBD devices.
So far only ldap-a has crashed, but "drbdadm verify all" was always 
run on ldap-a.

> oh, and in that case (crashed a few times),
> we need to consider more failure modes:
> is there any volatile disk cache involved?
> is no-disk-flushes set?
> is no-md-flushes set?
Do you mean a volatile disk cache on the RAID card?
I enabled no-disk-flushes and no-md-flushes so as not to flood the logs 
with "local disk flush failed" messages; I was getting lots of them. I 
think this is safe with a hardware RAID card with a battery-backed cache 
(PERC 6i), isn't it? (I hope so!)
The write policy is write-back.

Contents of /etc/drbd.conf:
--------8<----------------------------------------------------------------------
global {
	usage-count no;
}

common {
	handlers {
		outdate-peer "echo 'Crossover cable unplugged between ldap-a and ldap-b?' | /usr/bin/Mail -s 'OUTDATE-PEER ON LDAP/CAS!' xxxx at utc.fr & /usr/lib/heartbeat/drbd-peer-outdater";

		split-brain "echo 'DRBD has detected a split-brain situation. Manual intervention required on ldap-a and ldap-b!' | /usr/bin/Mail -s 'SPLIT-BRAIN ON LDAP/CAS!' xxxx at utc.fr";

		out-of-sync "echo 'DRBD is out of sync. Manual intervention required on ldap-a and ldap-b!' | /usr/bin/Mail -s 'OUT-OF-SYNC ON LDAP/CAS!' xxxx at utc.fr";
	}

	net {
		cram-hmac-alg sha1;
		shared-secret
"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";
	}

	syncer {
		rate 33M;
		verify-alg sha1;
		cpu-mask 1;
	}

	disk {
		on-io-error detach;
		fencing resource-only;
		no-disk-flushes;
		no-md-flushes;
	}

	startup {
		degr-wfc-timeout 30;
	}

	protocol C;
}

# LDAP, MySQL, Apache
resource drbd0 {
	device /dev/drbd0;
	disk /dev/sda7;
	meta-disk /dev/sda6[0];

	on ldap-a {
		address 192.168.0.1:7788;
	}

	on ldap-b {
		address 192.168.0.2:7788;
	}
}

# Tomcat, CAS
resource drbd1 {
	device /dev/drbd1;
	disk /dev/sda8;
	meta-disk /dev/sda6[1];
	syncer { after drbd0; }

	on ldap-a {
		address 192.168.0.1:7789;
	}

	on ldap-b {
		address 192.168.0.2:7789;
	}
}
-----------------------------------------------------------------------8<-------

A few other things of interest :
-kernel = 2.6.18-6-686-bigmem #1 SMP Fri Jun 6 23:31:15 UTC 2008 i686

-in /etc/sysctl.conf, I put :
----8<-------------------------------------------------------------------
# In case of a 'kernel panic' or 'oops', I want the node to reboot
# so that the resources are taken by the other node.
kernel.panic_on_oops = 1
kernel.panic = 1
------------------------------------------------------------------->8----
But it didn't reboot either of the two times it crashed. Could these 
settings be a source of problems?
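Values in /etc/sysctl.conf are only applied at boot (or with sysctl -p), so one quick check is to read what the running kernel actually uses. A minimal sketch that only reads the two knobs above from /proc and prints what they imply; keep in mind that they can only act if the kernel actually oopses or panics, which a hard hang does not necessarily do:

```shell
#!/bin/sh
# Read the live values of the two panic-related knobs set in /etc/sysctl.conf.
panic_on_oops=$(cat /proc/sys/kernel/panic_on_oops)
panic_timeout=$(cat /proc/sys/kernel/panic)

echo "kernel.panic_on_oops = $panic_on_oops"
echo "kernel.panic = $panic_timeout"

# panic_on_oops = 1 turns an oops into a panic.
# panic = 0 means "hang forever after a panic"; a positive value N means
# "reboot N seconds after a panic"; a negative value means "reboot immediately".
if [ "$panic_on_oops" -eq 1 ] && [ "$panic_timeout" -ne 0 ]; then
    echo "an oops should trigger an automatic reboot"
else
    echo "an oops will NOT trigger an automatic reboot"
fi
```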

-about the RAID card, I get this in Dell OpenManage Server Administrator:
Firmware version 6.0.2-0002
Driver version 00.00.03.01
Minimal required driver version 00.00.03.13
=> I first dismissed this warning; maybe I shouldn't have... though I'm 
not sure how to update this driver while keeping the stock Debian kernel. 
Maybe I could find a newer module...

-drbd0 contains live databases for OpenLDAP and MySQL; perhaps these 
could show the usage patterns you refer to? They are not modified 
often, though: only a few entries a day.

> possibly, but I'd say unlikely, until proven otherwise (by an oops stack
> trace for example).
I'm not familiar with this. Do I need to recompile the kernel to get 
this stack trace?

> I'd say the additional load of the verify caused
> some unexpected memory pressure/io-load, and your box was unable to
> handle that. serialize your resources, reduce the syncer rate, maybe
> reduce "max-buffers".
By serializing resources, do you mean the "syncer { after drbd0; }" 
line I already have for drbd1?
The syncer rate is currently 33M, as suggested by the official 
documentation for a Gigabit connection.
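For reference, the 33M figure matches the rule of thumb in the DRBD User's Guide: set the syncer rate to roughly 30% of the replication bottleneck, i.e. the slower of the disk write throughput and the network throughput. A small sketch of that arithmetic; the 180 MB/s disk and 110 MB/s Gigabit figures are assumptions, not measurements from these servers:

```shell
#!/bin/sh
# Rule of thumb: syncer rate ~ 30% of the replication bottleneck,
# which is the slower of disk write throughput and network throughput.
disk_mbps=180   # assumed sustained disk write throughput, MB/s
net_mbps=110    # assumed usable Gigabit Ethernet throughput, MB/s

if [ "$disk_mbps" -lt "$net_mbps" ]; then
    bottleneck=$disk_mbps
else
    bottleneck=$net_mbps
fi

# Integer arithmetic in POSIX sh: 110 * 30 / 100 = 33
rate=$(( bottleneck * 30 / 100 ))
echo "rate ${rate}M;"
```

Lowering the rate, as Lars suggests, just means taking a smaller fraction of that bottleneck: background synchronization then takes longer, but leaves more bandwidth and disk I/O for LDAP and MySQL.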

What would you suggest for max-buffers? 32, which is the minimum?
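For reference, the two knobs live in different sections of drbd.conf: max-buffers belongs to net, rate to syncer. A sketch with purely illustrative values (not a recommendation), to be merged into the existing common block on both nodes and applied with "drbdadm adjust all":

--------8<----------------------------------------------------------------------
common {
	net {
		# ... existing cram-hmac-alg and shared-secret lines ...
		max-buffers 512;	# illustrative value; default is 2048, minimum 32
	}

	syncer {
		rate 10M;		# illustrative value, lower than the current 33M
		verify-alg sha1;
		cpu-mask 1;
	}
}
-----------------------------------------------------------------------8<-------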

Again, thank you for your answer: I'm still worried, but at least I now 
have a few things to test.

If you need any other information, please don't hesitate to ask!
Eric