[DRBD-user] DRBD failed - went to 'stale'.

Mon Dec 10 14:10:30 CET 2007

Ben,

From the little information you have given, I can only guess what issue 
you really ran into. But what you have shared sounds a bit like a 
resource starvation deadlock that was fixed in 8.0.8. To work around it on 
an 8.0.6 cluster, see if changing max-buffers to something considerably 
higher than the default causes the stall to disappear. Try 40000 (four 
zeroes). Upgrading to 8.0.8 is recommended, though.
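
As a rough sketch of that workaround, and assuming a resource named r0 (a 
placeholder, not taken from your setup), the setting would go into the net 
section of /etc/drbd.conf on both nodes:

```
# Hypothetical resource name; keep your existing options in place.
resource r0 {
  net {
    # Raised well above the default as a workaround for the stall.
    max-buffers 40000;
  }
}
```

After editing the file, "drbdadm adjust r0" on each node should apply the 
change to the running resource.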

But as I said, you didn't give much information, so I'm reduced to guessing. 
If the above suggestion doesn't work, please post a full description of your 
issue, including your /etc/drbd.conf, and pertinent log snippets.

That said, there appear to be some misconceptions about DRBD here that I'd 
like to clarify.

> >> I am running 8.0.6.  I had a complete failure of a server.  On reboot
> >> both my drbd nodes started a re-sync, and then jumped to 'stale', where
> >> they stuck indefinitely.

> This is correct is was 'stalled'.  (In my panic to get our servers
> running, I didn't take a copy of /proc/drbd at the time :)

"Both nodes started a re-sync" is the wrong wording here. _DRBD_ started a 
resync, which means one node became SyncSource and the other SyncTarget. 
That in turn means that only the data on the SyncTarget is considered 
Inconsistent.

The SyncSource, which has the UpToDate disk, is perfectly usable at this 
time. You can make it Primary, mount it, and run your application as normal. 
Thus, there is no reason to panic "to get servers running" again. You can 
run your DRBD-enabled services while the resync is in progress -- even if 
it's in fact not progressing. :-)
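
For illustration, the sequence on the SyncSource node could look roughly 
like this. The resource name r0, the device /dev/drbd0 and the mount point 
/mnt/data are placeholders, so substitute whatever your drbd.conf defines:

```
# Check connection state, roles and disk states first;
# only proceed on the node whose local disk is UpToDate.
cat /proc/drbd

# Promote that node to Primary.
drbdadm primary r0

# Mount the DRBD device, then start your services as usual.
mount /dev/drbd0 /mnt/data
```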

Hope this helps.

Cheers,
Florian

-- 
: Florian G. Haas 
: LINBIT Information Technologies GmbH
: Vivenotgasse 48, A-1120 Vienna, Austria