[DRBD-user] can drbd be made to detect that it has failed to write to the underlying device in a 'long time'?

Todd Denniston Todd.Denniston at ssa.crane.navy.mil
Thu Apr 15 16:33:18 CEST 2004



Lars Ellenberg wrote:
> 
> / 2004-04-14 15:09:10 -0500
> \ Todd Denniston:
> > 
<SNIP>
> > BTW When drbd puts the kernel, on the secondary machine, into
> > kernel-panic and you bring the secondary machine back up...  is a
> > quick sync really enough???
> 
> good question.
>   probably not, and it should be written into the meta data first, so it
> will ensure a full sync, or maybe even refuse to do anything until
> operator confirmed: "yes, hardware *is* ok again.", and only then do the
> full sync.
> 
> Philipp and others, please comment.

drbd would probably need to write something to /var/lib/drbd/afile indicating
that it was about to cause a kernel panic, so that it would know it had done so
when the system came back up.
BUT it would need to sync /var/lib/drbd/afile just before calling the
panic ... and that might block if the device holding /var/lib/drbd/ happened to
be having problems too (spawn off a separate sync job and give it half a second?).
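
A rough illustration of the marker-file idea, as a userspace Python sketch
rather than anything drbd actually does (drbd is a kernel module, so a real
version would look quite different). The function names, the marker contents
and the 0.5 second default are all made up for this example; only the
/var/lib/drbd/afile path comes from the paragraph above.

```python
import os
import threading


def write_panic_marker(path, timeout=0.5):
    """Write a marker file and wait at most `timeout` seconds for the fsync.

    Returns True if the marker reached the disk in time, False if the write
    or sync failed or was still blocked when the timeout expired.
    (Hypothetical helper, not part of drbd.)
    """
    done = threading.Event()

    def _write_and_sync():
        try:
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
            try:
                os.write(fd, b"panic-pending\n")
                os.fsync(fd)  # may block if the underlying device is sick
            finally:
                os.close(fd)
        except OSError:
            return  # leave `done` unset so the caller sees a failure
        done.set()

    # Daemon thread: a stuck fsync must not keep the caller from moving on.
    worker = threading.Thread(target=_write_and_sync, daemon=True)
    worker.start()
    return done.wait(timeout)


def needs_full_sync(path):
    """After reboot: a leftover marker means a quick sync is not trustworthy."""
    return os.path.exists(path)
```

The caller would try write_panic_marker("/var/lib/drbd/afile") just before
panicking and go ahead with the panic whether or not it returned True; on the
way back up, needs_full_sync() on the same path would decide between a quick
sync and a full one. If the marker never made it to disk, this sketch cannot
tell, which is the weak spot noted above.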

> 
> > or should I bring drbd down and `rm /var/lib/drbd/* -f`, and then bring drbd
> > back up, forcing a full sync? I have been doing the rm (mostly, I think, to
> > keep my sanity).
> 
> you can also request a full sync by "drbdsetup /dev/nbX replicate"
> ...
> 

Do that as root, logged in on the box that went down, correct? (I just want to
make sure I get the sync direction right.)

<SNIP>
> No, I won't go and try to write some autodetection mechanism.
> But this may be an interesting additional config thing for drbd 0.7.
> Similar to "sync-group", we can have a "lo-dev-group":
> This makes sense, since DRBD 0.7 does not panic by default, but
> goes into "diskless" mode. If several devices share the same lo-dev,
> each of them in turn would need to wait for the hw-driver retry timeout
> cycle to recognize what we could have known already...

That sounds like a good idea to me.
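
To make sure I understand the proposal, here is how I picture the bookkeeping,
as a small Python sketch. "lo-dev-group" is only a suggestion in the quoted
mail, so the class, its method names and the device and group names below are
all invented for illustration and do not correspond to any drbd configuration
syntax or code.

```python
from collections import defaultdict


class LoDevGroups:
    """Tracks which devices share a lower-level device (hypothetical model)."""

    def __init__(self):
        self._members = defaultdict(set)  # group name -> device names
        self._group_of = {}               # device name -> group name
        self.diskless = set()             # devices currently marked diskless

    def add(self, device, group):
        """Register `device` as living on the lower-level device `group`."""
        self._members[group].add(device)
        self._group_of[device] = group

    def report_io_failure(self, device):
        """Mark `device` and every device in its group as diskless.

        Returns the sorted list of devices that were marked, so that the
        siblings do not each have to wait for their own retry timeout.
        """
        group = self._group_of.get(device)
        affected = self._members[group] if group is not None else {device}
        self.diskless |= affected
        return sorted(affected)
```

With nb0 and nb1 registered in one group and nb2 in another, a failure
reported for nb0 would mark nb0 and nb1 diskless at once and leave nb2 alone.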

-- 
Todd Denniston
Crane Division, Naval Surface Warfare Center (NSWC Crane) 
Harnessing the Power of Technology for the Warfighter


