On Sun, 2004-09-19 at 20:03, Lars Ellenberg wrote:
> / 2004-09-18 11:42:24 -0400
> \ Tony Willoughby:
> > On Wed, 2004-09-15 at 11:57, Todd Denniston wrote:
> > > Tony Willoughby wrote:
> > > >
> > > > Greetings,
> > > >
> > > > We've had an incident that I am trying to understand.
> > > >
> > > > Configuration:
> > > > Two IBM E-Server x330's running Heartbeat/DRBD (0.6.4).
> > > > Redhat 7.3
> > > > Protocol C
> > > > Crossover Ethernet
> > > >
> > > > (I know that 0.6.4 is old, but we have a rather staggered release
> > > > cycle and our customers tend to upgrade infrequently.)
> > > >
> > > > At some point the secondary machine started reporting SCSI errors (the
> > > > disk eventually failed). It is not known how long the system was
> > > > having these errors.
> > > >
> > > > The primary machine started to become unresponsive.
> > > >
> > > > Here is the odd thing: Any command that accessed the filesystem above
> > > > DRBD (e.g. "ls /the/mirrored/partition") would hang. Once the
> > > > secondary was shutdown the commands that were hung suddenly
> > > > completed.
> > > >
> > > > I'm not necessarily looking for a fix (although if I were told this
> > > > was fixed in a later release you'd make my day :^), I'm trying to
> > > > understand why this would happen.
> > > >
> > > > Anyone have any ideas?
> > > Note: I am a user not a writer of drbd, and I have some Promise raid boxes
> > > that put me in the above situation ALL too often.
> > >
> > > 0.6.10 behaves the same way.
> > > Proto C requires that before the primary returns "data written", both
> > > hosts' disk subsystems have to return "data written". IIRC ls (and many
> > > other commands) may, at a minimum, end up updating things like access
> > > time on some file/directory entries; that's a write that requires a
> > > "data written" on both systems, so you get to wait until Proto C is
> > > satisfied.
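As an aside: one way to avoid those access-time writes entirely, assuming the workload doesn't need atime, is to mount the mirrored filesystem with the noatime option. The device and mount point below are just the examples from this thread, not a recommendation:

```
# /etc/fstab entry for the DRBD device (illustrative names)
/dev/nb0  /the/mirrored/partition  ext3  defaults,noatime  0 0
```

With noatime set, a plain "ls" no longer generates writes that have to be acknowledged by both nodes under Protocol C.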
> >
> > A failing secondary bringing down a primary kind of defeats the whole
> > purpose of redundancy. :^)
> >
> > Any developers care to comment on this? Would protocol B be a better
> > choice with respect to increasing the availability of the cluster?
> > Would the mount flag "sync" be required with protocol B?
> >
> > See this thread for my experience of the sync flag and the reason that I
> > switched to protocol C in the first place:
> >
> > http://sourceforge.net/mailarchive/message.php?msg_id=5668764
> >
> > Would moving to the 0.7 code base make things better?
> >
> > I'm very concerned about this issue. My customer would have had a more
> > available system with just one node.
>
> no you don't want to use anything but protocol C if you care about
> transactions (and even a mere journalling file system does...)
>
> for an HA system you also need monitoring. you monitor the box,
> you see it has problems, you take it down (out of the cluster at least).
> and if you had configured it for panic on lower level io error, it
> should have taken down itself...
Here is my configuration; is "do-panic" what you are referring to? I
have that enabled.
resource drbd0 {
  protocol=C
  fsckcmd=fsck -p -y
  inittimeout=180
  disk {
    do-panic
    disk-size=2048256
  }
  net {
    sync-rate=5000
    tl-size=5000
    timeout=60
    connect-int=10
    ping-int=10
  }
  on basfbpm-1 {
    device=/dev/nb0
    disk=/dev/sda5
    address=192.0.2.2
    port=7788
  }
  on basfbpm-2 {
    device=/dev/nb0
    disk=/dev/sda5
    address=192.0.2.1
    port=7788
  }
}
>
> since 0.6.10 or .12, we have the ko-count.
> yes we have it in 0.7, too.
Excellent. I will dig into ko-count. Thanks for the tip.
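For my own notes, in a 0.6-style drbd.conf the counter would presumably sit in the net section alongside the existing timeouts; the value here is only illustrative, and the exact name and semantics should be checked against the drbd.conf man page for the installed version:

```
net {
  sync-rate=5000
  tl-size=5000
  timeout=60
  connect-int=10
  ping-int=10
  ko-count=4   # illustrative: declare the peer dead after 4 timed-out retries
}
```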
>
> what it means is: if we cannot get any data transferred to our peer,
> but it still answers "drbd ping" packets, we normally would retry
> (actually, tcp will do the retry for us), or continue to wait for ACK
> packets. but we start the ko count down. once this counter hits zero, we
> consider the peer dead even though it is still partially responsive, and
> we do not try to connect there again until explicitly told to do so.
Any tips on how to tune the ko-count?
Any tips on how to simulate a failing disk in the lab?
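In case it helps anyone else testing this: one lab approach I've seen (not from this thread, so treat it as a sketch) is to build a device-mapper table that maps the first half of a scratch device normally and returns I/O errors on the second half, emulating a disk that has started to die. The script below only builds and prints the table; on a real box, as root, you'd set DEV with `losetup -f --show /tmp/disk.img`, SECTORS with `blockdev --getsz "$DEV"`, activate it with `dmsetup create flakydisk < table.txt`, and point DRBD's lower-level "disk=" at /dev/mapper/flakydisk:

```shell
DEV=/dev/loop0          # placeholder scratch device
SECTORS=131072          # placeholder size in 512-byte sectors (64 MB)
HALF=$((SECTORS / 2))

# dm table format: <start-sector> <length> <target> <args>
# First half passes through; second half fails every read/write.
{
  echo "0 $HALF linear $DEV 0"
  echo "$HALF $HALF error"
} > table.txt
cat table.txt
```

Writes landing in the error region should then trigger the same lower-level I/O failures (and do-panic/ko-count behavior) as a genuinely failing disk.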
>
> however, if your secondary just becomes very slow but does not fail
> completely, this mechanism will not work and indeed slow down the
> primary, too. sorry about that.
> btw, linbit takes sponsors and support contracts.
> if you don't think you need our support,
> think of it as you supporting us instead!
We have! :^)
My company had a service contract with Linbit for several years.
Thanks for your input Lars.
>
> and yes, 0.7 improves here too, because it has the concept of
> "NegAck"s and "detaching" the lower level device on io errors,
> continuing in "diskless" mode. which makes it possible for your
> monitoring system to do a _graceful_ failover once it recognizes that
> the primary went into diskless state because of underlying io errors.
>
> we are better than you think...
> but we have to improve our documentation obviously.
>
> Lars Ellenberg
--
Tony Willoughby
Bigband Networks
mailto:tony.willoughby at bigbandnet.com