[DRBD-user] handling I/O errors scenarios

Lars Ellenberg lars.ellenberg at linbit.com
Wed May 18 16:45:49 CEST 2011

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


On Wed, May 18, 2011 at 02:51:54PM +0200, motyllo8 wrote:
> Hi,
> 
> Thanks for the great explanation, it was very helpful for me.
> My quick question is below...
> 
> 
> 16.5.2011 16:45 Lars Ellenberg <lars.ellenberg at linbit.com>:
> 
> > On Tue, May 10, 2011 at 12:45:08PM +0200, motyllo8 wrote:
> > > Hi,
> > > 
> > > 
> > > I have a cluster in an Active/Passive configuration. Currently I am
> > > trying to handle situations where I/O errors occur. I noticed that in
> > > drbd.conf the default behaviour is to halt the node with failed disks.
> > 
> > Default behaviour?
> > Certainly not.
> > Example configuration maybe, for a certain use case,
> > to point out what is possible to configure.
> > 
> > > This is a little bit brutal for me. What other scenarios are taken
> > > into account here, if any?
> > 
> > Once a node reaches a state where it cannot service IO requests anymore,
> > in certain deployments a fast node failure can improve overall service
> > availability.
> > 
> > > I was considering only disconnecting replication for resources where
> > > I/O errors occurred and then promoting the second node, but as far as
> > > I know this is not possible when the resource is in the Diskless
> > > state (btw. why?):
> > >
> > > drbdadm disconnect myresource
> > > 0: State change failed: (-2) Refusing to be Primary without at least one UpToDate disk
> > > Command 'drbdsetup 0 disconnect' terminated with exit code 17
> > 
> > A Primary, that has become Diskless (because IO errors caused it to
> > detach, or because of an explicit detach), will refuse to be
> > "gracefully" disconnected: Because that would cause the data to become
> > unavailable.
> > 
> > A Diskless Primary, while still connected, will service application
> > requests just fine via the other node.
> > 
> > So you can pick a convenient time to do a graceful switchover:
> > stop the services, demote DRBD on the Diskless node,
> > promote it on the other, and start services there.
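> > 
> > A minimal sketch of that sequence, assuming the resource is named
> > "myresource" as in your example, and that the services are stopped
> > and started by whatever means you use outside of DRBD:
> > 
> >   # on the Diskless node, after stopping the services:
> >   drbdadm secondary myresource
> > 
> >   # on the peer node (the one with the good disk):
> >   drbdadm primary myresource
> >   # ... then start the services there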
> > 
> > Or, of course, you could fix the broken disk,
> > and re-attach it to the Primary.
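> > 
> > As a sketch, once the backing device is healthy again (if the disk
> > was replaced rather than repaired, you may have to recreate its
> > metadata with "drbdadm create-md myresource" first):
> > 
> >   drbdadm attach myresource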
> > 
> > 
> > If you (forcefully) disconnect a Diskless Primary, it can only fail all
> > IO requests (there is no data to service them from), or block indefinitely.
> 
> 
> 
> So, if I understood correctly, it is possible to break the connection
> forcefully? If yes, could you write how to do that? I accept that this
> operation will break access to the resource/data.


Starting with 8.3.10, there is drbdsetup disconnect --force.
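
Using the device numbering from your example above, that would look
something like:

  drbdsetup 0 disconnect --force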

Of course you can also always "forcefully disconnect" by pulling the
cable, playing with iptables, taking the interface down, or otherwise
;-)
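
For the iptables variant, a sketch (assuming the resource's replication
link uses TCP port 7788, the common example port -- check your
drbd.conf; remove the rules again with -D afterwards):

  iptables -A INPUT  -p tcp --dport 7788 -j DROP
  iptables -A OUTPUT -p tcp --dport 7788 -j DROP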

> > That's why it does not let you do that gracefully.
> > It just protects you from shooting yourself.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed


