[DRBD-user] handling I/O errors scenarios

Wed May 18 14:51:54 CEST 2011

Hi,

Thanks for the great explanation; it was very helpful.
My quick question is below.

16.5.2011 16:45 Lars Ellenberg <lars.ellenberg at linbit.com>:

> On Tue, May 10, 2011 at 12:45:08PM +0200, motyllo8 wrote:
> > Hi,
> > 
> > 
> > I have a cluster in an Active/Passive configuration. Currently I am
> > trying to handle the situation where I/O errors occur. I noticed that
> > in drbd.conf the default behaviour is to halt a node with failed disks.
> 
> Default behaviour?
> Certainly not.
> Example configuration maybe, for a certain use case,
> to point out what is possible to configure.
> 
> > This is a little too brutal for me. What other scenarios are
> > taken into account here, if any?
> 
> Once a node reaches a state where it cannot service IO requests anymore,
> in certain deployments a fast node failure can improve overall service
> availability.
> 
> > I was considering only disconnecting replication for resources where
> > I/O errors occurred and then promoting the second node, but as far as
> > I know this is not possible when the resource is in Diskless state
> > (btw. why?):
> >
> > drbdadm disconnect myresource
> > 0: State change failed: (-2) Refusing to be Primary without at least one UpToDate disk
> > Command 'drbdsetup 0 disconnect' terminated with exit code 17
> 
> A Primary that has become Diskless (because IO errors caused it to
> detach, or because of an explicit detach) will refuse to be
> "gracefully" disconnected, because that would cause the data to become
> unavailable.
> 
> A Diskless Primary, while still connected, will service application
> requests just fine via the other node.
> 
> So you can pick a convenient time to do a graceful switchover:
> stop the services, demote DRBD on the Diskless node,
> promote it on the other, and start services there.
> 
> Or, of course, you could fix the broken disk,
> and re-attach it to the Primary.
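
A minimal dry-run sketch of the switchover steps quoted above, as I
understand them. The peer host name, ssh access to the peer, and the
service stop/start commands are my own placeholders (assumptions, not
something from the mail above); only "drbdadm secondary" and
"drbdadm primary" are the actual DRBD steps.

```shell
#!/bin/sh
# Dry-run sketch of a graceful switchover away from a Diskless Primary.
# Assumed (not from the thread): peer host name, ssh access to the peer,
# and the two service commands, which are hypothetical placeholders.
RES="${RES:-myresource}"
PEER="${PEER:-peer-node}"                              # hypothetical
STOP_CMD="${STOP_CMD:-/etc/init.d/myservice stop}"     # hypothetical
START_CMD="${START_CMD:-/etc/init.d/myservice start}"  # hypothetical
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

switchover() {
    # 1. Stop the services on the Diskless Primary (this node).
    #    $STOP_CMD is left unquoted on purpose so it splits into words.
    run $STOP_CMD || return 1
    # 2. Demote DRBD here; the connection to the peer stays up.
    run drbdadm secondary "$RES" || return 1
    # 3. Promote DRBD on the node that still has an UpToDate disk.
    run ssh "$PEER" drbdadm primary "$RES" || return 1
    # 4. Start the services on the new Primary.
    run ssh "$PEER" $START_CMD
}
```

With DRY_RUN=1 (the default in this sketch), calling "switchover" only
prints the four commands in order, so the sequence can be reviewed
before anything is executed.
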
> 
> 
> If you (forcefully) disconnect a Diskless Primary, it can only fail all
> IO requests (there is no data to service them from), or block indefinitely.

So, as I understand it, it is possible to break the connection
forcefully? If so, could you describe how to do that? I am aware that
this operation will cut off access to the resource/data.

> That's why it does not let you do that gracefully.
> It just protects you from shooting yourself in the foot.
> 
> 

Thanks