Hi,

Thanks for the great explanation, it was very helpful for me.
My quick question is below...

16.5.2011 16:45 Lars Ellenberg <lars.ellenberg at linbit.com>:
> On Tue, May 10, 2011 at 12:45:08PM +0200, motyllo8 wrote:
> > Hi,
> >
> >
> > I have a cluster in Active/Passive configuration. Currently I am trying
> > to support the situation when I/O errors occur. I noticed that in
> > drbd.conf the default behaviour is to halt a node with failed disks.
>
> Default behaviour?
> Certainly not.
> Example configuration maybe, for a certain use case,
> to point out what is possible to configure.
>
> > This is a little bit brutal for me. What kind of other scenarios are
> > taken into account here, if any?
>
> Once a node reaches a state where it cannot service IO requests anymore,
> in certain deployments a fast node failure can improve overall service
> availability.
>
> > I was considering only disconnecting replication for resources where
> > I/O errors occurred and then promoting the second node, but as far as I
> > know this is not possible when the resource is in Diskless state
> > (btw. why?):
> >
> > drbdadm disconnect myresource
> > 0: State change failed: (-2) Refusing to be Primary without at least one UpToDate disk
> > Command 'drbdsetup 0 disconnect' terminated with exit code 17
>
> A Primary, that has become Diskless (because IO errors caused it to
> detach, or because of an explicit detach), will refuse to be
> "gracefully" disconnected: Because that would cause the data to become
> unavailable.
>
> A Diskless Primary, while still connected, will service application
> requests just fine via the other node.
>
> So you can pick a convenient time to do a graceful switchover:
> stop the services, demote DRBD on the Diskless node,
> promote it on the other, and start services there.
>
> Or, of course, you could fix the broken disk,
> and re-attach it to the Primary.
>
>
> If you (forcefully) disconnect a Diskless Primary, it can only fail all
> IO requests (there is no data to service them from), or block indefinitely.

So, as I understand it, it is possible to break the connection forcefully?
If yes, could you write how to do that? I take into account that this
operation will break access to the resource/data.

> That's why it does not let you do that gracefully.
> It just protects you from shooting yourself.
>

Thanks
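
For reference, here is a minimal dry-run sketch of the graceful switchover Lars describes above (stop the services, demote DRBD on the Diskless node, promote it on the other, start services there). The resource name `myresource` is taken from the thread; the peer hostname, the service name, and the `run` helper are hypothetical placeholders, and the script only prints the commands instead of executing them. It assumes a manually managed setup where "stopping the service" also unmounts any filesystem on the DRBD device; with a cluster manager you would let it move the resource instead.

```shell
#!/bin/sh
# Dry-run sketch of a graceful switchover away from a Diskless Primary.
# Nothing is executed: each step is only printed.

RES="myresource"      # resource name as used in the thread
PEER="peer-node"      # hypothetical hostname of the node with the good disk
SERVICE="myservice"   # hypothetical service running on top of the DRBD device

# Print a command instead of running it.
# To execute for real, change the body to: "$@"
run() { echo "+ $*"; }

switchover_plan() {
    # 1. Stop the services (assumed to release/unmount the DRBD device).
    run service "$SERVICE" stop
    # 2. Demote DRBD on the Diskless node (this node).
    run drbdadm secondary "$RES"
    # 3. Promote DRBD on the other node.
    run ssh "$PEER" drbdadm primary "$RES"
    # 4. Start the services there.
    run ssh "$PEER" service "$SERVICE" start
}

switchover_plan
```

The connection stays up the whole time, so the Diskless node keeps serving I/O via the peer until the services are stopped.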