[DRBD-user] drbd-0.7.0 left dead, dazed and confused after seeing -preX peer

Lars Ellenberg Lars.Ellenberg at linbit.com
Thu Jul 22 16:17:07 CEST 2004



/ 2004-07-22 14:30:07 +0200
\ Andreas Schultz:
> Hi Lars,
> 
> I should have explained it in a bit more detail ...
> 
> On Thursday 22 July 2004 14:13, Lars Ellenberg wrote:
> > / 2004-07-22 13:42:49 +0200
> >
> > \ Andreas Schultz:
> > > Hi,
> > >
> > > While upgrading my system, i have encountered a situation where both
> > > peers are alive and well, but drbd can still not establish a connection
> > > without removing and reinserting the drbd modules first.
> > >
> > > This happened after i rebooted the second system with an older kernel
> > > which prompted the first system to terminate its drbd receiver threads.
> >
> > that older kernel obviously uses an older, incompatible, drbd (module)
> > version.
> 
> Sure, it did.
> 
> [...]
> 
> > > ...  here all receivers are gone and drbd needs to be reloaded to start
> > > working again.
> >
> > are you sure? ... cat /proc/drbd ...
> 
> A 'ps -xaw | grep drbd' only shows the drbd_worker thread, all receivers are 
> missing.

which is correct for "StandAlone".

> A 'drbdadm connect' might have resolved the situation, but i feel 
> that it should automagically recover once both peers are running the same 
> version.

so you think DRBD should retry in a loop (as fast as your box can
release, recreate, bind, and reconnect a tcp socket) to reconnect to
someone speaking another language?
to me this does not sound like a good idea.

> I would need to reboot the second peer again to reproduce the exact same 
> situation to get the /proc/drbd output. Do you need it?

since there is no unexpected behaviour, nope.

> > it just goes into "StandAlone", because it recognized that the peer talks
> > some incompatible drbd protocol version, and that won't change when we
> > try a reconnect. So we are still "operational", i.e. you can access it
> > locally. but we won't connect again until the operator tells us (drbdadm
> > connect) that the problem is resolved, typically by bringing the peer's
> > drbd up to date.
> 
> That's what i did. I rebooted the second peer with a new kernel and it was then 
> that the first peer did not reestablish the connection.

as explained, intentional behaviour.
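
to make the distinction concrete, here is a minimal sketch of the decision
described above. it is a toy model, not DRBD code: only "StandAlone" and
'drbdadm connect' come from this thread. the other state names, the result
names, and the assumption that a plain network error leads to a retry are
illustrative.

```python
from enum import Enum


class ConnState(Enum):
    # "StandAlone" is the state named in this thread. The other two names
    # are illustrative placeholders, not taken from the DRBD source.
    STAND_ALONE = "StandAlone"
    RETRYING = "Retrying"
    CONNECTED = "Connected"


class HandshakeResult(Enum):
    OK = "ok"
    NETWORK_ERROR = "network_error"        # assumed transient, worth retrying
    INCOMPATIBLE_VERSION = "incompatible"  # retrying cannot fix this


def next_state(result):
    """Decide what to do after one connection attempt."""
    if result is HandshakeResult.OK:
        return ConnState.CONNECTED
    if result is HandshakeResult.INCOMPATIBLE_VERSION:
        # The peer speaks another protocol version: stop trying and wait
        # for the operator instead of looping on the socket.
        return ConnState.STAND_ALONE
    return ConnState.RETRYING


def operator_connect(state):
    """Model the operator running 'drbdadm connect' on this node."""
    if state is ConnState.STAND_ALONE:
        return ConnState.RETRYING
    return state
```

in this model, nothing but operator_connect() leads out of StandAlone,
which is the behaviour discussed in this thread.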

> > > I have to admit that this is a situation that should not happen in
> > > production, but i would also argue that _nothing_ should leave a drbd
> > > peer in a state from which it can not recover automatically.
> >
> > as the log says: peer runs incompatible version. so we can not talk to him.
> > how do you think we can automatically recover from that?
> 
> Of course not when both peers are on different versions, but once the second 
> peer comes up with a compatible drbd version it should IMHO.

well, since there MUST have been operator intervention anyway (to
upgrade the other node), the operator is online, knows that this
issue has been fixed, and can easily log into the standalone node
and issue a drbdadm connect all. it's as easy as that.

> > btw, with 0.7.0, we introduced the mentioned HandShake packet.
> > from now on, drbd 0.7 (and probably all further versions) will be able
> > to talk to drbd of protocol version (PRO_VERSION + [-1;0;1]), so you
> > will be able to do a rolling upgrade of the cluster when we feel the
> > need to change the protocol again at some point.
> 
> nice.

 :)
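
the (PRO_VERSION + [-1;0;1]) window from the quoted paragraph can be
sketched as follows. this is only an illustration of the arithmetic, not
the actual handshake code. the function names and the version numbers used
in the usage note are made up.

```python
def acceptable_peer_versions(pro_version):
    """Protocol versions a node at pro_version would accept: one below,
    the same, or one above (the window described in this thread)."""
    return {pro_version - 1, pro_version, pro_version + 1}


def is_compatible(pro_version, peer_version):
    """True if the peer's protocol version falls inside the window."""
    return peer_version in acceptable_peer_versions(pro_version)
```

with a hypothetical PRO_VERSION of 10, is_compatible(10, 11) is true and
is_compatible(10, 12) is false. during a rolling upgrade the two nodes
differ by at most one protocol step at any time, so they stay inside the
window.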

	Lars Ellenberg


