[DRBD-user] drbd-0.7.0 left dead, dazed and confused after seeing -preX peer

Lars Ellenberg Lars.Ellenberg at linbit.com
Thu Jul 22 14:13:56 CEST 2004


/ 2004-07-22 13:42:49 +0200
\ Andreas Schultz:
> Hi,
> 
> While upgrading my system, I have encountered a situation where both peers are 
> alive and well, but drbd still cannot establish a connection without 
> removing and reinserting the drbd modules first.

> This happened after I rebooted the second system with an older kernel, which 
> prompted the first system to terminate its drbd receiver threads.

that older kernel obviously uses an older, incompatible drbd (module) version.

> The log from the first peer:
> Jul 22 12:31:03 sdev01 kernel: drbd0: expected HandShake packet, received 
> ReportParams...
> Jul 22 12:31:03 sdev01 kernel: drbd0: peer probaly runs some incompatible 0.7 
> -preX version
> Jul 22 12:31:03 sdev01 kernel: drbd0: Discarding network configuration.
> Jul 22 12:31:04 sdev01 kernel: drbd1: expected HandShake packet, received 
> ReportParams...
> Jul 22 12:31:04 sdev01 kernel: drbd1: peer probaly runs some incompatible 0.7 
> -preX version
> Jul 22 12:31:04 sdev01 kernel: drbd1: Discarding network configuration.
> Jul 22 12:31:04 sdev01 kernel: drbd1: worker terminated
> Jul 22 12:31:04 sdev01 kernel: drbd1: Connection lost.
> Jul 22 12:31:04 sdev01 kernel: drbd1: receiver terminated
> Jul 22 12:31:04 sdev01 kernel: drbd2: expected HandShake packet, received 
> ReportParams...
> Jul 22 12:31:04 sdev01 kernel: drbd2: peer probaly runs some incompatible 0.7 
> -preX version
> Jul 22 12:31:04 sdev01 kernel: drbd2: Discarding network configuration.
> Jul 22 12:31:04 sdev01 kernel: drbd2: worker terminated
> Jul 22 12:31:04 sdev01 kernel: drbd2: Connection lost.
> Jul 22 12:31:04 sdev01 kernel: drbd2: receiver terminated
> Jul 22 12:31:04 sdev01 kernel: drbd0: worker terminated
> Jul 22 12:31:04 sdev01 kernel: drbd0: Connection lost.
> Jul 22 12:31:04 sdev01 kernel: drbd0: receiver terminated
> 
> ...  here all receivers are gone and drbd needs to be reloaded to start working 
> again.

are you sure? ... cat /proc/drbd ...

it just goes into "StandAlone", because it recognized that the peer talks
some incompatible drbd protocol version, and that won't change when we
try a reconnect. So we are still "operational", i.e. you can access the
device locally, but we won't connect again until the operator tells us
(drbdadm connect) that the problem is resolved, typically by bringing
the peer's drbd up to date.
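
A minimal sketch of the check suggested above, in Python: it scans the
text of /proc/drbd for devices whose connection state is StandAlone.
It assumes each device line has the shape "<minor>: cs:<state> ...";
the function names and the sample text are illustrative and not taken
from the poster's machine.

```python
import re

# Assumed shape of a /proc/drbd device line: "<minor>: cs:<ConnectionState> ..."
_DEVICE_LINE = re.compile(r"^\s*(\d+):\s+cs:(\w+)")


def connection_states(proc_drbd_text):
    """Map each drbd minor number to the value of its cs: field."""
    states = {}
    for line in proc_drbd_text.splitlines():
        match = _DEVICE_LINE.match(line)
        if match:
            states[int(match.group(1))] = match.group(2)
    return states


def standalone_devices(proc_drbd_text):
    """Return the sorted minor numbers of devices that are StandAlone."""
    states = connection_states(proc_drbd_text)
    return sorted(minor for minor, cs in states.items() if cs == "StandAlone")


# Illustrative sample only (hypothetical, not real output).
SAMPLE = """\
version: 0.7.0
 0: cs:StandAlone st:Primary/Unknown ld:Consistent
 1: cs:Connected st:Primary/Secondary ld:Consistent
"""

if __name__ == "__main__":
    for minor in standalone_devices(SAMPLE):
        print(f"drbd{minor} is StandAlone; it needs an explicit 'drbdadm connect'")
```

On a live node the same functions could be fed the contents of
/proc/drbd instead of SAMPLE. A device reported here is still usable
locally; it only waits for the operator to reconnect it.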

> I have to admit that this is a situation that should not happen in production, 
> but I would also argue that _nothing_ should leave a drbd peer in a state 
> from which it cannot recover automatically.

as the log says: the peer runs an incompatible version, so we cannot talk to it.
how do you think we can automatically recover from that?

btw, with 0.7.0 we introduced the HandShake packet mentioned in the log.
from now on, drbd 0.7 (and probably all further versions) will be able
to talk to a drbd of protocol version (PRO_VERSION + [-1;0;1]), so you
will be able to do a rolling upgrade of the cluster when we feel the
need to change the protocol again at some point.
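
The compatibility window (PRO_VERSION + [-1;0;1]) can be sketched in
Python as below. This only illustrates the rule as stated here; it is
not drbd's actual handshake code, and the version numbers 74 to 76 are
hypothetical.

```python
def can_talk(my_version, peer_version):
    """True if the peer's protocol version is within +/-1 of ours."""
    return abs(my_version - peer_version) <= 1


def rolling_upgrade_ok(old_version, new_version):
    """True if two nodes can stay connected while upgraded one at a time.

    During a rolling upgrade, one node already runs new_version while
    the other still runs old_version, so the two must be compatible.
    """
    return can_talk(new_version, old_version)


if __name__ == "__main__":
    # Hypothetical protocol numbers, chosen only for illustration.
    print(can_talk(74, 75))            # one step apart
    print(can_talk(74, 76))            # two steps apart
    print(rolling_upgrade_ok(74, 75))  # upgrade one protocol step at a time
```

Under this rule, a cluster that skips a protocol version in one step
would fall outside the window, so upgrades would have to go one
protocol step at a time.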

	Lars Ellenberg


