[DRBD-user] expected behaviour after primary crash and reconnect.

Peter Kruse pk at q-leap.com
Thu Mar 31 15:40:45 CEST 2005


Hello,

Thanks for your reply.

Bernd Schubert wrote:
> 
> This shouldn't be necessary at all. What do your logfiles say? Does it work 
> if you manually do a /etc/init.d/drbd restart (in very rare cases we also have 
> to do this)? 
> Rebooting shouldn't be necessary to basically test the situation. Just 
> stopping drbd on the master node, making the device primary on the failover 
> and starting drbd on the master node should be sufficient.
> 

When I do that it works as expected, but when I instead run "reboot" on
the primary, the nodes do not reconnect automatically.  Instead, the
following messages appear on the new primary once the old one comes up again:

Mar 31 15:24:56 ha-beo-1 kernel: drbd0: drbd0_receiver [6076]: cstate 
WFConnection --> WFReportParams
Mar 31 15:24:56 ha-beo-1 kernel: drbd0: Handshake successful: DRBD 
Network Protocol version 74
Mar 31 15:24:56 ha-beo-1 kernel: drbd0: Connection established.
Mar 31 15:24:56 ha-beo-1 kernel: drbd1: drbd1_receiver [6084]: cstate 
WFConnection --> WFReportParams
Mar 31 15:24:56 ha-beo-1 kernel: drbd1: Handshake successful: DRBD 
Network Protocol version 74
Mar 31 15:24:56 ha-beo-1 kernel: drbd0: I am(P): 
1:00000001:00000001:00000024:00000011:10
Mar 31 15:24:56 ha-beo-1 kernel: drbd0: Peer(S): 
1:00000001:00000001:00000025:00000010:10
Mar 31 15:24:56 ha-beo-1 kernel: drbd0: Current Primary shall become 
sync TARGET! Aborting to prevent data corruption.
Mar 31 15:24:56 ha-beo-1 kernel: drbd0: drbd0_receiver [6076]: cstate 
WFReportParams --> StandAlone
Mar 31 15:24:56 ha-beo-1 kernel: drbd0: error receiving ReportParams, l: 72!
Mar 31 15:24:56 ha-beo-1 kernel: drbd0: worker terminated
Mar 31 15:24:56 ha-beo-1 kernel: drbd0: asender terminated
Mar 31 15:24:56 ha-beo-1 kernel: drbd0: drbd0_receiver [6076]: cstate 
StandAlone --> StandAlone
Mar 31 15:24:56 ha-beo-1 kernel: drbd0: Connection lost.
Mar 31 15:24:56 ha-beo-1 kernel: drbd0: receiver terminated
Mar 31 15:24:57 ha-beo-1 kernel: drbd1: Connection established.
Mar 31 15:24:57 ha-beo-1 kernel: drbd1: I am(P): 
1:00000002:00000001:00000017:0000000c:10
Mar 31 15:24:57 ha-beo-1 kernel: drbd1: Peer(S): 
1:00000002:00000001:00000018:0000000b:10
Mar 31 15:24:57 ha-beo-1 kernel: drbd1: Current Primary shall become 
sync TARGET! Aborting to prevent data corruption.
Mar 31 15:24:57 ha-beo-1 kernel: drbd1: drbd1_receiver [6084]: cstate 
WFReportParams --> StandAlone
Mar 31 15:24:57 ha-beo-1 kernel: drbd1: error receiving ReportParams, l: 72!
Mar 31 15:24:57 ha-beo-1 kernel: drbd1: worker terminated
Mar 31 15:24:57 ha-beo-1 kernel: drbd1: asender terminated
Mar 31 15:24:57 ha-beo-1 kernel: drbd1: drbd1_receiver [6084]: cstate 
StandAlone --> StandAlone
Mar 31 15:24:57 ha-beo-1 kernel: drbd1: Connection lost.
Mar 31 15:24:57 ha-beo-1 kernel: drbd1: receiver terminated

and these are the messages on the old primary:

Mar 31 15:24:56 ha-beo-2 kernel: drbd1: drbdsetup [1483]: cstate 
Unconfigured --> StandAlone
Mar 31 15:24:56 ha-beo-2 kernel: drbd0: drbdsetup [1513]: cstate 
StandAlone --> Unconnected
Mar 31 15:24:56 ha-beo-2 kernel: drbd0: drbd0_receiver [1514]: cstate 
Unconnected --> WFConnection
Mar 31 15:24:56 ha-beo-2 kernel: drbd1: drbdsetup [1521]: cstate 
StandAlone --> Unconnected
Mar 31 15:24:56 ha-beo-2 kernel: drbd1: drbd1_receiver [1522]: cstate 
Unconnected --> WFConnection
Mar 31 15:24:56 ha-beo-2 kernel: drbd0: drbd0_receiver [1514]: cstate 
WFConnection --> WFReportParams
Mar 31 15:24:56 ha-beo-2 kernel: drbd0: Handshake successful: DRBD 
Network Protocol version 74
Mar 31 15:24:56 ha-beo-2 kernel: drbd0: Connection established.
Mar 31 15:24:56 ha-beo-2 kernel: drbd1: drbd1_receiver [1522]: cstate 
WFConnection --> WFReportParams
Mar 31 15:24:56 ha-beo-2 kernel: drbd1: Handshake successful: DRBD 
Network Protocol version 74
Mar 31 15:24:56 ha-beo-2 kernel: drbd0: I am(S): 
1:00000001:00000001:00000025:00000010:10
Mar 31 15:24:56 ha-beo-2 kernel: drbd1: Connection established.
Mar 31 15:24:56 ha-beo-2 kernel: drbd1: I am(S): 
1:00000002:00000001:00000018:0000000b:10
Mar 31 15:24:56 ha-beo-2 kernel: drbd1: Peer(P): 
1:00000002:00000001:00000017:0000000c:10
Mar 31 15:24:56 ha-beo-2 kernel: drbd1: drbd1_receiver [1522]: cstate 
WFReportParams --> WFBitMapS
Mar 31 15:24:56 ha-beo-2 kernel: drbd0: Peer(P): 
1:00000001:00000001:00000024:00000011:10
Mar 31 15:24:56 ha-beo-2 kernel: drbd0: drbd0_receiver [1514]: cstate 
WFReportParams --> WFBitMapS
Mar 31 15:24:56 ha-beo-2 kernel: drbd0: meta connection shut down by peer.
Mar 31 15:24:56 ha-beo-2 kernel: drbd0: asender terminated
Mar 31 15:24:56 ha-beo-2 kernel: drbd0: sock_sendmsg returned -32
Mar 31 15:24:56 ha-beo-2 kernel: drbd0: drbd0_receiver [1514]: cstate 
WFBitMapS --> BrokenPipe
Mar 31 15:24:56 ha-beo-2 kernel: drbd0: short sent ReportBitMap 
size=4096 sent=0
Mar 31 15:24:56 ha-beo-2 kernel: drbd0: Secondary/Unknown --> 
Secondary/Primary
Mar 31 15:24:56 ha-beo-2 kernel: drbd0: sock was shut down by peer
Mar 31 15:24:56 ha-beo-2 kernel: drbd0: drbd0_receiver [1514]: cstate 
BrokenPipe --> BrokenPipe
Mar 31 15:24:56 ha-beo-2 kernel: drbd0: short read expecting header on 
sock: r=0
Mar 31 15:24:56 ha-beo-2 kernel: drbd0: worker terminated
Mar 31 15:24:56 ha-beo-2 kernel: drbd0: drbd0_receiver [1514]: cstate 
BrokenPipe --> Unconnected
Mar 31 15:24:56 ha-beo-2 kernel: drbd0: Connection lost.
Mar 31 15:24:56 ha-beo-2 kernel: drbd0: drbd0_receiver [1514]: cstate 
Unconnected --> WFConnection
Mar 31 15:24:57 ha-beo-2 kernel: drbd1: meta connection shut down by peer.
Mar 31 15:24:57 ha-beo-2 kernel: drbd1: sock_sendmsg returned -104
Mar 31 15:24:57 ha-beo-2 kernel: drbd1: drbd1_receiver [1522]: cstate 
WFBitMapS --> BrokenPipe
Mar 31 15:24:57 ha-beo-2 kernel: drbd1: short sent ReportBitMap 
size=4096 sent=2472
Mar 31 15:24:57 ha-beo-2 kernel: drbd1: Secondary/Unknown --> 
Secondary/Primary
Mar 31 15:24:57 ha-beo-2 kernel: drbd1: asender terminated
Mar 31 15:24:57 ha-beo-2 kernel: drbd1: sock was shut down by peer
Mar 31 15:24:57 ha-beo-2 kernel: drbd1: drbd1_receiver [1522]: cstate 
BrokenPipe --> BrokenPipe
Mar 31 15:24:57 ha-beo-2 kernel: drbd1: short read expecting header on 
sock: r=0
Mar 31 15:24:57 ha-beo-2 kernel: drbd1: worker terminated
Mar 31 15:24:57 ha-beo-2 kernel: drbd1: drbd1_receiver [1522]: cstate 
BrokenPipe --> Unconnected
Mar 31 15:24:57 ha-beo-2 kernel: drbd1: Connection lost.
Mar 31 15:24:57 ha-beo-2 kernel: drbd1: drbd1_receiver [1522]: cstate 
Unconnected --> WFConnection

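To make the difference between the "I am" and "Peer" counter strings
easier to spot, here is a small sketch.  It only splits the
colon-separated fields and reports which positions differ; it does not
assume what each field means.  Reading the fields as hexadecimal is an
assumption based on values such as 0000000c, and the function names are
made up for illustration.

```python
def parse_counters(text):
    """Split a colon-separated counter string into integers (read as hex)."""
    return [int(field, 16) for field in text.strip().split(":")]


def differing_fields(mine, peer):
    """Return (position, my_value, peer_value) for every field that differs."""
    pairs = zip(parse_counters(mine), parse_counters(peer))
    return [(i, a, b) for i, (a, b) in enumerate(pairs) if a != b]


# drbd0 as reported on ha-beo-1 in the log above
for pos, mine, peer in differing_fields(
        "1:00000001:00000001:00000024:00000011:10",
        "1:00000001:00000001:00000025:00000010:10"):
    print("field %d: I am=0x%x  peer=0x%x" % (pos, mine, peer))
```

For both drbd0 and drbd1 only the fourth and fifth fields differ
between the two nodes.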

I can then run "drbdadm connect all" on the new primary and the sync
starts.  I understand that the old primary wanted to sync in the
wrong direction.  That is why I asked whether the correct
procedure is simply to call "drbdadm connect <resource>" after
"primary <resource>".  Or is there another way to automate
this process?
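
For illustration, this is roughly what I mean by automating it: a
sketch that looks for devices that are Primary but StandAlone and calls
"drbdadm connect" for them.  The /proc/drbd line format in the regular
expression and the minor-to-resource mapping (r0, r1) are assumptions
on my part and not taken from the DRBD documentation; only "drbdadm
connect <resource>" is the command I actually use.

```python
import re
import subprocess

# Assumed shape of a device line in /proc/drbd, for example:
#  " 0: cs:StandAlone st:Primary/Unknown ld:Consistent"
STATE_LINE = re.compile(r"^\s*(\d+):\s+cs:(\w+)\s+st:(\w+)/(\w+)")


def standalone_primaries(proc_drbd_text):
    """Return the minor numbers of devices that are Primary and StandAlone."""
    minors = []
    for line in proc_drbd_text.splitlines():
        m = STATE_LINE.match(line)
        if m and m.group(2) == "StandAlone" and m.group(3) == "Primary":
            minors.append(int(m.group(1)))
    return minors


def reconnect(resources, run=subprocess.call):
    """Run 'drbdadm connect <resource>' for each name; return the exit codes."""
    return [run(["drbdadm", "connect", name]) for name in resources]


def main():
    # Hypothetical mapping from device minor to resource name in drbd.conf.
    names = {0: "r0", 1: "r1"}
    with open("/proc/drbd") as f:
        minors = standalone_primaries(f.read())
    reconnect([names[m] for m in minors if m in names])
```

Calling main() on the new primary after the old one has come back up
would then reconnect only the devices that dropped to StandAlone.
Whether it is safe to do this unattended is exactly what I am unsure
about.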

	Peter


