[DRBD-user] expected behaviour after primary crash and reconnect.
Peter Kruse
pk at q-leap.com
Thu Mar 31 15:40:45 CEST 2005
Hello,
Thanks for your reply.
Bernd Schubert wrote:
>
> This shouldn't be necessary at all. What do your logfiles say? Does it work
> if you manually do a /etc/init.d/drbd restart (in very rare cases we also have
> to do this)?
> Rebooting shouldn't be necessary to basically test the situation. Just
> stopping drbd on the master node, making the device primary on the failover
> and starting drbd on the master node should be sufficient.
>
When I do that it works as expected, but when I instead run "reboot" on
the primary, it does not reconnect automatically. Instead, the following
messages appear on the new primary once the old one comes up again:
Mar 31 15:24:56 ha-beo-1 kernel: drbd0: drbd0_receiver [6076]: cstate
WFConnection --> WFReportParams
Mar 31 15:24:56 ha-beo-1 kernel: drbd0: Handshake successful: DRBD
Network Protocol version 74
Mar 31 15:24:56 ha-beo-1 kernel: drbd0: Connection established.
Mar 31 15:24:56 ha-beo-1 kernel: drbd1: drbd1_receiver [6084]: cstate
WFConnection --> WFReportParams
Mar 31 15:24:56 ha-beo-1 kernel: drbd1: Handshake successful: DRBD
Network Protocol version 74
Mar 31 15:24:56 ha-beo-1 kernel: drbd0: I am(P):
1:00000001:00000001:00000024:00000011:10
Mar 31 15:24:56 ha-beo-1 kernel: drbd0: Peer(S):
1:00000001:00000001:00000025:00000010:10
Mar 31 15:24:56 ha-beo-1 kernel: drbd0: Current Primary shall become
sync TARGET! Aborting to prevent data corruption.
Mar 31 15:24:56 ha-beo-1 kernel: drbd0: drbd0_receiver [6076]: cstate
WFReportParams --> StandAlone
Mar 31 15:24:56 ha-beo-1 kernel: drbd0: error receiving ReportParams, l: 72!
Mar 31 15:24:56 ha-beo-1 kernel: drbd0: worker terminated
Mar 31 15:24:56 ha-beo-1 kernel: drbd0: asender terminated
Mar 31 15:24:56 ha-beo-1 kernel: drbd0: drbd0_receiver [6076]: cstate
StandAlone --> StandAlone
Mar 31 15:24:56 ha-beo-1 kernel: drbd0: Connection lost.
Mar 31 15:24:56 ha-beo-1 kernel: drbd0: receiver terminated
Mar 31 15:24:57 ha-beo-1 kernel: drbd1: Connection established.
Mar 31 15:24:57 ha-beo-1 kernel: drbd1: I am(P):
1:00000002:00000001:00000017:0000000c:10
Mar 31 15:24:57 ha-beo-1 kernel: drbd1: Peer(S):
1:00000002:00000001:00000018:0000000b:10
Mar 31 15:24:57 ha-beo-1 kernel: drbd1: Current Primary shall become
sync TARGET! Aborting to prevent data corruption.
Mar 31 15:24:57 ha-beo-1 kernel: drbd1: drbd1_receiver [6084]: cstate
WFReportParams --> StandAlone
Mar 31 15:24:57 ha-beo-1 kernel: drbd1: error receiving ReportParams, l: 72!
Mar 31 15:24:57 ha-beo-1 kernel: drbd1: worker terminated
Mar 31 15:24:57 ha-beo-1 kernel: drbd1: asender terminated
Mar 31 15:24:57 ha-beo-1 kernel: drbd1: drbd1_receiver [6084]: cstate
StandAlone --> StandAlone
Mar 31 15:24:57 ha-beo-1 kernel: drbd1: Connection lost.
Mar 31 15:24:57 ha-beo-1 kernel: drbd1: receiver terminated
and these are the messages on the old primary:
Mar 31 15:24:56 ha-beo-2 kernel: drbd1: drbdsetup [1483]: cstate
Unconfigured --> StandAlone
Mar 31 15:24:56 ha-beo-2 kernel: drbd0: drbdsetup [1513]: cstate
StandAlone --> Unconnected
Mar 31 15:24:56 ha-beo-2 kernel: drbd0: drbd0_receiver [1514]: cstate
Unconnected --> WFConnection
Mar 31 15:24:56 ha-beo-2 kernel: drbd1: drbdsetup [1521]: cstate
StandAlone --> Unconnected
Mar 31 15:24:56 ha-beo-2 kernel: drbd1: drbd1_receiver [1522]: cstate
Unconnected --> WFConnection
Mar 31 15:24:56 ha-beo-2 kernel: drbd0: drbd0_receiver [1514]: cstate
WFConnection --> WFReportParams
Mar 31 15:24:56 ha-beo-2 kernel: drbd0: Handshake successful: DRBD
Network Protocol version 74
Mar 31 15:24:56 ha-beo-2 kernel: drbd0: Connection established.
Mar 31 15:24:56 ha-beo-2 kernel: drbd1: drbd1_receiver [1522]: cstate
WFConnection --> WFReportParams
Mar 31 15:24:56 ha-beo-2 kernel: drbd1: Handshake successful: DRBD
Network Protocol version 74
Mar 31 15:24:56 ha-beo-2 kernel: drbd0: I am(S):
1:00000001:00000001:00000025:00000010:10
Mar 31 15:24:56 ha-beo-2 kernel: drbd1: Connection established.
Mar 31 15:24:56 ha-beo-2 kernel: drbd1: I am(S):
1:00000002:00000001:00000018:0000000b:10
Mar 31 15:24:56 ha-beo-2 kernel: drbd1: Peer(P):
1:00000002:00000001:00000017:0000000c:10
Mar 31 15:24:56 ha-beo-2 kernel: drbd1: drbd1_receiver [1522]: cstate
WFReportParams --> WFBitMapS
Mar 31 15:24:56 ha-beo-2 kernel: drbd0: Peer(P):
1:00000001:00000001:00000024:00000011:10
Mar 31 15:24:56 ha-beo-2 kernel: drbd0: drbd0_receiver [1514]: cstate
WFReportParams --> WFBitMapS
Mar 31 15:24:56 ha-beo-2 kernel: drbd0: meta connection shut down by peer.
Mar 31 15:24:56 ha-beo-2 kernel: drbd0: asender terminated
Mar 31 15:24:56 ha-beo-2 kernel: drbd0: sock_sendmsg returned -32
Mar 31 15:24:56 ha-beo-2 kernel: drbd0: drbd0_receiver [1514]: cstate
WFBitMapS --> BrokenPipe
Mar 31 15:24:56 ha-beo-2 kernel: drbd0: short sent ReportBitMap
size=4096 sent=0
Mar 31 15:24:56 ha-beo-2 kernel: drbd0: Secondary/Unknown -->
Secondary/Primary
Mar 31 15:24:56 ha-beo-2 kernel: drbd0: sock was shut down by peer
Mar 31 15:24:56 ha-beo-2 kernel: drbd0: drbd0_receiver [1514]: cstate
BrokenPipe --> BrokenPipe
Mar 31 15:24:56 ha-beo-2 kernel: drbd0: short read expecting header on
sock: r=0
Mar 31 15:24:56 ha-beo-2 kernel: drbd0: worker terminated
Mar 31 15:24:56 ha-beo-2 kernel: drbd0: drbd0_receiver [1514]: cstate
BrokenPipe --> Unconnected
Mar 31 15:24:56 ha-beo-2 kernel: drbd0: Connection lost.
Mar 31 15:24:56 ha-beo-2 kernel: drbd0: drbd0_receiver [1514]: cstate
Unconnected --> WFConnection
Mar 31 15:24:57 ha-beo-2 kernel: drbd1: meta connection shut down by peer.
Mar 31 15:24:57 ha-beo-2 kernel: drbd1: sock_sendmsg returned -104
Mar 31 15:24:57 ha-beo-2 kernel: drbd1: drbd1_receiver [1522]: cstate
WFBitMapS --> BrokenPipe
Mar 31 15:24:57 ha-beo-2 kernel: drbd1: short sent ReportBitMap
size=4096 sent=2472
Mar 31 15:24:57 ha-beo-2 kernel: drbd1: Secondary/Unknown -->
Secondary/Primary
Mar 31 15:24:57 ha-beo-2 kernel: drbd1: asender terminated
Mar 31 15:24:57 ha-beo-2 kernel: drbd1: sock was shut down by peer
Mar 31 15:24:57 ha-beo-2 kernel: drbd1: drbd1_receiver [1522]: cstate
BrokenPipe --> BrokenPipe
Mar 31 15:24:57 ha-beo-2 kernel: drbd1: short read expecting header on
sock: r=0
Mar 31 15:24:57 ha-beo-2 kernel: drbd1: worker terminated
Mar 31 15:24:57 ha-beo-2 kernel: drbd1: drbd1_receiver [1522]: cstate
BrokenPipe --> Unconnected
Mar 31 15:24:57 ha-beo-2 kernel: drbd1: Connection lost.
Mar 31 15:24:57 ha-beo-2 kernel: drbd1: drbd1_receiver [1522]: cstate
Unconnected --> WFConnection
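To make my reading of the "I am(P)" / "Peer(S)" lines concrete, here is a
rough sketch. It assumes that the colon-separated fields are hexadecimal
counters compared from left to right, with the first difference deciding
which side is considered newer. That is only my interpretation of the log
output, not something I have confirmed in the DRBD source, and the
function name compare_counters is made up for this mail.

```shell
# Hypothetical helper, not part of DRBD.
# ASSUMPTION: each field is a hexadecimal counter and the fields are
# compared left to right; the first field that differs decides.
compare_counters() {
    mine=$1
    peer=$2
    while [ -n "$mine" ] || [ -n "$peer" ]; do
        # Take the first remaining field from each string.
        a=${mine%%:*}
        b=${peer%%:*}
        # Drop the field just read (or empty the string if it was the last).
        case $mine in *:*) mine=${mine#*:} ;; *) mine= ;; esac
        case $peer in *:*) peer=${peer#*:} ;; *) peer= ;; esac
        # Convert from hex; a missing field counts as 0.
        a=$((0x${a:-0}))
        b=$((0x${b:-0}))
        if [ "$a" -gt "$b" ]; then echo "local newer"; return 0; fi
        if [ "$a" -lt "$b" ]; then echo "peer newer"; return 0; fi
    done
    echo "equal"
}

# drbd0 values as logged on the new primary (ha-beo-1).
compare_counters "1:00000001:00000001:00000024:00000011:10" \
                 "1:00000001:00000001:00000025:00000010:10"
```

Under that assumption the peer (the old primary) looks newer in the fourth
field for drbd0, which would fit the "Current Primary shall become sync
TARGET!" message.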
I can then run "drbdadm connect all" on the new primary and the sync
starts. I understand that the old primary wanted to sync in the
wrong direction. That is why I asked whether the correct procedure
is simply to call "drbdadm connect <resource>" after
"primary <resource>". Or is there another way to automate
this process?
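
What I have in mind for the automation is roughly the following sketch,
to be run on the new primary. It assumes that /proc/drbd shows lines of
the form " 0: cs:StandAlone st:Primary/Unknown ..." (I have not checked
the exact format for every version), and the names STATUS_FILE, DRBDADM
and both functions are made up here. I have not verified that reconnecting
automatically is safe in every situation.

```shell
# Hypothetical helper; STATUS_FILE and DRBDADM can be overridden,
# e.g. DRBDADM=echo for a dry run.
STATUS_FILE=${STATUS_FILE:-/proc/drbd}
DRBDADM=${DRBDADM:-drbdadm}

# Succeeds if at least one device is both StandAlone and Primary.
# ASSUMPTION: status lines contain "cs:StandAlone st:Primary/...".
primary_is_standalone() {
    grep -q 'cs:StandAlone st:Primary/' "$1" 2>/dev/null
}

# Reconnect all resources if this node is a disconnected primary.
reconnect_if_standalone() {
    if primary_is_standalone "$STATUS_FILE"; then
        $DRBDADM connect all
    fi
}
```

Calling reconnect_if_standalone from cron or from a cluster manager hook
would then issue "drbdadm connect all" only when needed.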
Peter