[DRBD-user] Linstor triple primary node recovery problem

张林锋 linfeng.zhang at woqutech.com
Sat Dec 21 08:27:57 CET 2019


We’ve set up a three-node cluster with LINSTOR in which all three nodes are primary.

One of our tests is to reboot one machine. After the machine comes back up, it won’t reconnect. Its state remains "Outdated" while it tries to connect to the other hosts (which are both still primary).

It does not matter which drbdadm command we execute (drbdadm down, drbdadm up, drbdadm disconnect, drbdadm connect --discard-my-data): the state won’t change.

The only workaround that works is demoting one of the other two primaries to secondary. After that, the rebooted host connects and starts syncing. But in a real-world scenario it is not practical to take a resource down on one of the two survivors.

What's the right way to recover after a node failure in a triple-primary setup?


linstor version : linstor 1.0.1; GIT-hash: d8c9a43d4eab20749132147ad61a2ee821645be2
drbd version : 9.0.18-1
We've done a lot of testing on this release, and we would prefer not to upgrade to a newer version.



Satellite node log:
Dec 21 15:02:59 com30-dev kernel: drbd voting sto34-dev: Handshake to peer 1 successful: Agreed network protocol version 115
Dec 21 15:02:59 com30-dev kernel: drbd voting sto34-dev: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
Dec 21 15:02:59 com30-dev kernel: drbd voting sto34-dev: Peer authenticated using 20 bytes HMAC
Dec 21 15:02:59 com30-dev kernel: drbd voting sto34-dev: Starting ack_recv thread (from drbd_r_voting [30419])
Dec 21 15:02:59 com30-dev kernel: drbd voting sto34-dev: Rejecting concurrent remote state change 3425499901 because of state change 697879097
Dec 21 15:02:59 com30-dev kernel: drbd voting sto34-dev: sock was shut down by peer
Dec 21 15:02:59 com30-dev kernel: drbd voting sto34-dev: conn( Connecting -> BrokenPipe )
Dec 21 15:02:59 com30-dev kernel: drbd voting sto34-dev: ack_receiver terminated
Dec 21 15:02:59 com30-dev kernel: drbd voting sto34-dev: Terminating ack_recv thread
Dec 21 15:02:59 com30-dev kernel: drbd voting sto34-dev: Restarting sender thread
Dec 21 15:02:59 com30-dev kernel: drbd voting sto34-dev: Connection closed
Dec 21 15:02:59 com30-dev kernel: drbd voting sto34-dev: conn( BrokenPipe -> Unconnected )
Dec 21 15:02:59 com30-dev kernel: drbd voting sto34-dev: Restarting receiver thread
Dec 21 15:02:59 com30-dev kernel: drbd voting sto34-dev: conn( Unconnected -> Connecting )
Dec 21 15:02:59 com30-dev kernel: drbd redo sto34-dev: Handshake to peer 1 successful: Agreed network protocol version 115
Dec 21 15:02:59 com30-dev kernel: drbd redo sto34-dev: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
Dec 21 15:02:59 com30-dev kernel: drbd redo sto34-dev: Peer authenticated using 20 bytes HMAC
Dec 21 15:02:59 com30-dev kernel: drbd redo sto34-dev: Starting ack_recv thread (from drbd_r_redo [30413])
Dec 21 15:02:59 com30-dev kernel: drbd voting sto31-dev: Handshake to peer 0 successful: Agreed network protocol version 115
Dec 21 15:02:59 com30-dev kernel: drbd voting sto31-dev: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
Dec 21 15:02:59 com30-dev kernel: drbd voting sto31-dev: Peer authenticated using 20 bytes HMAC
Dec 21 15:02:59 com30-dev kernel: drbd voting sto31-dev: Starting ack_recv thread (from drbd_r_voting [30417])
Dec 21 15:02:59 com30-dev kernel: drbd voting sto31-dev: Rejecting concurrent remote state change 1767491989 because of state change 697879097
Dec 21 15:02:59 com30-dev kernel: drbd voting sto31-dev: sock was shut down by peer
Dec 21 15:02:59 com30-dev kernel: drbd voting sto31-dev: conn( Connecting -> BrokenPipe )
Dec 21 15:02:59 com30-dev kernel: drbd voting sto31-dev: ack_receiver terminated
Dec 21 15:02:59 com30-dev kernel: drbd voting sto31-dev: Terminating ack_recv thread
Dec 21 15:02:59 com30-dev kernel: drbd voting sto31-dev: Restarting sender thread
Dec 21 15:02:59 com30-dev kernel: drbd voting sto31-dev: Connection closed
Dec 21 15:02:59 com30-dev kernel: drbd voting sto31-dev: conn( BrokenPipe -> Unconnected )
Dec 21 15:02:59 com30-dev kernel: drbd voting sto31-dev: Restarting receiver thread
Dec 21 15:02:59 com30-dev kernel: drbd voting sto31-dev: conn( Unconnected -> Connecting )
Dec 21 15:02:59 com30-dev kernel: drbd redo sto31-dev: Handshake to peer 0 successful: Agreed network protocol version 115
Dec 21 15:02:59 com30-dev kernel: drbd redo sto31-dev: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
Dec 21 15:02:59 com30-dev kernel: drbd redo sto31-dev: Peer authenticated using 20 bytes HMAC
Dec 21 15:02:59 com30-dev kernel: drbd redo sto31-dev: Starting ack_recv thread (from drbd_r_redo [30411])
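For anyone skimming the log above, the pattern may be easier to see in condensed form. The following is only a rough sketch, assuming the kernel lines keep the "kernel: drbd <resource> <peer>: <message>" layout shown here; the function name summarize and the sample list are made up for illustration and are not part of DRBD or LINSTOR. It counts the "Rejecting concurrent remote state change" messages and collects the conn( A -> B ) transitions per resource and peer.

```python
import re
from collections import Counter

# Assumed line layout, taken from the log excerpt in this post:
#   "<timestamp> <host> kernel: drbd <resource> <peer>: <message>"
LINE_RE = re.compile(r"kernel: drbd (?P<res>\S+) (?P<peer>\S+): (?P<msg>.*)$")
# Connection state transitions look like "conn( Connecting -> BrokenPipe )".
CONN_RE = re.compile(r"conn\( (\S+) -> (\S+) \)")


def summarize(lines):
    """Return (rejected, transitions), both keyed by (resource, peer).

    rejected counts "Rejecting concurrent remote state change" messages;
    transitions lists the (old, new) connection states in log order.
    """
    rejected = Counter()
    transitions = {}
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # not a drbd kernel line in the assumed layout
        key = (m.group("res"), m.group("peer"))
        msg = m.group("msg")
        if msg.startswith("Rejecting concurrent remote state change"):
            rejected[key] += 1
        c = CONN_RE.search(msg)
        if c:
            transitions.setdefault(key, []).append((c.group(1), c.group(2)))
    return rejected, transitions


# Four lines copied from the log above (resource "voting", peer "sto34-dev").
sample = [
    "Dec 21 15:02:59 com30-dev kernel: drbd voting sto34-dev: Rejecting concurrent remote state change 3425499901 because of state change 697879097",
    "Dec 21 15:02:59 com30-dev kernel: drbd voting sto34-dev: conn( Connecting -> BrokenPipe )",
    "Dec 21 15:02:59 com30-dev kernel: drbd voting sto34-dev: conn( BrokenPipe -> Unconnected )",
    "Dec 21 15:02:59 com30-dev kernel: drbd voting sto34-dev: conn( Unconnected -> Connecting )",
]

rejected, transitions = summarize(sample)
for key, count in rejected.items():
    print(key, "rejected state changes:", count)
for key, steps in transitions.items():
    print(key, " | ".join(f"{a} -> {b}" for a, b in steps))
```

Run against the full excerpt, this would show that in this log only the "voting" resource goes through the Connecting -> BrokenPipe -> Unconnected -> Connecting loop, each time right after a rejected state change, while the "redo" lines shown stop at "Starting ack_recv thread".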

If you need any more information, please tell me.


