[Drbd-dev] problems connecting with mix of 9.0.26-rc3, 9.0.24-1, and 9.0.25-1

Michael Labriola michael.d.labriola at gmail.com
Mon Dec 14 16:36:20 CET 2020


I just tried upgrading one node of a three-node cluster to 9.0.26-rc3 (with
patches for 5.10 kernel support).  The other two nodes are running 9.0.24-1
and 9.0.25-1.  The upgraded node comes up without any noticeable complaints,
but stays in Connecting forever.  After switching that node back to
9.0.25-1, it connects instantly.

On the 9.0.26-rc3 node I see the following kernel messages:

Dec 14 10:21:40 gimli kernel: drbd test1: Starting worker thread (from drbdsetup [852])
Dec 14 10:21:40 gimli kernel: drbd test1 legolas: Starting sender thread (from drbdsetup [878])
Dec 14 10:21:40 gimli kernel: drbd test1 boromir: Starting sender thread (from drbdsetup [906])
Dec 14 10:21:40 gimli kernel: drbd test1/0 drbd1: meta-data IO uses: blk-bio
Dec 14 10:21:40 gimli kernel: drbd test1/0 drbd1: disk( Diskless -> Attaching )
Dec 14 10:21:40 gimli kernel: drbd test1/0 drbd1: Maximum number of peer devices = 2
Dec 14 10:21:40 gimli kernel: drbd test1: Method to ensure write ordering: drain
Dec 14 10:21:40 gimli kernel: drbd test1/0 drbd1: drbd_bm_resize called with capacity == 10485048
Dec 14 10:21:40 gimli kernel: drbd test1/0 drbd1: resync bitmap: bits=1310631 words=40958 pages=80
Dec 14 10:21:40 gimli kernel: drbd test1/0 drbd1: size = 5120 MB (5242524 KB)
Dec 14 10:21:40 gimli kernel: drbd test1/0 drbd1: size = 5120 MB (5242524 KB)
Dec 14 10:21:40 gimli kernel: drbd test1/0 drbd1: recounting of set bits took additional 0ms
Dec 14 10:21:40 gimli kernel: drbd test1/0 drbd1: disk( Attaching -> UpToDate )
Dec 14 10:21:40 gimli kernel: drbd test1/0 drbd1: attached to current UUID: 4E68D1E7371B03E8
Dec 14 10:21:40 gimli kernel: drbd test1 legolas: conn( StandAlone -> Unconnected )
Dec 14 10:21:40 gimli kernel: drbd test1 legolas: Starting receiver thread (from drbd_w_test1 [853])
Dec 14 10:21:40 gimli kernel: drbd test1 legolas: conn( Unconnected -> Connecting )
Dec 14 10:21:40 gimli kernel: drbd test1 boromir: conn( StandAlone -> Unconnected )
Dec 14 10:21:40 gimli kernel: drbd test1 boromir: Starting receiver thread (from drbd_w_test1 [853])
Dec 14 10:21:40 gimli kernel: drbd test1 boromir: conn( Unconnected -> Connecting )
Dec 14 10:21:41 gimli kernel: drbd test1 boromir: Handshake to peer 2 successful: Agreed network protocol version 117
Dec 14 10:21:41 gimli kernel: drbd test1 boromir: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
Dec 14 10:21:41 gimli kernel: drbd test1 boromir: Starting ack_recv thread (from drbd_r_test1 [1059])
Dec 14 10:21:41 gimli kernel: drbd test1/0 drbd1 boromir: drbd_sync_handshake:
Dec 14 10:21:41 gimli kernel: drbd test1/0 drbd1 boromir: self 4E68D1E7371B03E8:0000000000000000:09DEB39F8943FFBE:0000000000000000 bits:0 flags:20
Dec 14 10:21:41 gimli kernel: drbd test1/0 drbd1 boromir: peer 4E68D1E7371B03E8:0000000000000000:53AEFD79E3271354:8B8B34C634B7A922 bits:0 flags:120
Dec 14 10:21:41 gimli kernel: drbd test1/0 drbd1 boromir: uuid_compare()=no-sync by rule 38
Dec 14 10:21:41 gimli kernel: drbd test1: Preparing cluster-wide state change 2702145072 (0->2 499/146)
Dec 14 10:21:41 gimli kernel: drbd test1 legolas: Handshake to peer 1 successful: Agreed network protocol version 117
Dec 14 10:21:41 gimli kernel: drbd test1 legolas: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
Dec 14 10:21:41 gimli kernel: drbd test1 legolas: Starting ack_recv thread (from drbd_r_test1 [1056])
Dec 14 10:21:41 gimli kernel: drbd test1/0 drbd1 legolas: drbd_sync_handshake:
Dec 14 10:21:41 gimli kernel: drbd test1/0 drbd1 legolas: self 4E68D1E7371B03E8:0000000000000000:09DEB39F8943FFBE:0000000000000000 bits:0 flags:20
Dec 14 10:21:41 gimli kernel: drbd test1/0 drbd1 legolas: peer 4E68D1E7371B03E8:0000000000000000:893B794351326FE2:F8549503131EA576 bits:0 flags:120
Dec 14 10:21:41 gimli kernel: drbd test1/0 drbd1 legolas: uuid_compare()=no-sync by rule 38
Dec 14 10:22:11 gimli kernel: drbd test1: Aborting cluster-wide state change 2702145072 (30131ms) rv = -23
Dec 14 10:22:11 gimli kernel: drbd test1: Preparing cluster-wide state change 3732554560 (0->1 499/146)
Dec 14 10:22:41 gimli kernel: drbd test1: Aborting cluster-wide state change 3732554560 (30187ms) rv = -23
Dec 14 10:23:13 gimli kernel: drbd test1: Preparing cluster-wide state change 2057843657 (0->2 499/146)
Dec 14 10:23:13 gimli kernel: drbd test1: Aborting cluster-wide state change 2057843657 (30272ms) rv = -23
Dec 14 10:23:15 gimli kernel: drbd test1: Preparing cluster-wide state change 2843902071 (0->1 499/146)
Dec 14 10:23:45 gimli kernel: drbd test1: Aborting cluster-wide state change 2843902071 (30208ms) rv = -23
Dec 14 10:24:17 gimli kernel: drbd test1: Preparing cluster-wide state change 1126537323 (0->2 499/146)
Dec 14 10:24:17 gimli kernel: drbd test1: Aborting cluster-wide state change 1126537323 (30208ms) rv = -23
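
To make the repeating pattern easier to see, here is a rough Python sketch that pairs each "Preparing cluster-wide state change" line with its "Aborting" line and pulls out the duration and rv.  It is not part of DRBD; the function name summarize_aborts and the field names are made up for illustration, and reading the "0->2" pair as "initiating node id -> target node id" is an assumption (it does match the peer numbers in the handshake lines).  The sample lines are copied from the gimli log with the mail line wrapping undone.

```python
import re

# Sample lines copied from the gimli log, one log record per line.
GIMLI_LOG = """\
Dec 14 10:21:41 gimli kernel: drbd test1: Preparing cluster-wide state change 2702145072 (0->2 499/146)
Dec 14 10:22:11 gimli kernel: drbd test1: Aborting cluster-wide state change 2702145072 (30131ms) rv = -23
Dec 14 10:22:11 gimli kernel: drbd test1: Preparing cluster-wide state change 3732554560 (0->1 499/146)
Dec 14 10:22:41 gimli kernel: drbd test1: Aborting cluster-wide state change 3732554560 (30187ms) rv = -23
"""

# Assumption: "(0->2 ...)" means "initiating node id -> target node id".
PREPARE = re.compile(r"Preparing cluster-wide state change (\d+) \((\d+)->(\d+) ")
ABORT = re.compile(r"Aborting cluster-wide state change (\d+) \((\d+)ms\) rv = (-?\d+)")


def summarize_aborts(log_text):
    """Return one dict per aborted state change: id, target node, duration, rv."""
    targets = {}   # state change id -> target node id seen in the Preparing line
    aborted = []
    for line in log_text.splitlines():
        m = PREPARE.search(line)
        if m:
            targets[m.group(1)] = int(m.group(3))
            continue
        m = ABORT.search(line)
        if m:
            change_id = m.group(1)
            aborted.append({
                "id": change_id,
                "target_node": targets.get(change_id),  # None if no Preparing line was seen
                "duration_ms": int(m.group(2)),
                "rv": int(m.group(3)),
            })
    return aborted


if __name__ == "__main__":
    for entry in summarize_aborts(GIMLI_LOG):
        print(entry)
```

Fed the full log, this should show every attempt being aborted after roughly 30 seconds with the same rv, alternating between the two peers.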

On the other 2 nodes, I see variations of this:

Dec 14 10:21:41 legolas kernel: drbd test1 boromir: Preparing remote state change 2702145072
Dec 14 10:21:41 legolas kernel: drbd test1 gimli: Handshake to peer 0 successful: Agreed network protocol version 117
Dec 14 10:21:41 legolas kernel: drbd test1 gimli: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
Dec 14 10:21:41 legolas kernel: drbd test1 gimli: Starting ack_recv thread (from drbd_r_test1 [1182])
Dec 14 10:21:41 legolas kernel: drbd test1/0 drbd1 gimli: drbd_sync_handshake:
Dec 14 10:21:41 legolas kernel: drbd test1/0 drbd1 gimli: self 4E68D1E7371B03E8:0000000000000000:893B794351326FE2:F8549503131EA576 bits:0 flags:120
Dec 14 10:21:41 legolas kernel: drbd test1/0 drbd1 gimli: peer 4E68D1E7371B03E8:0000000000000000:09DEB39F8943FFBE:0000000000000000 bits:0 flags:20
Dec 14 10:21:41 legolas kernel: drbd test1/0 drbd1 gimli: uuid_compare()=no-sync by rule 38
Dec 14 10:22:11 legolas kernel: drbd test1 boromir: Aborting remote state change 2702145072
Dec 14 10:22:11 legolas kernel: drbd test1 gimli: Preparing remote state change 3732554560
Dec 14 10:22:41 legolas kernel: drbd test1 gimli: Aborting remote state change 3732554560
Dec 14 10:22:43 legolas kernel: drbd test1 boromir: Preparing remote state change 2057843657
Dec 14 10:23:13 legolas kernel: drbd test1: Two-phase commit 2057843657 timeout
Dec 14 10:23:13 legolas kernel: drbd test1 boromir: Ignoring P_TWOPC_ABORT packet 2057843657.
Dec 14 10:23:15 legolas kernel: drbd test1 gimli: Preparing remote state change 2843902071
Dec 14 10:23:45 legolas kernel: drbd test1 gimli: Aborting remote state change 2843902071
Dec 14 10:23:47 legolas kernel: drbd test1 boromir: Preparing remote state change 1126537323
Dec 14 10:24:17 legolas kernel: drbd test1 boromir: Aborting remote state change 1126537323
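
The state change ids in the legolas log are the same ones gimli reports, so the two views can be lined up by id.  Below is a small Python sketch, again only an illustration and not anything shipped with DRBD, that groups the "remote state change" and "Two-phase commit ... timeout" lines by id and records which peer connection each one arrived on.  The function name remote_changes and the "via"/"events" keys are hypothetical; the sample lines are copied from the legolas log with the mail line wrapping undone.

```python
import re
from collections import defaultdict

# Sample lines copied from the legolas log, one log record per line.
LEGOLAS_LOG = """\
Dec 14 10:21:41 legolas kernel: drbd test1 boromir: Preparing remote state change 2702145072
Dec 14 10:22:11 legolas kernel: drbd test1 boromir: Aborting remote state change 2702145072
Dec 14 10:22:11 legolas kernel: drbd test1 gimli: Preparing remote state change 3732554560
Dec 14 10:22:41 legolas kernel: drbd test1 gimli: Aborting remote state change 3732554560
Dec 14 10:22:43 legolas kernel: drbd test1 boromir: Preparing remote state change 2057843657
Dec 14 10:23:13 legolas kernel: drbd test1: Two-phase commit 2057843657 timeout
"""

# "drbd <resource> <peer>: Preparing|Aborting remote state change <id>"
REMOTE = re.compile(r"drbd \S+ (\S+): (Preparing|Aborting) remote state change (\d+)")
TIMEOUT = re.compile(r"Two-phase commit (\d+) timeout")


def remote_changes(log_text):
    """Group remote state change events by id, noting the peer they came in on."""
    changes = defaultdict(lambda: {"via": None, "events": []})
    for line in log_text.splitlines():
        m = REMOTE.search(line)
        if m:
            peer, action, change_id = m.groups()
            changes[change_id]["via"] = peer
            changes[change_id]["events"].append(action)
            continue
        m = TIMEOUT.search(line)
        if m:
            changes[m.group(1)]["events"].append("Timeout")
    return dict(changes)


if __name__ == "__main__":
    for change_id, info in remote_changes(LEGOLAS_LOG).items():
        print(change_id, info["via"], info["events"])
```

Comparing the "via" peer here against the target node in the gimli log shows which requests reached legolas directly from gimli and which were forwarded by boromir.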

Anyone seeing this kind of thing with the latest rc?

-- 
Michael D Labriola
21 Rip Van Winkle Cir
Warwick, RI 02886
401-316-9844 (cell)