[Drbd-dev] [bug] drbd 9: Receiver error
Goldwyn Rodrigues
rgoldwyn at suse.de
Fri Feb 20 10:12:36 CET 2015
Hi Philipp,
On 02/20/2015 01:41 AM, Philipp Marek wrote:
> Hi Goldwyn,
>
>> I compiled drbd-9.0 against Opensuse Tumbleweed and installed it [1].
>> I applied Linux kernel commit f730c848affc05fb7262574b06e0cd7e1fa96096 to
>> get it to compile against the latest (factory) kernels and should be the
>> same for the tumbleweed kernel (3.18.3 based) I am using to test.
>>
>> I am getting the following receiver errors in the kernel log when I try to
>> start the drbd service:
> ...
>> Please let me know if you need any more information to debug this or if I am
>> doing something wrong.
> how many nodes did you connect, which DRBD versions were they running?
Two nodes, traditional way with one local device and no other clustering
software.
> Can you show the configuration, and some more log lines - from all nodes,
> and starting quite a bit earlier?
Here is the one which errs:
[ 175.644532] drbd: initialized. Version: 9.0.0rc1 (api:1/proto:86-110)
[ 175.644535] drbd: GIT-hash: 9804ed9b1eedab65cb137380f8066518a9521c12
build by abuild at cloud109, 2015-02-20 08:39:07
[ 175.644536] drbd: registered as block device major 147
[ 175.657174] drbd r0: Starting worker thread (from drbdsetup [1162])
[ 175.797012] sda: unknown partition table
[ 175.798367] block drbd0: disk( Diskless -> Attaching )
[ 175.798404] block drbd0: Maximum number of peer devices = 1
[ 175.798634] drbd r0: Method to ensure write ordering: flush
[ 175.798644] block drbd0: drbd_bm_resize called with capacity == 62912568
[ 175.799879] block drbd0: resync bitmap: bits=7864071 words=122877
pages=240
[ 175.799886] block drbd0: size = 30 GB (31456284 KB)
[ 175.830618] block drbd0: recounting of set bits took additional 0ms
[ 175.830633] block drbd0: Suspended AL updates
[ 175.830642] block drbd0: disk( Attaching -> Inconsistent )
[ 175.830645] block drbd0: attached to current UUID: 0000000000000004
[ 175.843022] drbd r0 tumbleweed1: conn( StandAlone -> Unconnected )
[ 175.843069] drbd r0 tumbleweed1: Starting sender thread (from
drbdsetup [1175])
[ 175.880303] drbd r0 tumbleweed1: Starting receiver thread (from
drbd_w_r0 [1163])
[ 175.880486] drbd r0 tumbleweed1: conn( Unconnected -> Connecting )
[ 176.380200] drbd r0 tumbleweed1: Handshake successful: Agreed network
protocol version 110
[ 176.380205] drbd r0 tumbleweed1: Agreed to support TRIM on protocol level
[ 176.380218] drbd r0 tumbleweed1: Starting asender thread (from
drbd_r_r0 [1183])
[ 176.392180] drbd r0 tumbleweed1: Preparing remote state change
1900523070 (primary_nodes=0, weak_nodes=0)
[ 176.392697] drbd r0 tumbleweed1: Committing remote state change
1900523070
[ 176.392718] drbd r0 tumbleweed1: conn( Connecting -> Connected )
peer( Unknown -> Secondary )
[ 176.400177] block drbd0: tumbleweed1: drbd_sync_handshake:
[ 176.400183] block drbd0: tumbleweed1: self
0000000000000004:0000000000000000:0000000000000000:0000000000000000
bits:7864071 flags:0
[ 176.400188] block drbd0: tumbleweed1: peer
2077E6DCE2ED8D5E:0000000000000000:0000000000000000:0000000000000000
bits:8388343 flags:20
[ 176.400191] block drbd0: tumbleweed1: uuid_compare()=-2 by rule 20
[ 176.400195] block drbd0: tumbleweed1: Writing the whole bitmap, full
sync required after drbd_sync_handshake.
[ 176.421355] block drbd0: tumbleweed1: pdsk( DUnknown -> UpToDate )
repl( Off -> WFBitMapT )
[ 176.421366] block drbd0: Resumed AL updates
[ 176.429715] block drbd0: tumbleweed1: bitmap overflow (e:8388342)
while decoding bm RLE packet
[ 176.429739] drbd r0 tumbleweed1: error receiving P_COMPRESSED_BITMAP,
e: -5 l: 7!
[ 176.429763] drbd r0 tumbleweed1: conn( Connected -> ProtocolError )
peer( Secondary -> Unknown )
[ 176.429768] block drbd0: tumbleweed1: pdsk( UpToDate -> DUnknown )
repl( WFBitMapT -> Off )
[ 176.429805] drbd r0 tumbleweed1: asender terminated
[ 176.429807] drbd r0 tumbleweed1: Terminating asender thread
[ 176.456361] drbd r0 tumbleweed1: Connection closed
[ 176.456382] drbd r0 tumbleweed1: conn( ProtocolError -> Unconnected )
[ 176.456404] drbd r0 tumbleweed1: Restarting receiver thread
[ 176.456413] drbd r0 tumbleweed1: conn( Unconnected -> Connecting )
[ 176.972179] drbd r0 tumbleweed1: Handshake successful: Agreed network
protocol version 110
[ 176.972185] drbd r0 tumbleweed1: Agreed to support TRIM on protocol level
[ 176.972205] drbd r0 tumbleweed1: Starting asender thread (from
drbd_r_r0 [1183])
[ 176.984192] drbd r0 tumbleweed1: Preparing remote state change
2872485783 (primary_nodes=0, weak_nodes=0)
[ 176.984540] drbd r0 tumbleweed1: Committing remote state change
2872485783
[ 176.984560] drbd r0 tumbleweed1: conn( Connecting -> Connected )
peer( Unknown -> Secondary )
[ 176.992109] block drbd0: tumbleweed1: drbd_sync_handshake:
[ 176.992115] block drbd0: tumbleweed1: self
0000000000000004:0000000000000000:0000000000000000:0000000000000000
bits:7864071 flags:0
[ 176.992120] block drbd0: tumbleweed1: peer
2077E6DCE2ED8D5E:0000000000000000:0000000000000000:0000000000000000
bits:8388343 flags:20
[ 176.992124] block drbd0: tumbleweed1: uuid_compare()=-2 by rule 20
[ 176.992127] block drbd0: tumbleweed1: Writing the whole bitmap, full
sync required after drbd_sync_handshake.
[ 177.003945] block drbd0: tumbleweed1: pdsk( DUnknown -> UpToDate )
repl( Off -> WFBitMapT )
[ 177.012276] block drbd0: tumbleweed1: bitmap overflow (e:8388342)
while decoding bm RLE packet
[ 177.012300] drbd r0 tumbleweed1: error receiving P_COMPRESSED_BITMAP,
e: -5 l: 7!
[ 177.012324] drbd r0 tumbleweed1: conn( Connected -> ProtocolError )
peer( Secondary -> Unknown )
[ 177.012328] block drbd0: tumbleweed1: pdsk( UpToDate -> DUnknown )
repl( WFBitMapT -> Off )
[ 177.012365] drbd r0 tumbleweed1: asender terminated
[ 177.012369] drbd r0 tumbleweed1: Terminating asender thread
[ 177.036176] drbd r0 tumbleweed1: Connection closed
[ 177.036197] drbd r0 tumbleweed1: conn( ProtocolError -> Unconnected )
[ 177.036219] drbd r0 tumbleweed1: Restarting receiver thread
[ 177.036237] drbd r0 tumbleweed1: conn( Unconnected -> Connecting )
[ 177.536161] drbd r0 tumbleweed1: Handshake successful: Agreed network
protocol version 110
[ 177.536165] drbd r0 tumbleweed1: Agreed to support TRIM on protocol level
[ 177.536177] drbd r0 tumbleweed1: Starting asender thread (from
drbd_r_r0 [1183])
[ 177.544168] drbd r0 tumbleweed1: Preparing remote state change
1844200584 (primary_nodes=0, weak_nodes=0)
[ 177.544569] drbd r0 tumbleweed1: Committing remote state change
1844200584
[ 177.544588] drbd r0 tumbleweed1: conn( Connecting -> Connected )
peer( Unknown -> Secondary )
[ 177.548098] block drbd0: tumbleweed1: drbd_sync_handshake:
[ 177.548106] block drbd0: tumbleweed1: self
0000000000000004:0000000000000000:0000000000000000:0000000000000000
bits:7864071 flags:0
[ 177.548113] block drbd0: tumbleweed1: peer
2077E6DCE2ED8D5E:0000000000000000:0000000000000000:0000000000000000
bits:8388343 flags:20
[ 177.548118] block drbd0: tumbleweed1: uuid_compare()=-2 by rule 20
[ 177.548122] block drbd0: tumbleweed1: Writing the whole bitmap, full
sync required after drbd_sync_handshake.
[ 177.561761] block drbd0: tumbleweed1: pdsk( DUnknown -> UpToDate )
repl( Off -> WFBitMapT )
[ 177.570088] block drbd0: tumbleweed1: bitmap overflow (e:8388342)
while decoding bm RLE packet
[ 177.570109] drbd r0 tumbleweed1: error receiving P_COMPRESSED_BITMAP,
e: -5 l: 7!
[the last messages start looping]
And the one which attempts to be primary
[ 148.615292] drbd: initialized. Version: 9.0.0rc1 (api:1/proto:86-110)
[ 148.615297] drbd: GIT-hash: 9804ed9b1eedab65cb137380f8066518a9521c12
build by abuild at cloud109, 2015-02-20 08:39:07
[ 148.615299] drbd: registered as block device major 147
[ 148.629013] drbd r0: Starting worker thread (from drbdsetup [1138])
[ 148.746514] sda: unknown partition table
[ 148.747150] block drbd0: disk( Diskless -> Attaching )
[ 148.747161] block drbd0: Maximum number of peer devices = 1
[ 148.747251] drbd r0: Method to ensure write ordering: flush
[ 148.747255] block drbd0: drbd_bm_resize called with capacity == 67106744
[ 148.748075] block drbd0: resync bitmap: bits=8388343 words=131068
pages=256
[ 148.748078] block drbd0: size = 32 GB (33553372 KB)
[ 148.774226] block drbd0: recounting of set bits took additional 0ms
[ 148.774247] block drbd0: Suspended AL updates
[ 148.774262] block drbd0: disk( Attaching -> UpToDate )
[ 148.774270] block drbd0: attached to current UUID: 2077E6DCE2ED8D5E
[ 148.826678] drbd r0 tumbleweed3: conn( StandAlone -> Unconnected )
[ 148.826719] drbd r0 tumbleweed3: Starting sender thread (from
drbdsetup [1151])
[ 148.880835] drbd r0 tumbleweed3: Starting receiver thread (from
drbd_w_r0 [1139])
[ 148.881016] drbd r0 tumbleweed3: conn( Unconnected -> Connecting )
[ 156.921255] drbd r0 tumbleweed3: Handshake successful: Agreed network
protocol version 110
[ 156.921260] drbd r0 tumbleweed3: Agreed to support TRIM on protocol level
[ 156.921275] drbd r0 tumbleweed3: Starting asender thread (from
drbd_r_r0 [1159])
[ 156.928136] drbd r0: Preparing cluster-wide state change 1900523070
(0->1 499/146)
[ 156.933242] drbd r0: State change 1900523070: primary_nodes=0,
weak_nodes=0
[ 156.933246] drbd r0: Committing cluster-wide state change 1900523070
(4ms)
[ 156.933263] drbd r0 tumbleweed3: conn( Connecting -> Connected )
peer( Unknown -> Secondary )
[ 156.936119] block drbd0: tumbleweed3: drbd_sync_handshake:
[ 156.936125] block drbd0: tumbleweed3: self
2077E6DCE2ED8D5E:0000000000000000:0000000000000000:0000000000000000
bits:8388343 flags:0
[ 156.936130] block drbd0: tumbleweed3: peer
0000000000000004:0000000000000000:0000000000000000:0000000000000000
bits:7864071 flags:24
[ 156.936133] block drbd0: tumbleweed3: uuid_compare()=2 by rule 30
[ 156.936136] block drbd0: tumbleweed3: Writing the whole bitmap, full
sync required after drbd_sync_handshake.
[ 156.963632] block drbd0: tumbleweed3: pdsk( DUnknown -> Inconsistent
) repl( Off -> WFBitMapS )
[ 156.963644] block drbd0: Resumed AL updates
[ 156.966625] block drbd0: tumbleweed3: send bitmap stats
[Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
[ 156.985166] drbd r0 tumbleweed3: sock was shut down by peer
[ 156.985186] drbd r0 tumbleweed3: conn( Connected -> BrokenPipe )
peer( Secondary -> Unknown )
[ 156.985191] block drbd0: tumbleweed3: repl( WFBitMapS -> Off )
[ 156.985215] drbd r0 tumbleweed3: short read (expected size 16)
[ 156.985238] drbd r0 tumbleweed3: asender terminated
[ 156.985241] drbd r0 tumbleweed3: Terminating asender thread
[ 157.012359] drbd r0 tumbleweed3: Connection closed
[ 157.012380] drbd r0 tumbleweed3: conn( BrokenPipe -> Unconnected )
[ 157.012402] drbd r0 tumbleweed3: Restarting receiver thread
[ 157.012420] drbd r0 tumbleweed3: conn( Unconnected -> Connecting )
[ 157.513134] drbd r0 tumbleweed3: Handshake successful: Agreed network
protocol version 110
[ 157.513138] drbd r0 tumbleweed3: Agreed to support TRIM on protocol level
[ 157.513153] drbd r0 tumbleweed3: Starting asender thread (from
drbd_r_r0 [1159])
[ 157.520104] drbd r0: Preparing cluster-wide state change 2872485783
(0->1 499/146)
[ 157.525159] drbd r0: State change 2872485783: primary_nodes=0,
weak_nodes=0
[ 157.525162] drbd r0: Committing cluster-wide state change 2872485783
(4ms)
[ 157.525179] drbd r0 tumbleweed3: conn( Connecting -> Connected )
peer( Unknown -> Secondary )
[ 157.528106] block drbd0: tumbleweed3: drbd_sync_handshake:
[ 157.528111] block drbd0: tumbleweed3: self
2077E6DCE2ED8D5E:0000000000000000:0000000000000000:0000000000000000
bits:8388343 flags:0
[ 157.528115] block drbd0: tumbleweed3: peer
0000000000000004:0000000000000000:0000000000000000:0000000000000000
bits:7864071 flags:24
[ 157.528118] block drbd0: tumbleweed3: uuid_compare()=2 by rule 30
[ 157.528121] block drbd0: tumbleweed3: Writing the whole bitmap, full
sync required after drbd_sync_handshake.
[ 157.546946] block drbd0: tumbleweed3: repl( Off -> WFBitMapS )
[ 157.548668] block drbd0: tumbleweed3: send bitmap stats
[Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
[ 157.561288] drbd r0 tumbleweed3: sock was shut down by peer
[ 157.561306] drbd r0 tumbleweed3: conn( Connected -> BrokenPipe )
peer( Secondary -> Unknown )
[ 157.561311] block drbd0: tumbleweed3: repl( WFBitMapS -> Off )
[ 157.561335] drbd r0 tumbleweed3: short read (expected size 16)
[ 157.561358] drbd r0 tumbleweed3: asender terminated
[ 157.561360] drbd r0 tumbleweed3: Terminating asender thread
[ 157.576230] drbd r0 tumbleweed3: Connection closed
[ 157.576251] drbd r0 tumbleweed3: conn( BrokenPipe -> Unconnected )
[ 157.576273] drbd r0 tumbleweed3: Restarting receiver thread
[ 157.576290] drbd r0 tumbleweed3: conn( Unconnected -> Connecting )
[ 158.077114] drbd r0 tumbleweed3: Handshake successful: Agreed network
protocol version 110
[ 158.077119] drbd r0 tumbleweed3: Agreed to support TRIM on protocol level
[ 158.077152] drbd r0 tumbleweed3: Starting asender thread (from
drbd_r_r0 [1159])
[ 158.084157] drbd r0: Preparing cluster-wide state change 1844200584
(0->1 499/146)
[ 158.085191] drbd r0: State change 1844200584: primary_nodes=0,
weak_nodes=0
[ 158.085196] drbd r0: Committing cluster-wide state change 1844200584
(0ms)
[ 158.085214] drbd r0 tumbleweed3: conn( Connecting -> Connected )
peer( Unknown -> Secondary )
And the configs:
tumbleweed1:~ # cat /etc/drbd.conf
# You can find an example in /usr/share/doc/drbd.../drbd.conf.example
include "drbd.d/global_common.conf";
include "drbd.d/*.res";
I haven't turned on any optins in global_common.conf
tumbleweed1:~ # cat /etc/drbd.d/r0.res
resource r0 {
device /dev/drbd_r0 minor 0;
disk /dev/sda;
meta-disk internal;
on tumbleweed1 {
address 192.168.1.111:7788;
}
on tumbleweed3 {
address 192.168.1.113:7788;
}
syncer {
rate 7M;
}
}
> And then the DRBD 9 git HEAD has already moved again...
>
Yes, I rebuilt against the latest one and the error still occurs.
BTW, the latest git still needs to fix the f_dentry in
drbd/drbd_debugfs.c not covered in 7b00306c929c389bf030463466cb62e65bc415ec
IOW, there is still reference to f_dentry is drbd/drbd_debugfs.c
--
Goldwyn
More information about the drbd-dev
mailing list