[DRBD-user] Extent XXX beyond end of bitmap!

Oleksiy Evin o.evin at onefc.com
Tue Oct 2 05:50:25 CEST 2018


The above 'Extent XXX beyond end of bitmap!' error is constantly reproduced in our environment. It's not clear what exactly triggered it, but it happened when Pacemaker was unable to fail over properly to the other node due to a DRBD timeout issue, followed by a server reset.

# drbdadm status
sg-master-drbd role:Secondary
  disk:Diskless
  peer role:Primary
    replication:Established peer-disk:UpToDate
# drbdadm up all
extent 19136507 beyond end of bitmap!
extent 21495810 beyond end of bitmap!
extent 21785161 beyond end of bitmap!
... another 50+ entries similar to above ...
../shared/drbdmeta.c:2279:apply_al: ASSERT(bm_pos - bm_on_disk_pos <= chunk - extents_size) failed.
sg-master-drbd: Failure: (102) Local address(port) already in use.
Command 'drbdsetup-84 connect sg-master-drbd ipv4:172.16.2.10:7801 ipv4:172.16.2.20:7801 --protocol=C --max-buffers=64K --sndbuf-size=1024K --after-sb-0pri=discard-younger-primary --after-sb-1pri=discard-secondary --after-sb-2pri=call-pri-lost-after-sb' terminated with exit code 10
# drbdadm attach all
extent 19136507 beyond end of bitmap!
extent 21495810 beyond end of bitmap!
extent 21785161 beyond end of bitmap!
...
../shared/drbdmeta.c:2279:apply_al: ASSERT(bm_pos - bm_on_disk_pos <= chunk - extents_size) failed.
Previously we fixed this by recreating the DRBD meta data and fully resynchronizing the nodes, which is obviously not the correct way to handle it.
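For reference, the workaround was roughly the following (a sketch only — it destroys the meta data on the affected node and forces a full sync of the whole 200 TiB from the peer, so the exact steps and resource name should be double-checked against your own setup):

```shell
# On the node with the broken meta data (must be Secondary/down):
drbdadm down sg-master-drbd
drbdadm create-md sg-master-drbd   # wipes and recreates the internal meta data
drbdadm up sg-master-drbd
# The node reconnects and receives a full sync from the current Primary:
drbdadm status sg-master-drbd
```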

The configuration is pretty much standard, with internal meta data and defaults for the AL and max-peers.

resource master-drbd {
  net {
    protocol C;
    max-buffers    64K;
    sndbuf-size    1024K;
    after-sb-0pri  discard-younger-primary;
    after-sb-1pri  discard-secondary;
    after-sb-2pri  call-pri-lost-after-sb;
  }
  disk {
    resync-rate 4000M;
    disk-barrier no;
    disk-flushes no;
    c-plan-ahead 0;
    read-balancing 1M-striping;
  }
  volume 0 {
    disk /dev/drbdpool/data;
    device /dev/drbd0;
    meta-disk internal;
  }
  on hcluster01 {
    address 172.16.2.10:7801;
  }
  on hcluster02 {
    address 172.16.2.20:7801;
  }
}

I'm not able to get 'drbdadm dump-md' output; it fails with the following error:

# drbdadm dump-md all
Found meta data is "unclean", please apply-al first
Command 'drbdmeta 0 v08 /dev/drbdpool/data internal dump-md' terminated with exit code 255
The backend device 'dm-3' for DRBD is a logical volume 'data', which combines two hardware RAID0 arrays (sda, sdb) in volume group 'drbdpool'.

Reported sizes on the failed node:

# blockdev --report
RO    RA   SSZ   BSZ   StartSec            Size   Device
rw   256   512  4096          0 120009573531648   /dev/sda
rw   256   512  4096          0 100007977943040   /dev/sdc
rw   256   512  4096          0 220017543086080   /dev/dm-3

# blockdev --getsize /dev/drbd0
blockdev: cannot open /dev/drbd0: Wrong medium type
Reported sizes on the operational node:

# blockdev --report
RO    RA   SSZ   BSZ   StartSec            Size   Device
rw   256   512  4096          0 120009573531648   /dev/sda
rw   256   512  4096          0 100007977943040   /dev/sdc
rw   256   512  4096          0 220017543086080   /dev/dm-3
rw   256   512  4096          0 220010828644352   /dev/drbd0
# blockdev --getsize /dev/drbd0
429708649696
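As a sanity check on the numbers above (my own arithmetic, not from the thread): 'blockdev --getsize' reports 512-byte sectors, and the kernel log further down reports the DRBD device size in KB; both agree exactly with the byte size from 'blockdev --report':

```python
SECTOR = 512  # blockdev --getsize counts 512-byte sectors

drbd0_sectors = 429708649696     # blockdev --getsize /dev/drbd0
drbd0_bytes   = 220010828644352  # blockdev --report, /dev/drbd0
drbd0_kb      = 214854324848     # dmesg: "size = 200 TB (214854324848 KB)"

assert drbd0_sectors * SECTOR == drbd0_bytes
assert drbd0_kb * 1024 == drbd0_bytes
print(f"consistent: {drbd0_bytes} bytes = {drbd0_bytes / 2**40:.2f} TiB")
```

So the three reports describe the same device size; the size reporting itself is not where the inconsistency lies.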
# vgdisplay
  --- Volume group ---
  VG Name               drbdpool
  System ID
  Format                lvm2
  Metadata Areas        2
  Metadata Sequence No  2
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               1
  Max PV                0
  Cur PV                2
  Act PV                2
  VG Size               200.10 TiB
  PE Size               4.00 MiB
  Total PE              52456270
  Alloc PE / Size       52456270 / 200.10 TiB
  Free  PE / Size       0 / 0
# lvdisplay
  --- Logical volume ---
  LV Path                /dev/drbdpool/data
  LV Name                data
  VG Name                drbdpool
  LV Write Access        read/write
  LV Status              available
  # open                 2
  LV Size                200.10 TiB
  Current LE             52456270
  Segments               2
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:3
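Cross-checking the LVM figures against the DRBD sizes (again my own arithmetic; the bitmap granularity of roughly one bit per 4 KiB per peer is an assumption based on DRBD 8.4 internal meta data defaults): the LV size matches Total PE × PE size exactly, and the gap between the backing LV and /dev/drbd0 is about 6.25 GiB, which is the expected order of magnitude for the internal bitmap of a ~200 TiB device:

```python
PE = 4 * 1024 * 1024                  # PE Size: 4.00 MiB

lv_bytes    = 52456270 * PE           # Total PE * PE size
dm3_bytes   = 220017543086080         # blockdev --report, /dev/dm-3
drbd0_bytes = 220010828644352         # blockdev --report, /dev/drbd0

assert lv_bytes == dm3_bytes          # LVM and blockdev agree on the LV size

meta_bytes = dm3_bytes - drbd0_bytes  # space reserved for internal meta data
# assumed layout: ~1 bitmap bit per 4 KiB of usable space (max-peers = 1)
bitmap_estimate = drbd0_bytes // (4096 * 8)

print(f"meta data: {meta_bytes / 2**30:.2f} GiB, "
      f"bitmap estimate: {bitmap_estimate / 2**30:.2f} GiB")
```

Both come out around 6.25 GiB, so the on-disk meta data area looks correctly sized for the device; the bad extent numbers would then point at a corrupted activity log rather than a mis-sized bitmap.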

# dmesg | grep drbd
[    1.863088] drbd: loading out-of-tree module taints kernel.
[    1.865879] drbd: module verification failed: signature and/or required key missing - tainting kernel
[    1.894498] drbd: initialized. Version: 8.4.11-1 (api:1/proto:86-101)
[    1.894501] drbd: GIT-hash: 66145a308421e9c124ec391a7848ac20203bb03c build by mockbuild@, 2018-04-26 12:10:42
[    1.894502] drbd: registered as block device major 147
[   88.950747] drbd sg-master-drbd: Starting worker thread (from drbdsetup-84 [3242])
[   88.951999] drbd sg-master-drbd: conn( StandAlone -> Unconnected )
[   88.952532] drbd sg-master-drbd: Starting receiver thread (from drbd_w_sg-maste [3244])
[   88.952592] drbd sg-master-drbd: receiver (re)started
[   88.952656] drbd sg-master-drbd: conn( Unconnected -> WFConnection )
[   89.453261] drbd sg-master-drbd: Handshake successful: Agreed network protocol version 101
[   89.453271] drbd sg-master-drbd: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
[   89.453358] drbd sg-master-drbd: conn( WFConnection -> WFReportParams )
[   89.453373] drbd sg-master-drbd: Starting ack_recv thread (from drbd_r_sg-maste [3245])
[   89.469010] block drbd0: max BIO size = 4096
[   89.469023] block drbd0: size = 200 TB (214854324848 KB)
[   89.469043] block drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> Connected ) pdsk( DUnknown -> UpToDate )
[49807.178096] drbd sg-master-drbd: peer( Primary -> Unknown ) conn( Connected -> Disconnecting ) pdsk( UpToDate -> DUnknown )
[49807.178116] drbd sg-master-drbd: ack_receiver terminated
[49807.178124] drbd sg-master-drbd: Terminating drbd_a_sg-maste
[49807.192386] drbd sg-master-drbd: Connection closed
[49807.192452] drbd sg-master-drbd: conn( Disconnecting -> StandAlone )
[49807.192463] drbd sg-master-drbd: receiver terminated
[49807.192470] drbd sg-master-drbd: Terminating drbd_r_sg-maste
[49807.229346] drbd sg-master-drbd: Terminating drbd_w_sg-maste
[49847.525209] drbd sg-master-drbd: Starting worker thread (from drbdsetup-84 [23082])
[49847.525490] drbd sg-master-drbd: conn( StandAlone -> Unconnected )
[49847.525542] drbd sg-master-drbd: Starting receiver thread (from drbd_w_sg-maste [23084])
[49847.525624] drbd sg-master-drbd: receiver (re)started
[49847.525687] drbd sg-master-drbd: conn( Unconnected -> WFConnection )
[49848.025725] drbd sg-master-drbd: Handshake successful: Agreed network protocol version 101
[49848.025735] drbd sg-master-drbd: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
[49848.025964] drbd sg-master-drbd: conn( WFConnection -> WFReportParams )
[49848.025979] drbd sg-master-drbd: Starting ack_recv thread (from drbd_r_sg-maste [23085])
[49848.036394] block drbd0: max BIO size = 4096
[49848.036407] block drbd0: size = 200 TB (214854324848 KB)
[49848.036427] block drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> Connected ) pdsk( DUnknown -> UpToDate )

//OE

-----Original Message-----
From: Robert Altnoeder <robert.altnoeder at linbit.com>
To: drbd-user at lists.linbit.com
Subject: Re: [DRBD-user] Extent XXX beyond end of bitmap!
Date: Tue, 14 Aug 2018 13:03:40 +0200

The following information would be useful for debugging:
- Internal or external meta data?
- Any special activity log configuration, like a striped AL, different AL stripe size, etc.?
- Any manually configured number of AL extents?
- Value of max-peers
- Reported size of the DRBD device in sectors
- Reported size of the backend device for DRBD in sectors
- Ideally, a 'drbdadm dump-md' of the meta data of the affected devices
br,
Robert
On 08/14/2018 10:02 AM, Yannis Milios wrote:
Does this happen on both nodes? What's the status of the backing device (lvm)? Can you post the exact versions for both kernel module and utils? Any clue in the logs?
On Tue, 14 Aug 2018 at 06:57, Oleksiy Evin <o.evin at onefc.com<mailto:o.evin at onefc.com>> wrote:

    # drbdadm attach all
    extent 19136522 beyond end of bitmap!
    extent 19143798 beyond end of bitmap!
    extent 19151565 beyond end of bitmap!
    ../shared/drbdmeta.c:2279:apply_al: ASSERT(bm_pos - bm_on_disk_pos <= chunk - extents_size) failed.

_______________________________________________
drbd-user mailing list
drbd-user at lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


