[DRBD-user] 125TB volume working, 130TB not working

Lars Ellenberg lars.ellenberg at linbit.com
Tue Jun 19 10:18:37 CEST 2018


On Mon, Jun 18, 2018 at 01:15:48PM +0000, Svein erik AASEN wrote:
> Hi,
> I see some problems using very large volumes. Initially I tried a 480TB volume and it failed at "drbdadm up ...".
> Then I reduced the size until I got it working at 125 TB. At 130 TB it fails. Maybe there is a limit at 128 TB?
> 
> My system is running openSUSE Leap 15. I have tested both DRBD 9.0.13 and 9.0.14.
> 
> My configuration commands are shown below, as well as the errors listed in the message log.
> The config file contains the same as the one shown in the top of page 22 in the Users Guide.
> (Basic two node active/passive setup)
> 
> Regards,
> Svein Erik Aasen
> 
> 
> fdisk -l
> Disk /dev/sdb: 480.2 TiB, 527982477180928 bytes, 1031215775744 sectors
> Units: sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 4096 bytes
> I/O size (minimum/optimal): 262144 bytes / 262144 bytes
> Disklabel type: gpt
> Disk identifier: B2F29E9D-F576-494F-9BA7-DA2D28A0A974
> 
> Device     Start           End       Sectors   Size Type
> /dev/sdb1   2048 1031215775710 1031215773663 480.2T Linux filesystem
> 
> 
> drbdadm -v create-md drbd1
> drbdmeta 1 v09 /dev/sdb1 internal create-md 1
> You want me to create a v09 style flexible-size internal meta data block.
> There appears to be a v09 flexible-size internal meta data block
> already in place on /dev/sdb1 at byte offset 527982476107776
> 
> Do you really want to overwrite the existing meta-data?
> [need to type 'yes' to confirm] yes
> 
> md_offset 527982476107776
> al_offset 527982476075008
> bm_offset 527966363328512
> 
> Found some data
> 
>  ==> This might destroy existing data! <==
> 
> Do you want to proceed?
> [need to type 'yes' to confirm] yes
> 
> initializing activity log
> initializing bitmap (15735104 KB) to all zero
> Writing meta data...
> New drbd meta data block successfully created.
> drbdmeta 1 v09 /dev/sdb1 internal write-dev-uuid C21427160033D05D
> 
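[Editorial aside for archive readers: the offsets drbdmeta prints above are internally consistent and can be cross-checked with plain arithmetic, assuming DRBD's usual 4 KiB-per-bit bitmap granularity:]

```python
# Offsets printed by "drbdadm create-md" above, in bytes from the start
# of /dev/sdb1 (internal meta data lives at the end of the device).
md_offset = 527982476107776   # meta-data superblock
al_offset = 527982476075008   # activity log, just below the superblock
bm_offset = 527966363328512   # bitmap, below the activity log

# The bitmap fills the space between bm_offset and al_offset:
bitmap_kib = (al_offset - bm_offset) // 1024
print(bitmap_kib)             # 15735104, matching "initializing bitmap (15735104 KB)"

# One bitmap bit tracks one 4 KiB block of data, so the data area below
# bm_offset needs bm_offset / 4096 bits:
print(bm_offset // 4096)      # 128898037922, matching "bits=..." in the dmesg output
```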
> 
> 
> drbdadm -v up drbd1
> drbdsetup new-resource drbd1 1 --quorum=off
> drbdsetup new-minor drbd1 1 0
> drbdsetup new-peer drbd1 2 --_name=wave --protocol=C
> drbdsetup new-path drbd1 2 ipv4:10.0.20.1:7789 ipv4:10.0.20.2:7789
> drbdmeta 1 v09 /dev/sdb1 internal apply-al
> drbdsetup attach 1 /dev/sdb1 /dev/sdb1 internal
> 1: Failure: (118) IO error(s) occurred during initial access to meta-data.
> 
> Command 'drbdsetup attach 1 /dev/sdb1 /dev/sdb1 internal' terminated with exit code 10
> 
> 
> 
> dmesg -T
> [ma. juni 18 14:43:54 2018] drbd drbd1: Starting worker thread (from drbdsetup [8116])
> [ma. juni 18 14:43:54 2018] drbd drbd1 wave: Starting sender thread (from drbdsetup [8120])
> [ma. juni 18 14:43:54 2018] drbd drbd1/0 drbd1: disk( Diskless -> Attaching )
> [ma. juni 18 14:43:54 2018] drbd drbd1/0 drbd1: Maximum number of peer devices = 1
> [ma. juni 18 14:43:54 2018] drbd drbd1: Method to ensure write ordering: flush
> [ma. juni 18 14:43:54 2018] drbd drbd1/0 drbd1: drbd_bm_resize called with capacity == 1031184303376
> [ma. juni 18 14:44:02 2018] drbd drbd1/0 drbd1: resync bitmap: bits=128898037922 words=2014031843 pages=3933656
> [ma. juni 18 14:44:02 2018] drbd drbd1/0 drbd1: size = 480 TB (515592151688 KB)
> [ma. juni 18 14:44:10 2018] drbd drbd1/0 drbd1: IO ERROR 10 on bitmap page idx 788057
> [ma. juni 18 14:44:21 2018] drbd drbd1/0 drbd1: IO ERROR 10 on bitmap page idx 1836633
> [ma. juni 18 14:44:32 2018] drbd drbd1/0 drbd1: IO ERROR 10 on bitmap page idx 2885209

I notice an interesting pattern here;
if you print those page indices as hex:
0x0c0659
0x1c0659
0x2c0659

If you try again, do those numbers change?
If they change, do they still show such a pattern in hex digits?
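[Editorial aside: the stride behind that pattern is easy to pin down numerically. The failing page indices are exactly 0x100000 pages apart; at 4 KiB per bitmap page that is a 4 GiB stride within the on-disk bitmap (exactly where a 32-bit byte offset would wrap), and since each bitmap page holds 4096*8 bits, each covering 4 KiB of data, it is also a 128 TiB stride of covered data, which lines up with the reported 125 TB-works / 130 TB-fails boundary. The 32-bit-wrap reading is only a hypothesis; the arithmetic itself checks out:]

```python
# Failing bitmap page indices from the dmesg output above.
pages = [788057, 1836633, 2885209]
print([hex(p) for p in pages])       # ['0xc0659', '0x1c0659', '0x2c0659']

stride_pages = pages[1] - pages[0]
assert stride_pages == pages[2] - pages[1] == 0x100000   # 2**20 pages apart

# Stride in bytes of on-disk bitmap (4 KiB pages): 4 GiB,
# i.e. exactly where a 32-bit byte offset would wrap.
print(stride_pages * 4096 == 2**32)                      # True

# Stride in covered data: one page holds 4096*8 bits, each bit maps
# 4 KiB of data, so 2**20 pages cover 2**47 bytes = 128 TiB.
print(stride_pages * 4096 * 8 * 4096 == 128 * 2**40)     # True
```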

> [ma. juni 18 14:44:43 2018] drbd drbd1/0 drbd1: we had at least one MD IO ERROR during bitmap IO
> [ma. juni 18 14:44:47 2018] drbd drbd1/0 drbd1: recounting of set bits took additional 3912ms
> [ma. juni 18 14:44:47 2018] drbd drbd1/0 drbd1: disk( Attaching -> Diskless )


Also, I'm trying to understand use cases "out there".
Why do you want a single "huge" DRBD volume
rather than several smaller ones?

-- 
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker

DRBD® and LINBIT® are registered trademarks of LINBIT
__
please don't Cc me, but send to list -- I'm subscribed

