[DRBD-user] linstor issues
Adam Goryachev
mailinglists at websitemanagers.com.au
Tue Jun 23 04:53:23 CEST 2020
I've tried to follow the limited documentation on installing DRBD 9 and
linstor, and have sort of managed to get things working. I have three nodes
(castle, san5 and san6). I've rebuilt the various Ubuntu (bionic) packages
under Debian, and installed the following on Debian buster on all three
machines:
drbd-dkms_9.0.22-1ppa1~bionic1_all.deb
drbd-utils_9.13.1-1ppa1~bionic1_amd64.deb
linstor-controller_1.7.1-1ppa1~bionic1_all.deb
linstor-satellite_1.7.1-1ppa1~bionic1_all.deb
linstor-common_1.7.1-1ppa1~bionic1_all.deb
python-linstor_1.1.1-1ppa1~bionic1_all.deb
linstor-client_1.1.1-1ppa1~bionic1_all.deb
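For reference, the nodes were added with the usual node-create commands, roughly as follows (addresses as shown in the node list further below; exact invocation from memory):
linstor node create castle 192.168.5.204
linstor node create san5 192.168.5.205
linstor node create san6 192.168.5.206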
After adding the three nodes, I had this output:
linstor node list
╭──────────────────────────────────────────────────────────╮
┊ Node ┊ NodeType ┊ Addresses ┊ State ┊
╞══════════════════════════════════════════════════════════╡
┊ castle ┊ SATELLITE ┊ <IP>.204:3366 (PLAIN) ┊ Online ┊
┊ san5 ┊ SATELLITE ┊ <IP>.205:3366 (PLAIN) ┊ Online ┊
┊ san6 ┊ SATELLITE ┊ <IP>.206:3366 (PLAIN) ┊ Online ┊
╰──────────────────────────────────────────────────────────╯
Then I added some storage pools:
linstor storage-pool list
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ StoragePool          ┊ Node   ┊ Driver   ┊ PoolName ┊ FreeCapacity ┊ TotalCapacity ┊ CanSnapshots ┊ State ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ DfltDisklessStorPool ┊ castle ┊ DISKLESS ┊          ┊              ┊               ┊ False        ┊ Ok    ┊
┊ DfltDisklessStorPool ┊ san5   ┊ DISKLESS ┊          ┊              ┊               ┊ False        ┊ Ok    ┊
┊ DfltDisklessStorPool ┊ san6   ┊ DISKLESS ┊          ┊              ┊               ┊ False        ┊ Ok    ┊
┊ pool                 ┊ castle ┊ LVM      ┊ vg_hdd   ┊     3.44 TiB ┊      3.44 TiB ┊ False        ┊ Ok    ┊
┊ pool                 ┊ san5   ┊ LVM      ┊ vg_hdd   ┊     4.36 TiB ┊      4.36 TiB ┊ False        ┊ Ok    ┊
┊ pool                 ┊ san6   ┊ LVM      ┊ vg_ssd   ┊     1.75 TiB ┊      1.75 TiB ┊ False        ┊ Ok    ┊
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯
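For reference, those pools were created with commands roughly like the following, using one existing LVM volume group per node (exact syntax from memory):
linstor storage-pool create lvm castle pool vg_hdd
linstor storage-pool create lvm san5 pool vg_hdd
linstor storage-pool create lvm san6 pool vg_ssd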
Again, everything was looking pretty good.
So, I tried to create a resource, and then I got this:
linstor resource list
╭──────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName ┊ Node   ┊ Port ┊ Usage  ┊ Conns                   ┊    State ┊
╞══════════════════════════════════════════════════════════════════════════════╡
┊ testvm1      ┊ castle ┊ 7000 ┊        ┊                         ┊  Unknown ┊
┊ testvm1      ┊ san5   ┊ 7000 ┊        ┊                         ┊  Unknown ┊
┊ testvm1      ┊ san6   ┊ 7000 ┊ Unused ┊ Connecting(san5,castle) ┊ UpToDate ┊
╰──────────────────────────────────────────────────────────────────────────────╯
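For reference, the resource was created along these lines; the exact commands are from memory (it may have been --auto-place rather than naming each node), and the 500G size matches the dmesg output further down:
linstor resource-definition create testvm1
linstor volume-definition create testvm1 500G
linstor resource create castle testvm1 --storage-pool pool
linstor resource create san5 testvm1 --storage-pool pool
linstor resource create san6 testvm1 --storage-pool pool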
There hasn't been any change in over 24 hours, so I'm guessing something is
stuck or not working, but I don't have many clues as to what it might be.
I've checked through the docs at:
https://www.linbit.com/drbd-user-guide/linstor-guide-1_0-en/ and found
these two commands in section 2.7 Checking the state of your cluster:
# linstor node list
# linstor storage-pool list --groupby Size
However, the second command produces a usage error (a documentation bug,
perhaps). Editing the command to something valid produces:
linstor storage-pool list --groupby Node
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ StoragePool          ┊ Node   ┊ Driver   ┊ PoolName ┊ FreeCapacity ┊ TotalCapacity ┊ CanSnapshots ┊ State   ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ DfltDisklessStorPool ┊ castle ┊ DISKLESS ┊          ┊              ┊               ┊ False        ┊ Ok      ┊
┊ pool                 ┊ castle ┊ LVM      ┊ vg_hdd   ┊     3.44 TiB ┊      3.44 TiB ┊ False        ┊ Ok      ┊
┊ DfltDisklessStorPool ┊ san5   ┊ DISKLESS ┊          ┊              ┊               ┊ False        ┊ Warning ┊
┊ pool                 ┊ san5   ┊ LVM      ┊ vg_hdd   ┊              ┊               ┊ False        ┊ Warning ┊
┊ DfltDisklessStorPool ┊ san6   ┊ DISKLESS ┊          ┊              ┊               ┊ False        ┊ Ok      ┊
┊ pool                 ┊ san6   ┊ LVM      ┊ vg_ssd   ┊     1.26 TiB ┊      1.75 TiB ┊ False        ┊ Ok      ┊
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
WARNING:
Description:
    No active connection to satellite 'san5'
Details:
    The controller is trying to (re-) establish a connection to the
    satellite. The controller stored the changes and as soon the satellite
    is connected, it will receive this update.
Note: after waiting approximately 20 hours, san5 was shut down cleanly, so
it is currently offline.
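(My assumption is that once san5 is booted again and its satellite service is back up, e.g. verified with
systemctl status linstor-satellite
on san5, the controller should push the queued changes to it, but I haven't confirmed that.)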
dmesg on san6 includes this:
[95078.272184] drbd testvm1: Starting worker thread (from drbdsetup [2398])
[95078.285272] drbd testvm1 castle: Starting sender thread (from drbdsetup [2402])
[95078.290733] drbd testvm1 san5: Starting sender thread (from drbdsetup [2406])
[95078.310399] drbd testvm1/0 drbd1000: meta-data IO uses: blk-bio
[95078.310500] drbd testvm1/0 drbd1000: rs_discard_granularity feature disabled
[95078.310767] drbd testvm1/0 drbd1000: disk( Diskless -> Attaching )
[95078.310775] drbd testvm1/0 drbd1000: Maximum number of peer devices = 7
[95078.310864] drbd testvm1: Method to ensure write ordering: flush
[95078.310867] drbd testvm1/0 drbd1000: Adjusting my ra_pages to backing device's (32 -> 1024)
[95078.310870] drbd testvm1/0 drbd1000: drbd_bm_resize called with capacity == 1048581248
[95078.418753] drbd testvm1/0 drbd1000: resync bitmap: bits=131072656 words=14336077 pages=28001
[95078.418757] drbd testvm1/0 drbd1000: size = 500 GB (524290624 KB)
[95078.593417] drbd testvm1/0 drbd1000: recounting of set bits took additional 64ms
[95078.593429] drbd testvm1/0 drbd1000: disk( Attaching -> Inconsistent ) quorum( no -> yes )
[95078.593431] drbd testvm1/0 drbd1000: attached to current UUID: 0000000000000004
[95078.595412] drbd testvm1 castle: conn( StandAlone -> Unconnected )
[95078.596649] drbd testvm1 san5: conn( StandAlone -> Unconnected )
[95078.599430] drbd testvm1 castle: Starting receiver thread (from drbd_w_testvm1 [2399])
[95078.599742] drbd testvm1 san5: Starting receiver thread (from drbd_w_testvm1 [2399])
[95078.599813] drbd testvm1 castle: conn( Unconnected -> Connecting )
[95078.604454] drbd testvm1 san5: conn( Unconnected -> Connecting )
[95079.113391] drbd testvm1/0 drbd1000: rs_discard_granularity feature disabled
[95079.146175] drbd testvm1: Preparing cluster-wide state change 1272763172 (2->-1 7683/4609)
[95079.146178] drbd testvm1: Committing cluster-wide state change 1272763172 (0ms)
[95079.146184] drbd testvm1: role( Secondary -> Primary )
[95079.146186] drbd testvm1/0 drbd1000: disk( Inconsistent -> UpToDate )
[95079.146256] drbd testvm1/0 drbd1000: size = 500 GB (524290624 KB)
[95079.152264] drbd testvm1: Forced to consider local data as UpToDate!
[95079.156608] drbd testvm1/0 drbd1000: new current UUID: 60E1FC2F9926E84B weak: FFFFFFFFFFFFFFFB
[95079.159415] drbd testvm1: role( Primary -> Secondary )
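I believe the DRBD 9 way to watch this outside of dmesg is something like
drbdadm status testvm1
(or drbdsetup status --verbose), which I assume would show the local disk UpToDate on san6 and both peer connections stuck in Connecting, matching the resource list above.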
----- a few weeks later...
I wrote the above intending to have another go at this later. Now I have
san5 back online and have rebooted both castle and san6, and the status on
all three is:
linstor n l
╭───────────────────────────────────────────────────────────╮
┊ Node ┊ NodeType ┊ Addresses ┊ State ┊
╞═══════════════════════════════════════════════════════════╡
┊ castle ┊ SATELLITE ┊ 192.168.5.204:3366 (PLAIN) ┊ Unknown ┊
┊ san5 ┊ SATELLITE ┊ 192.168.5.205:3366 (PLAIN) ┊ Unknown ┊
┊ san6 ┊ SATELLITE ┊ 192.168.5.206:3366 (PLAIN) ┊ Unknown ┊
╰───────────────────────────────────────────────────────────╯
Is there any other documentation on what to do when things go wrong? A
checklist to find where the problem might be? With the old DRBD 8.4,
/proc/drbd and dmesg seemed to be the two main sources of information, but
now I feel quite out of my depth. Any clues or suggestions on things to
check, or additional information I should provide, would be greatly
appreciated.
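In case it helps narrow things down, the kinds of checks I was planning to try are along these lines (treat the exact commands as my guesses at the right places to look):
systemctl status linstor-controller linstor-satellite
journalctl -u linstor-controller -u linstor-satellite
linstor error-reports list
ss -tlnp | grep 3366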