<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body>
<p>I've tried to follow the limited documentation on installing DRBD
9 and LINSTOR, and more or less managed to get things working. I have
three nodes (castle, san5 and san6). I rebuilt the various
Ubuntu packages on Debian and installed them on Debian buster on
all three machines:</p>
<p>drbd-dkms_9.0.22-1ppa1~bionic1_all.deb<br>
drbd-utils_9.13.1-1ppa1~bionic1_amd64.deb<br>
linstor-controller_1.7.1-1ppa1~bionic1_all.deb<br>
linstor-satellite_1.7.1-1ppa1~bionic1_all.deb<br>
linstor-common_1.7.1-1ppa1~bionic1_all.deb<br>
python-linstor_1.1.1-1ppa1~bionic1_all.deb<br>
linstor-client_1.1.1-1ppa1~bionic1_all.deb<br>
</p>
<p>After adding the three nodes I had this output:<br>
linstor node list<br>
╭──────────────────────────────────────────────────────────╮<br>
┊ Node ┊ NodeType ┊ Addresses ┊ State ┊<br>
╞══════════════════════════════════════════════════════════╡<br>
┊ castle ┊ SATELLITE ┊ &lt;IP&gt;.204:3366 (PLAIN) ┊ Online ┊<br>
┊ san5 ┊ SATELLITE ┊ &lt;IP&gt;.205:3366 (PLAIN) ┊ Online ┊<br>
┊ san6 ┊ SATELLITE ┊ &lt;IP&gt;.206:3366 (PLAIN) ┊ Online ┊<br>
╰──────────────────────────────────────────────────────────╯<br>
</p>
<p>Then I added some storage pools:<br>
linstor storage-pool list<br>
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────╮<br>
┊ StoragePool ┊ Node ┊ Driver ┊ PoolName ┊
FreeCapacity ┊ TotalCapacity ┊ CanSnapshots ┊ State ┊<br>
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════╡<br>
┊ DfltDisklessStorPool ┊ castle ┊ DISKLESS ┊
┊ ┊ ┊ False ┊ Ok ┊<br>
┊ DfltDisklessStorPool ┊ san5 ┊ DISKLESS ┊
┊ ┊ ┊ False ┊ Ok ┊<br>
┊ DfltDisklessStorPool ┊ san6 ┊ DISKLESS ┊
┊ ┊ ┊ False ┊ Ok ┊<br>
┊ pool ┊ castle ┊ LVM ┊ vg_hdd ┊ 3.44
TiB ┊ 3.44 TiB ┊ False ┊ Ok ┊<br>
┊ pool ┊ san5 ┊ LVM ┊ vg_hdd ┊ 4.36
TiB ┊ 4.36 TiB ┊ False ┊ Ok ┊<br>
┊ pool ┊ san6 ┊ LVM ┊ vg_ssd ┊ 1.75
TiB ┊ 1.75 TiB ┊ False ┊ Ok ┊<br>
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────╯<br>
</p>
<p>Again, everything was looking pretty good.</p>
<p>So, I tried to create a resource, and then I got this:</p>
<p>linstor resource list<br>
╭────────────────────────────────────────────────────────────────────────────╮<br>
┊ ResourceName ┊ Node ┊ Port ┊ Usage ┊ Conns
┊ State ┊<br>
╞════════════════════════════════════════════════════════════════════════════╡<br>
┊ testvm1 ┊ castle ┊ 7000 ┊ ┊
┊ Unknown ┊<br>
┊ testvm1 ┊ san5 ┊ 7000 ┊ ┊
┊ Unknown ┊<br>
┊ testvm1 ┊ san6 ┊ 7000 ┊ Unused ┊ Connecting(san5,castle)
┊ UpToDate ┊<br>
╰────────────────────────────────────────────────────────────────────────────╯<br>
</p>
<p>There hasn't been any change in over 24 hours, so I'm guessing
something is stuck or not working, but I don't have
many clues as to what it might be.</p>
<p>I've checked through the docs at: <a
class="moz-txt-link-freetext"
href="https://www.linbit.com/drbd-user-guide/linstor-guide-1_0-en/">https://www.linbit.com/drbd-user-guide/linstor-guide-1_0-en/</a>
and found these two commands in section 2.7, "Checking the state of
your cluster":</p>
<div class="listingblock">
<div class="content">
<pre># linstor node list
# linstor storage-pool list --groupby Size</pre>
</div>
</div>
<div class="listingblock">However, the second command produces a
usage error (perhaps a documentation bug). Changing the
<code>--groupby</code> argument to a valid value produces:</div>
<div class="listingblock">linstor storage-pool list --groupby Node<br>
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────╮<br>
┊ StoragePool ┊ Node ┊ Driver ┊ PoolName ┊
FreeCapacity ┊ TotalCapacity ┊ CanSnapshots ┊ State ┊<br>
╞═════════════════════════════════════════════════════════════════════════════════════════════════════════════╡<br>
┊ DfltDisklessStorPool ┊ castle ┊ DISKLESS ┊
┊ ┊ ┊ False ┊ Ok ┊<br>
┊ pool ┊ castle ┊ LVM ┊ vg_hdd ┊ 3.44
TiB ┊ 3.44 TiB ┊ False ┊ Ok ┊<br>
┊ DfltDisklessStorPool ┊ san5 ┊ DISKLESS ┊
┊ ┊ ┊ False ┊ Warning ┊<br>
┊ pool ┊ san5 ┊ LVM ┊ vg_hdd
┊ ┊ ┊ False ┊ Warning ┊<br>
┊ DfltDisklessStorPool ┊ san6 ┊ DISKLESS ┊
┊ ┊ ┊ False ┊ Ok ┊<br>
┊ pool ┊ san6 ┊ LVM ┊ vg_ssd ┊ 1.26
TiB ┊ 1.75 TiB ┊ False ┊ Ok ┊<br>
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────╯<br>
WARNING:<br>
Description:<br>
No active connection to satellite 'san5'<br>
Details:<br>
The controller is trying to (re-) establish a connection to
the satellite. The controller stored the changes and as soon the
satellite is connected, it will receive this update.</div>
<div class="listingblock"><br>
</div>
<div class="listingblock">Note: after waiting approximately 20 hours, san5
was shut down cleanly, so it is currently offline.</div>
<div class="listingblock"><br>
</div>
<div class="listingblock">dmesg on san6 includes this:</div>
<div class="listingblock">[95078.272184] drbd testvm1: Starting
worker thread (from drbdsetup [2398])<br>
[95078.285272] drbd testvm1 castle: Starting sender thread (from
drbdsetup [2402])<br>
[95078.290733] drbd testvm1 san5: Starting sender thread (from
drbdsetup [2406])<br>
[95078.310399] drbd testvm1/0 drbd1000: meta-data IO uses: blk-bio<br>
[95078.310500] drbd testvm1/0 drbd1000: rs_discard_granularity
feature disabled<br>
[95078.310767] drbd testvm1/0 drbd1000: disk( Diskless ->
Attaching )<br>
[95078.310775] drbd testvm1/0 drbd1000: Maximum number of peer
devices = 7<br>
[95078.310864] drbd testvm1: Method to ensure write ordering:
flush<br>
[95078.310867] drbd testvm1/0 drbd1000: Adjusting my ra_pages to
backing device's (32 -> 1024)<br>
[95078.310870] drbd testvm1/0 drbd1000: drbd_bm_resize called with
capacity == 1048581248<br>
[95078.418753] drbd testvm1/0 drbd1000: resync bitmap:
bits=131072656 words=14336077 pages=28001<br>
[95078.418757] drbd testvm1/0 drbd1000: size = 500 GB (524290624
KB)<br>
[95078.593417] drbd testvm1/0 drbd1000: recounting of set bits
took additional 64ms<br>
[95078.593429] drbd testvm1/0 drbd1000: disk( Attaching ->
Inconsistent ) quorum( no -> yes )<br>
[95078.593431] drbd testvm1/0 drbd1000: attached to current UUID:
0000000000000004<br>
[95078.595412] drbd testvm1 castle: conn( StandAlone ->
Unconnected )<br>
[95078.596649] drbd testvm1 san5: conn( StandAlone ->
Unconnected )<br>
[95078.599430] drbd testvm1 castle: Starting receiver thread (from
drbd_w_testvm1 [2399])<br>
[95078.599742] drbd testvm1 san5: Starting receiver thread (from
drbd_w_testvm1 [2399])<br>
[95078.599813] drbd testvm1 castle: conn( Unconnected ->
Connecting )<br>
[95078.604454] drbd testvm1 san5: conn( Unconnected ->
Connecting )<br>
[95079.113391] drbd testvm1/0 drbd1000: rs_discard_granularity
feature disabled<br>
[95079.146175] drbd testvm1: Preparing cluster-wide state change
1272763172 (2->-1 7683/4609)<br>
[95079.146178] drbd testvm1: Committing cluster-wide state change
1272763172 (0ms)<br>
[95079.146184] drbd testvm1: role( Secondary -> Primary )<br>
[95079.146186] drbd testvm1/0 drbd1000: disk( Inconsistent ->
UpToDate )<br>
[95079.146256] drbd testvm1/0 drbd1000: size = 500 GB (524290624
KB)<br>
[95079.152264] drbd testvm1: Forced to consider local data as
UpToDate!<br>
[95079.156608] drbd testvm1/0 drbd1000: new current UUID:
60E1FC2F9926E84B weak: FFFFFFFFFFFFFFFB<br>
[95079.159415] drbd testvm1: role( Primary -> Secondary )<br>
<br>
</div>
<div class="listingblock"><br>
</div>
<div class="listingblock">----- a few weeks later...<br>
</div>
<div class="listingblock"><br>
</div>
<div class="listingblock">I wrote the above intending to have
another go at this later. I now have san5 back online and
have rebooted both castle and san6, and the status of all three nodes is now:</div>
<div class="listingblock">linstor n l<br>
╭───────────────────────────────────────────────────────────╮<br>
┊ Node ┊ NodeType ┊ Addresses ┊ State ┊<br>
╞═══════════════════════════════════════════════════════════╡<br>
┊ castle ┊ SATELLITE ┊ 192.168.5.204:3366 (PLAIN) ┊ Unknown ┊<br>
┊ san5 ┊ SATELLITE ┊ 192.168.5.205:3366 (PLAIN) ┊ Unknown ┊<br>
┊ san6 ┊ SATELLITE ┊ 192.168.5.206:3366 (PLAIN) ┊ Unknown ┊<br>
╰───────────────────────────────────────────────────────────╯<br>
</div>
<div class="listingblock">
<div class="listingblock"><br>
</div>
<div class="listingblock">Is there any other documentation on what
to do when things go wrong? A checklist to find where the
problem might be? With the old DRBD 8.4, /proc/drbd and dmesg
seemed to be the two main sources of information, but now I am
quite out of my depth. Any clues or suggestions on things to
check, or additional information I should provide, would be greatly
appreciated.</div>
</div>
<div class="listingblock"><br>
</div>
</body>
</html>