[DRBD-user] linstor resource inconsistent

Gábor Hernádi gabor.hernadi at linbit.com
Wed Oct 30 07:48:53 CET 2019


Hi,

> ... Satellite and Controller are
> deployed using docker and are version 1.2.0. The Satellites are started
> using the "--net=host --privileged" options for docker.
>

That sounds like the first problem here. Multiple docker containers still
share the same kernel. As DRBD is a kernel module, all of your docker
containers will share basically the same DRBD, but will try to configure it
differently.


> I was able to successfully add the satellite nodes to the controller and
> create a resource group, volume group and an initial resource with
> place-count=1.
>

Sure, with only 1 container using DRBD all works fine.


> I then deleted the resource, resource definition and resource group and
> re-created the group but this time with place-count=2.
>

This is where the trouble begins :)


> Finally I created a new volume, and the command used, "linstor
> resource-group spawn-resources group1 res1 20G", just hung. After a while
> I hit Ctrl+C and looked at the resource list, which looked like this:
>
> ╭───────────────────────────────────────────────────────────╮
> ┊ ResourceName ┊ Node        ┊ Port ┊ Usage  ┊        State ┊
> ╞═══════════════════════════════════════════════════════════╡
> ┊ res1         ┊ storagesat1 ┊ 7000 ┊ Unused ┊     UpToDate ┊
> ┊ res1         ┊ storagesat2 ┊ 7000 ┊ Unused ┊ Inconsistent ┊
> ╰───────────────────────────────────────────────────────────╯
>
> In the satellite logs I see this on all three satellite nodes (but with
> different report numbers in each case):
> 16:24:50.767 [DeviceManager] ERROR LINSTOR/Satellite - SYSTEM -
> com.linbit.linstor.storage.StorageException: Failed to find major:minor
> of device /dev/drbd1000 [Report number 5DB85E9C-8A7C3-000001]
>

I admit we might want to look into this: even in such a case the command
should not hang or freeze. We should report an error all the way back to
the client.
However, if you look into the ErrorReport (using "linstor err show
5DB85E9C-8A7C3-000001"), you will see what caused the message "Failed to
find major:minor of device ...". This error message is triggered by a
call to "stat -L -c %t:%T $devicePath". Although that alone might not be
very useful, I am pretty sure that the standard output and standard error
of that external command (which are also included in the ErrorReport)
will point you in the right direction.
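
For illustration, here is a rough Python equivalent of that stat call. It
is only a sketch: the function names are made up for this mail and are not
part of LINSTOR, and the hex formatting assumes GNU stat semantics for %t
and %T (major and minor device type in hexadecimal). The numbers 147 and
1000 are the ones from the "ls -l /dev/drbd1000" output quoted further
down.

```python
import os
import stat


def major_minor(device_path):
    """Return (major, minor) of a device node, following symlinks like `stat -L`."""
    st = os.stat(device_path)  # os.stat follows symlinks by default
    if not (stat.S_ISBLK(st.st_mode) or stat.S_ISCHR(st.st_mode)):
        # Hypothetical error handling; LINSTOR reports a StorageException instead.
        raise ValueError(f"{device_path} is not a device node")
    return os.major(st.st_rdev), os.minor(st.st_rdev)


def format_like_stat(major, minor):
    """Format the pair the way `stat -c %t:%T` prints it: both in hexadecimal."""
    return f"{major:x}:{minor:x}"


if __name__ == "__main__":
    # 147 and 1000 are taken from the `ls -l /dev/drbd1000` output in this thread.
    print(format_like_stat(147, 1000))  # prints 93:3e8
```

Running major_minor("/dev/drbd1000") inside a satellite container and on
the host should give the same pair if the device node is usable; an
exception there corresponds to the stat call failing.
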
It is also useful to know that ErrorReport ids have the form
$sessionNumber-$nodeNameHash-$incrementalNumber, which means you might
also want to look into the previous ErrorReport, 5DB85E9C-8A7C3-000000.
This might be an ErrorReport about drbd failing to adjust the
/dev/drbd1000 device, because of... well... the shared kernel.
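
As a small sketch of how to derive the previous report id from the
$sessionNumber-$nodeNameHash-$incrementalNumber layout: the helper name
below is hypothetical, and treating the last field as a zero-padded
decimal counter is an assumption on my part (it makes no difference for
stepping from 000001 back to 000000).

```python
def previous_report_id(report_id):
    """Return the id of the preceding ErrorReport, or None if there is none."""
    session, node_hash, counter = report_id.split("-")
    number = int(counter)  # assumption: the counter is decimal
    if number == 0:
        return None  # already the first report of this session
    # Keep the original zero-padded width of the counter field.
    return f"{session}-{node_hash}-{number - 1:0{len(counter)}d}"


if __name__ == "__main__":
    print(previous_report_id("5DB85E9C-8A7C3-000001"))  # prints 5DB85E9C-8A7C3-000000
```

The resulting id can then be passed to "linstor err show" in the same way
as the one from the log line.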


> The device node exists on all three nodes and looks identical:
> # ls -l /dev/drbd1000
> brw-rw---- 1 root disk 147, 1000 Oct 29 17:24 /dev/drbd1000
>

Still the same issue. If you repeat your scenario with only 1 satellite,
where everything worked well, you should still see /dev/drbd1000 in ALL
docker containers (as well as on the host itself).

-- 
Best regards,
Gabor Hernadi