[DRBD-user] First Linstor bug encountered
Julien Escario
julien.escario at altinea.fr
Tue Aug 21 19:06:47 CEST 2018
On 21/08/2018 at 18:39, Robert Altnoeder wrote:
> On 08/21/2018 06:23 PM, Julien Escario wrote:
>> Hello,
>> Just hit a bug after multiple creations/deletions of resources on my two-node
>> cluster.
>>
>> Syslog reports:
>>
>> Aug 21 17:31:28 dedie83 Satellite[15917]: 17:31:28.828 [MainWorkerPool_0016]
>> ERROR LINSTOR/Satellite - Problem of type 'java.lang.NullPointerException'
>> logged to report number 5B770066-000000
>> Aug 21 17:31:28 dedie83 Satellite[15917]: 17:31:28.833 [MainWorkerPool_0016]
>> ERROR LINSTOR/Satellite - Access to deleted resource [Report number
>> 5B770066-000001]
>
> In that case, we'd certainly be interested in getting the
> /opt/linstor-server/logs/ErrorReport-5B770066-000001.log file.
> Looks like a satellite attempted to work with some data of a resource
> that it had declared deleted before.
Right, but it would be useful to understand why this happened.
Here is what I think is the REAL error log (some obscure Java error ;-):
https://framabin.org/p/?645b441b2abeefd1#disCSMbiaa1NAiFwTLr4iXZ414h2bjMlfmGA/MdMP3k=
And the one you asked for (Access to deleted resource):
https://framabin.org/p/?772e59e80450a5fa#nl4PG6/tkx2yjTUDXA9AsUxls18TkAgCr8Ee76Y5ja8=
> For recovery, disconnecting/reconnecting the Satellite (or just
> restarting it) should suffice. It should normally also retry the
> resource deletion afterwards.
Right, restarting linstor-satellite put everything back into the right state.
I had to issue:
systemctl restart linstor-satellite
but on BOTH nodes.
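For the record, here is roughly the sequence I used (just a sketch; the second
hostname is only an example for my other node, and I'm assuming SSH access from
the controller host):

# restart the satellite on each node so it drops its stale in-memory state
ssh root@dedie83 'systemctl restart linstor-satellite'
ssh root@node2 'systemctl restart linstor-satellite'

# then check from the controller that both satellites are back online
# and that the stuck resource is in a sane state again
linstor node list
linstor resource list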
> I noticed the resource is in the "Primary" role. It might still be in
> use, and in that case, LINSTOR can not successfully delete the resource,
> because the layers below LINSTOR do not support that.
The resource was Primary on the surviving node; I tried to delete it on the
secondary node.
Trying to delete the primary resource failed with 'Resource is mounted/in use.'.
That's the correct behavior, I think ;-)
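For anyone who hits the same situation, the deletion sequence looks roughly like
this (a sketch only; 'vm-100-disk-1' is just an example resource name and 'node2'
an example hostname):

# check which node currently holds the DRBD Primary role
drbdadm status vm-100-disk-1

# deleting the replica on the node where it is Secondary works fine:
linstor resource delete node2 vm-100-disk-1

# on the Primary node, LINSTOR refuses while the device is in use;
# unmount / stop whatever uses it, demote, then delete:
drbdadm secondary vm-100-disk-1
linstor resource delete dedie83 vm-100-disk-1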
Even if there has been a little glitch in the force, the cluster is now back in a
correct state for me and without rebooting, which is far better than drbdmanage's
dbus crashes ;-)
Best regards,
Julien