[DRBD-user] linstor Error after adding resource to node

Adam Goryachev mailinglists at websitemanagers.com.au
Wed Sep 30 13:41:32 CEST 2020


On 30/9/20 16:49, Gábor Hernádi wrote:
> Hi,
>
> I tried to recreate this issue, but without success.
>
> 4 Node setup, all LVM
> First create a resource with --auto-place 3,
> Create 9 other resources with --auto-place 4
> Create the first resource on the 4th (missing) node
> Check "linstor volume list"
>
> That means, there has to be something else in your setup.
> What else did you do? I see that your "first" resource "windows-wm" 
> was actually more like the second resource, as it got minor number 
> 1001 instead of 1000. Minor number 1000 was later reused by "testvm1". 
> However, was something broken with the "original" resource that used 
> minor number 1000?
>
Unfortunately, yes, a whole bunch of things have been done on the first 
three nodes. I've been slowly messing around over the last few months 
trying to get everything working. There was an earlier "testvm3" which I 
created and then deleted before starting again with the further testing...
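
For comparison, the sequence you describe maps onto commands roughly like 
this on my side (just a sketch; the resource/pool names are the ones from 
my setup further down, and the volume size is a guess):

    linstor resource-definition create windows-wm
    linstor volume-definition create windows-wm 50M
    linstor resource create windows-wm --auto-place 3 --storage-pool pool_hdd
    # ... nine more testvmN resources, created the same way with --auto-place 4 ...
    # later, add the first resource to the node it was missing from:
    linstor resource create san7 windows-wm --storage-pool pool_hdd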


> Error report 5F733CD9-00000-000004 is a NullPointerException, but this 
> is most likely just a side-effect of the original issue.
>
> > Since it looks relevant, error reports 1, 2 and 3 are all similar 
> > for nodes castle, san5 and san6
>
> What about error report 0? Not relevant for this issue?
>
Oops, I just didn't realise there was a 0 report... I've included it 
here now:

ERROR REPORT 5F733CD9-00000-000000

============================================================

Application:                        LINBIT® LINSTOR
Module:                             Controller
Version:                            1.9.0
Build ID:                           678acd24a8b9b73a735407cd79ca33a5e95eb2e2
Build time:                         2020-09-23T10:27:49+00:00
Error time:                         2020-09-30 00:12:19
Node:                               castle.websitemanagers.com.au
Peer:                               RestClient(192.168.5.207; 'PythonLinstor/1.4.0 (API1.0.4)')

============================================================

Reported error:
===============

Description:
     Dependency not found

Category:                           LinStorException
Class name:                         LinStorException
Class canonical name:               com.linbit.linstor.LinStorException
Generated at:                       Method 'checkStorPoolLoaded', Source file 'CtrlStorPoolResolveHelper.java', Line #225

Error message:                      Dependency not found

Error context:
     The storage pool 'DfltStorPool' for resource 'windows-wm' for volume number '0' is not deployed on node 'san7'.

Call backtrace:

     Method                                   Native Class:Line number
     checkStorPoolLoaded                      N com.linbit.linstor.CtrlStorPoolResolveHelper:225
     resolveStorPool                          N com.linbit.linstor.CtrlStorPoolResolveHelper:149
     resolveStorPool                          N com.linbit.linstor.CtrlStorPoolResolveHelper:65
     createVolumeResolvingStorPool            N com.linbit.linstor.core.apicallhandler.controller.CtrlVlmCrtApiHelper:72
     createResourceDb                         N com.linbit.linstor.core.apicallhandler.controller.CtrlRscCrtApiHelper:396
     createResourceInTransaction              N com.linbit.linstor.core.apicallhandler.controller.CtrlRscCrtApiCallHandler:171
     lambda$createResource$2                  N com.linbit.linstor.core.apicallhandler.controller.CtrlRscCrtApiCallHandler:143
     doInScope                                N com.linbit.linstor.core.apicallhandler.ScopeRunner:147
     lambda$fluxInScope$0                     N com.linbit.linstor.core.apicallhandler.ScopeRunner:75
     call                                     N reactor.core.publisher.MonoCallable:91
     trySubscribeScalarMap                    N reactor.core.publisher.FluxFlatMap:126
     subscribeOrReturn                        N reactor.core.publisher.MonoFlatMapMany:49
     subscribe                                N reactor.core.publisher.Flux:8311
     onNext                                   N reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:188
     request                                  N reactor.core.publisher.Operators$ScalarSubscription:2317
     onSubscribe                              N reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:134
     subscribe                                N reactor.core.publisher.MonoCurrentContext:35
     subscribe                                N reactor.core.publisher.Flux:8325
     onNext                                   N reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:188
     request                                  N reactor.core.publisher.Operators$ScalarSubscription:2317
     onSubscribe                              N reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:134
     subscribe                                N reactor.core.publisher.MonoCurrentContext:35
     subscribe                                N reactor.core.publisher.Flux:8325
     onNext                                   N reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:188
     onNext                                   N reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber:121
     complete                                 N reactor.core.publisher.Operators$MonoSubscriber:1755
     onComplete                               N reactor.core.publisher.MonoCollect$CollectSubscriber:152
     onComplete                               N reactor.core.publisher.FluxOnAssembly$OnAssemblySubscriber:395
     onComplete                               N reactor.core.publisher.MonoFlatMapMany$FlatMapManyInner:252
     checkTerminated                          N reactor.core.publisher.FluxFlatMap$FlatMapMain:838
     drainLoop                                N reactor.core.publisher.FluxFlatMap$FlatMapMain:600
     drain                                    N reactor.core.publisher.FluxFlatMap$FlatMapMain:580
     onComplete                               N reactor.core.publisher.FluxFlatMap$FlatMapMain:457
     checkTerminated                          N reactor.core.publisher.FluxFlatMap$FlatMapMain:838
     drainLoop                                N reactor.core.publisher.FluxFlatMap$FlatMapMain:600
     innerComplete                            N reactor.core.publisher.FluxFlatMap$FlatMapMain:909
     onComplete                               N reactor.core.publisher.FluxFlatMap$FlatMapInner:1013
     onComplete                               N reactor.core.publisher.FluxMap$MapSubscriber:136
     onComplete                               N reactor.core.publisher.Operators$MultiSubscriptionSubscriber:1989
     onComplete                               N reactor.core.publisher.FluxSwitchIfEmpty$SwitchIfEmptySubscriber:78
     complete                                 N reactor.core.publisher.FluxCreate$BaseSink:438
     drain                                    N reactor.core.publisher.FluxCreate$BufferAsyncSink:784
     complete                                 N reactor.core.publisher.FluxCreate$BufferAsyncSink:732
     drainLoop                                N reactor.core.publisher.FluxCreate$SerializedSink:239
     drain                                    N reactor.core.publisher.FluxCreate$SerializedSink:205
     complete                                 N reactor.core.publisher.FluxCreate$SerializedSink:196
     apiCallComplete                          N com.linbit.linstor.netcom.TcpConnectorPeer:455
     handleComplete                           N com.linbit.linstor.proto.CommonMessageProcessor:363
     handleDataMessage                        N com.linbit.linstor.proto.CommonMessageProcessor:287
     doProcessInOrderMessage                  N com.linbit.linstor.proto.CommonMessageProcessor:235
     lambda$doProcessMessage$3                N com.linbit.linstor.proto.CommonMessageProcessor:220
     subscribe                                N reactor.core.publisher.FluxDefer:46
     subscribe                                N reactor.core.publisher.Flux:8325
     onNext                                   N reactor.core.publisher.FluxFlatMap$FlatMapMain:418
     drainAsync                               N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:414
     drain                                    N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:679
     onNext                                   N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:243
     drainFused                               N reactor.core.publisher.UnicastProcessor:286
     drain                                    N reactor.core.publisher.UnicastProcessor:322
     onNext                                   N reactor.core.publisher.UnicastProcessor:401
     next                                     N reactor.core.publisher.FluxCreate$IgnoreSink:618
     next                                     N reactor.core.publisher.FluxCreate$SerializedSink:153
     processInOrder                           N com.linbit.linstor.netcom.TcpConnectorPeer:373
     doProcessMessage                         N com.linbit.linstor.proto.CommonMessageProcessor:218
     lambda$processMessage$2                  N com.linbit.linstor.proto.CommonMessageProcessor:164
     onNext                                   N reactor.core.publisher.FluxPeek$PeekSubscriber:177
     runAsync                                 N reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:439
     run                                      N reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:526
     call                                     N reactor.core.scheduler.WorkerTask:84
     call                                     N reactor.core.scheduler.WorkerTask:37
     run                                      N java.util.concurrent.FutureTask:264
     run                                      N java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask:304
     runWorker                                N java.util.concurrent.ThreadPoolExecutor:1128
     run                                      N java.util.concurrent.ThreadPoolExecutor$Worker:628
     run                                      N java.lang.Thread:834


END OF ERROR REPORT.
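
(In case it helps anyone else following along: these reports can also be 
pulled straight from the client rather than out of /var/log/linstor, e.g.:)

    linstor error-reports list
    linstor error-reports show 5F733CD9-00000-000000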

> > 1) Why did I end up in this state? I assume something was configured 
> > on castle/san5/san6 but not on san7.
>
> Not sure... If something were broken on san7, you should also have 
> gotten an error report from a satellite. The ones you showed here are 
> all created by the controller (error-ids XXX-00000-YYY are always 
> controller errors; satellite errors would have some other 
> "random-looking" number instead of the -00000- part).
>
> > 2) How can I fix it?
>
> If I cannot recreate it, there is not much I can do. You could of 
> course try restarting the controller; that will reload the data from 
> the database, which might fix things... I would still be curious what 
> caused all of this...


Sure, I'll see if I can work it out. From the 0 error, it looks like I 
created some configuration under the name DfltStorPool, and this was 
probably not replicated to san7 (because san7 didn't exist back then). I'm 
not sure whether I would expect that to be copied to the node automatically 
if/when it is required, or whether I should simply get an error saying the 
resource can't be deployed due to a missing dependency, but I suspect it 
shouldn't crash the way it does at the moment...
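
Taking up the suggestion to restart the controller: that boils down to the 
usual systemd dance (assuming the standard unit name from the packages):

    # on the controller node (castle)
    systemctl restart linstor-controller
    # then re-check from the client
    linstor volume list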

OK, so I did restart the controller, and now linstor volume list returns 
this:

╭──────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node   ┊ Resource   ┊ StoragePool ┊ VolNr ┊ MinorNr ┊ DeviceName    ┊  Allocated ┊ InUse  ┊    State ┊
╞══════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ castle ┊ testvm1    ┊ pool_hdd    ┊     0 ┊    1000 ┊ /dev/drbd1000 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san5   ┊ testvm1    ┊ pool_hdd    ┊     0 ┊    1000 ┊ /dev/drbd1000 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san6   ┊ testvm1    ┊ pool_hdd    ┊     0 ┊    1000 ┊ /dev/drbd1000 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san7   ┊ testvm1    ┊ pool_hdd    ┊     0 ┊    1000 ┊ /dev/drbd1000 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ castle ┊ testvm2    ┊ pool_hdd    ┊     0 ┊    1002 ┊ /dev/drbd1002 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san5   ┊ testvm2    ┊ pool_hdd    ┊     0 ┊    1002 ┊ /dev/drbd1002 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san6   ┊ testvm2    ┊ pool_hdd    ┊     0 ┊    1002 ┊ /dev/drbd1002 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san7   ┊ testvm2    ┊ pool_hdd    ┊     0 ┊    1002 ┊ /dev/drbd1002 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ castle ┊ testvm3    ┊ pool_hdd    ┊     0 ┊    1003 ┊ /dev/drbd1003 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san5   ┊ testvm3    ┊ pool_hdd    ┊     0 ┊    1003 ┊ /dev/drbd1003 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san6   ┊ testvm3    ┊ pool_hdd    ┊     0 ┊    1003 ┊ /dev/drbd1003 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san7   ┊ testvm3    ┊ pool_hdd    ┊     0 ┊    1003 ┊ /dev/drbd1003 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ castle ┊ testvm4    ┊ pool_hdd    ┊     0 ┊    1004 ┊ /dev/drbd1004 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san5   ┊ testvm4    ┊ pool_hdd    ┊     0 ┊    1004 ┊ /dev/drbd1004 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san6   ┊ testvm4    ┊ pool_hdd    ┊     0 ┊    1004 ┊ /dev/drbd1004 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san7   ┊ testvm4    ┊ pool_hdd    ┊     0 ┊    1004 ┊ /dev/drbd1004 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ castle ┊ testvm5    ┊ pool_hdd    ┊     0 ┊    1005 ┊ /dev/drbd1005 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san5   ┊ testvm5    ┊ pool_hdd    ┊     0 ┊    1005 ┊ /dev/drbd1005 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san6   ┊ testvm5    ┊ pool_hdd    ┊     0 ┊    1005 ┊ /dev/drbd1005 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san7   ┊ testvm5    ┊ pool_hdd    ┊     0 ┊    1005 ┊ /dev/drbd1005 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ castle ┊ testvm6    ┊ pool_hdd    ┊     0 ┊    1006 ┊ /dev/drbd1006 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san5   ┊ testvm6    ┊ pool_hdd    ┊     0 ┊    1006 ┊ /dev/drbd1006 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san6   ┊ testvm6    ┊ pool_hdd    ┊     0 ┊    1006 ┊ /dev/drbd1006 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san7   ┊ testvm6    ┊ pool_hdd    ┊     0 ┊    1006 ┊ /dev/drbd1006 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ castle ┊ testvm7    ┊ pool_hdd    ┊     0 ┊    1007 ┊ /dev/drbd1007 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san5   ┊ testvm7    ┊ pool_hdd    ┊     0 ┊    1007 ┊ /dev/drbd1007 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san6   ┊ testvm7    ┊ pool_hdd    ┊     0 ┊    1007 ┊ /dev/drbd1007 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san7   ┊ testvm7    ┊ pool_hdd    ┊     0 ┊    1007 ┊ /dev/drbd1007 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ castle ┊ testvm8    ┊ pool_hdd    ┊     0 ┊    1008 ┊ /dev/drbd1008 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san5   ┊ testvm8    ┊ pool_hdd    ┊     0 ┊    1008 ┊ /dev/drbd1008 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san6   ┊ testvm8    ┊ pool_hdd    ┊     0 ┊    1008 ┊ /dev/drbd1008 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san7   ┊ testvm8    ┊ pool_hdd    ┊     0 ┊    1008 ┊ /dev/drbd1008 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ castle ┊ testvm9    ┊ pool_hdd    ┊     0 ┊    1009 ┊ /dev/drbd1009 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san5   ┊ testvm9    ┊ pool_hdd    ┊     0 ┊    1009 ┊ /dev/drbd1009 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san6   ┊ testvm9    ┊ pool_hdd    ┊     0 ┊    1009 ┊ /dev/drbd1009 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san7   ┊ testvm9    ┊ pool_hdd    ┊     0 ┊    1009 ┊ /dev/drbd1009 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ castle ┊ windows-wm ┊ pool_hdd    ┊     0 ┊    1001 ┊ /dev/drbd1001 ┊  49.16 MiB ┊ Unused ┊ UpToDate ┊
┊ san5   ┊ windows-wm ┊ pool_hdd    ┊     0 ┊    1001 ┊ /dev/drbd1001 ┊  49.16 MiB ┊ Unused ┊ UpToDate ┊
┊ san6   ┊ windows-wm ┊ pool_hdd    ┊     0 ┊    1001 ┊ /dev/drbd1001 ┊  49.16 MiB ┊ Unused ┊ UpToDate ┊
┊ san7   ┊ windows-wm ┊ pool_hdd    ┊     0 ┊    1001 ┊ /dev/drbd1001 ┊  49.16 MiB ┊ Unused ┊ UpToDate ┊
╰──────────────────────────────────────────────────────────────────────────────────────────────────────╯

So it would appear that it did actually deploy windows-wm to san7, and it 
looks like it's all working again. I'm still rather unsure about my 
process, though; with all this testing I might end up with an unstable 
system due to old config/testing bits left over.

To completely "reset" linstor, I've so far found the following configs:

/etc/linstor
/etc/drbd.d
/var/lib/linstor
/var/lib/linstor.d
/var/log/linstor

Plus, I assume, whatever storage pools / backing devices have been 
configured. Is there anything else that should be wiped to ensure I am 
starting with a clean slate? I'd rather not format the whole system and 
re-install...
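
To spell out what I had in mind by a "reset", something like the following 
(completely untested, obviously destructive, and assuming the standard 
service names; the backing LVs behind the storage pools would need to be 
removed by hand as well):

    # stop LINSTOR on every node first
    systemctl stop linstor-controller linstor-satellite

    # wipe the state/config paths listed above
    # (I left /etc/drbd.d alone here, since global_common.conf in there
    # comes from drbd-utils rather than from LINSTOR)
    rm -rf /etc/linstor /var/lib/linstor /var/lib/linstor.d /var/log/linstor

    # plus lvremove/vgremove of whatever the storage pools were backed by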

Thanks,
Adam



