[DRBD-user] linstor Error after adding resource to node
Adam Goryachev
mailinglists at websitemanagers.com.au
Wed Sep 30 13:41:32 CEST 2020
On 30/9/20 16:49, Gábor Hernádi wrote:
> Hi,
>
> I tried to recreate this issue, but without success.
>
> 4 Node setup, all LVM
> First create a resource with --auto-place 3,
> Create 9 other resources with --auto-place 4
> Create the first resource on the 4th (missing) node
> Check "linstor volume list"
>
> That means, there has to be something else in your setup.
> What else did you do? I see that your "first" resource "windows-wm"
> was more like the second resource, as it got the minor-number 1001,
> instead of 1000. That minor-number 1000 was later reused by "testvm1".
> However, was something broken with the "original" resource using
> minor-number 1000?
>
Unfortunately, yes, a whole bunch of things have been done on the first
three nodes. I've been slowly messing around over the last few months trying
to get everything working. There was another "testvm3" created earlier, which
I deleted before starting again with the further testing....
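For what it's worth, the rough shape of what I did matches your reproduction
steps. From memory it was something like the following, though the sizes and
exact order may not be accurate:

  # on the original three nodes (castle/san5/san6):
  linstor resource-definition create windows-wm
  linstor volume-definition create windows-wm 50M
  linstor resource create windows-wm --auto-place 3 --storage-pool pool_hdd

  # later, after san7 had been added, each test resource across all four nodes:
  linstor resource-definition create testvm1
  linstor volume-definition create testvm1 100M
  linstor resource create testvm1 --auto-place 4 --storage-pool pool_hdd

  # and finally the step that triggered the errors:
  linstor resource create san7 windows-wm --storage-pool pool_hdd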
> Error report 5F733CD9-00000-000004 is a NullPointerException, but this
> is most likely just a side-effect of the original issue.
>
> > Since it looks relevant, error reports 1, 2 and 3 are all similar
> > for nodes castle, san5 and san6
>
> What about error report 0? Not relevant for this issue?
>
Oops, I just didn't realise there was a 0 report... I've included it
here now:
ERROR REPORT 5F733CD9-00000-000000
============================================================
Application: LINBIT® LINSTOR
Module: Controller
Version: 1.9.0
Build ID: 678acd24a8b9b73a735407cd79ca33a5e95eb2e2
Build time: 2020-09-23T10:27:49+00:00
Error time: 2020-09-30 00:12:19
Node: castle.websitemanagers.com.au
Peer: RestClient(192.168.5.207; 'PythonLinstor/1.4.0 (API1.0.4)')
============================================================
Reported error:
===============
Description:
Dependency not found
Category: LinStorException
Class name: LinStorException
Class canonical name: com.linbit.linstor.LinStorException
Generated at: Method 'checkStorPoolLoaded', Source file 'CtrlStorPoolResolveHelper.java', Line #225
Error message: Dependency not found
Error context:
The storage pool 'DfltStorPool' for resource 'windows-wm' for volume number '0' is not deployed on node 'san7'.
Call backtrace:
Method                              Native  Class:Line number
checkStorPoolLoaded                 N       com.linbit.linstor.CtrlStorPoolResolveHelper:225
resolveStorPool                     N       com.linbit.linstor.CtrlStorPoolResolveHelper:149
resolveStorPool                     N       com.linbit.linstor.CtrlStorPoolResolveHelper:65
createVolumeResolvingStorPool       N       com.linbit.linstor.core.apicallhandler.controller.CtrlVlmCrtApiHelper:72
createResourceDb                    N       com.linbit.linstor.core.apicallhandler.controller.CtrlRscCrtApiHelper:396
createResourceInTransaction         N       com.linbit.linstor.core.apicallhandler.controller.CtrlRscCrtApiCallHandler:171
lambda$createResource$2             N       com.linbit.linstor.core.apicallhandler.controller.CtrlRscCrtApiCallHandler:143
doInScope                           N       com.linbit.linstor.core.apicallhandler.ScopeRunner:147
lambda$fluxInScope$0                N       com.linbit.linstor.core.apicallhandler.ScopeRunner:75
call                                N       reactor.core.publisher.MonoCallable:91
trySubscribeScalarMap               N       reactor.core.publisher.FluxFlatMap:126
subscribeOrReturn                   N       reactor.core.publisher.MonoFlatMapMany:49
subscribe                           N       reactor.core.publisher.Flux:8311
onNext                              N       reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:188
request                             N       reactor.core.publisher.Operators$ScalarSubscription:2317
onSubscribe                         N       reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:134
subscribe                           N       reactor.core.publisher.MonoCurrentContext:35
subscribe                           N       reactor.core.publisher.Flux:8325
onNext                              N       reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:188
request                             N       reactor.core.publisher.Operators$ScalarSubscription:2317
onSubscribe                         N       reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:134
subscribe                           N       reactor.core.publisher.MonoCurrentContext:35
subscribe                           N       reactor.core.publisher.Flux:8325
onNext                              N       reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:188
onNext                              N       reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber:121
complete                            N       reactor.core.publisher.Operators$MonoSubscriber:1755
onComplete                          N       reactor.core.publisher.MonoCollect$CollectSubscriber:152
onComplete                          N       reactor.core.publisher.FluxOnAssembly$OnAssemblySubscriber:395
onComplete                          N       reactor.core.publisher.MonoFlatMapMany$FlatMapManyInner:252
checkTerminated                     N       reactor.core.publisher.FluxFlatMap$FlatMapMain:838
drainLoop                           N       reactor.core.publisher.FluxFlatMap$FlatMapMain:600
drain                               N       reactor.core.publisher.FluxFlatMap$FlatMapMain:580
onComplete                          N       reactor.core.publisher.FluxFlatMap$FlatMapMain:457
checkTerminated                     N       reactor.core.publisher.FluxFlatMap$FlatMapMain:838
drainLoop                           N       reactor.core.publisher.FluxFlatMap$FlatMapMain:600
innerComplete                       N       reactor.core.publisher.FluxFlatMap$FlatMapMain:909
onComplete                          N       reactor.core.publisher.FluxFlatMap$FlatMapInner:1013
onComplete                          N       reactor.core.publisher.FluxMap$MapSubscriber:136
onComplete                          N       reactor.core.publisher.Operators$MultiSubscriptionSubscriber:1989
onComplete                          N       reactor.core.publisher.FluxSwitchIfEmpty$SwitchIfEmptySubscriber:78
complete                            N       reactor.core.publisher.FluxCreate$BaseSink:438
drain                               N       reactor.core.publisher.FluxCreate$BufferAsyncSink:784
complete                            N       reactor.core.publisher.FluxCreate$BufferAsyncSink:732
drainLoop                           N       reactor.core.publisher.FluxCreate$SerializedSink:239
drain                               N       reactor.core.publisher.FluxCreate$SerializedSink:205
complete                            N       reactor.core.publisher.FluxCreate$SerializedSink:196
apiCallComplete                     N       com.linbit.linstor.netcom.TcpConnectorPeer:455
handleComplete                      N       com.linbit.linstor.proto.CommonMessageProcessor:363
handleDataMessage                   N       com.linbit.linstor.proto.CommonMessageProcessor:287
doProcessInOrderMessage             N       com.linbit.linstor.proto.CommonMessageProcessor:235
lambda$doProcessMessage$3           N       com.linbit.linstor.proto.CommonMessageProcessor:220
subscribe                           N       reactor.core.publisher.FluxDefer:46
subscribe                           N       reactor.core.publisher.Flux:8325
onNext                              N       reactor.core.publisher.FluxFlatMap$FlatMapMain:418
drainAsync                          N       reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:414
drain                               N       reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:679
onNext                              N       reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:243
drainFused                          N       reactor.core.publisher.UnicastProcessor:286
drain                               N       reactor.core.publisher.UnicastProcessor:322
onNext                              N       reactor.core.publisher.UnicastProcessor:401
next                                N       reactor.core.publisher.FluxCreate$IgnoreSink:618
next                                N       reactor.core.publisher.FluxCreate$SerializedSink:153
processInOrder                      N       com.linbit.linstor.netcom.TcpConnectorPeer:373
doProcessMessage                    N       com.linbit.linstor.proto.CommonMessageProcessor:218
lambda$processMessage$2             N       com.linbit.linstor.proto.CommonMessageProcessor:164
onNext                              N       reactor.core.publisher.FluxPeek$PeekSubscriber:177
runAsync                            N       reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:439
run                                 N       reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:526
call                                N       reactor.core.scheduler.WorkerTask:84
call                                N       reactor.core.scheduler.WorkerTask:37
run                                 N       java.util.concurrent.FutureTask:264
run                                 N       java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask:304
runWorker                           N       java.util.concurrent.ThreadPoolExecutor:1128
run                                 N       java.util.concurrent.ThreadPoolExecutor$Worker:628
run                                 N       java.lang.Thread:834
END OF ERROR REPORT.
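(Side note in case it's useful to anyone following along: as far as I can
tell, the error reports can also be listed and dumped with the client, e.g.

  linstor error-reports list
  linstor error-reports show 5F733CD9-00000-000000

rather than digging the files out of /var/log/linstor.)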
> > 1) Why did I end up in this state? I assume something was configured
> > on castle/san5/san6 but not on san7.
>
> Not sure... If something were broken on san7, you should also have
> gotten an error report from a satellite. The ones you showed here were
> all created by the controller (error IDs of the form XXX-00000-YYY are
> always controller errors; satellite errors have some other
> "random-looking" number instead of the -00000- part)
>
> > 2) How can I fix it?
>
> If I cannot recreate it, there is not much I can do. You could of
> course try restarting the controller, which will reload the data from
> the database and might fix things... I would still be curious what
> caused all of this...
Sure, I'll see if I can work it out. From the 0 error, it looks like I
created some configuration under the name DfltStorPool, and this was probably
not replicated to san7 (because san7 didn't exist back then). I'm not sure
whether I would expect this to be copied to the node automatically if/when it
is required, or whether I should get an error saying the resource can't be
deployed due to a missing dependency, but I suspect it shouldn't crash the
way it does at the moment...
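My rough plan before doing anything drastic is something along these lines
(assuming a standard systemd install; vg_hdd is just a placeholder for the
actual backing LVM volume group):

  # check which storage pools each node actually knows about
  linstor node list
  linstor storage-pool list

  # if a pool really were missing on san7, I gather it could be created with
  # something like:
  #   linstor storage-pool create lvm san7 pool_hdd vg_hdd

  # then restart the controller so it reloads its state from the database
  systemctl restart linstor-controller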
OK, so I did restart the controller, and now linstor volume list returns
this:
╭──────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node   ┊ Resource   ┊ StoragePool ┊ VolNr ┊ MinorNr ┊ DeviceName    ┊ Allocated  ┊ InUse  ┊ State    ┊
╞══════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ castle ┊ testvm1    ┊ pool_hdd    ┊     0 ┊    1000 ┊ /dev/drbd1000 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san5   ┊ testvm1    ┊ pool_hdd    ┊     0 ┊    1000 ┊ /dev/drbd1000 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san6   ┊ testvm1    ┊ pool_hdd    ┊     0 ┊    1000 ┊ /dev/drbd1000 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san7   ┊ testvm1    ┊ pool_hdd    ┊     0 ┊    1000 ┊ /dev/drbd1000 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ castle ┊ testvm2    ┊ pool_hdd    ┊     0 ┊    1002 ┊ /dev/drbd1002 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san5   ┊ testvm2    ┊ pool_hdd    ┊     0 ┊    1002 ┊ /dev/drbd1002 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san6   ┊ testvm2    ┊ pool_hdd    ┊     0 ┊    1002 ┊ /dev/drbd1002 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san7   ┊ testvm2    ┊ pool_hdd    ┊     0 ┊    1002 ┊ /dev/drbd1002 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ castle ┊ testvm3    ┊ pool_hdd    ┊     0 ┊    1003 ┊ /dev/drbd1003 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san5   ┊ testvm3    ┊ pool_hdd    ┊     0 ┊    1003 ┊ /dev/drbd1003 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san6   ┊ testvm3    ┊ pool_hdd    ┊     0 ┊    1003 ┊ /dev/drbd1003 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san7   ┊ testvm3    ┊ pool_hdd    ┊     0 ┊    1003 ┊ /dev/drbd1003 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ castle ┊ testvm4    ┊ pool_hdd    ┊     0 ┊    1004 ┊ /dev/drbd1004 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san5   ┊ testvm4    ┊ pool_hdd    ┊     0 ┊    1004 ┊ /dev/drbd1004 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san6   ┊ testvm4    ┊ pool_hdd    ┊     0 ┊    1004 ┊ /dev/drbd1004 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san7   ┊ testvm4    ┊ pool_hdd    ┊     0 ┊    1004 ┊ /dev/drbd1004 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ castle ┊ testvm5    ┊ pool_hdd    ┊     0 ┊    1005 ┊ /dev/drbd1005 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san5   ┊ testvm5    ┊ pool_hdd    ┊     0 ┊    1005 ┊ /dev/drbd1005 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san6   ┊ testvm5    ┊ pool_hdd    ┊     0 ┊    1005 ┊ /dev/drbd1005 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san7   ┊ testvm5    ┊ pool_hdd    ┊     0 ┊    1005 ┊ /dev/drbd1005 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ castle ┊ testvm6    ┊ pool_hdd    ┊     0 ┊    1006 ┊ /dev/drbd1006 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san5   ┊ testvm6    ┊ pool_hdd    ┊     0 ┊    1006 ┊ /dev/drbd1006 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san6   ┊ testvm6    ┊ pool_hdd    ┊     0 ┊    1006 ┊ /dev/drbd1006 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san7   ┊ testvm6    ┊ pool_hdd    ┊     0 ┊    1006 ┊ /dev/drbd1006 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ castle ┊ testvm7    ┊ pool_hdd    ┊     0 ┊    1007 ┊ /dev/drbd1007 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san5   ┊ testvm7    ┊ pool_hdd    ┊     0 ┊    1007 ┊ /dev/drbd1007 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san6   ┊ testvm7    ┊ pool_hdd    ┊     0 ┊    1007 ┊ /dev/drbd1007 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san7   ┊ testvm7    ┊ pool_hdd    ┊     0 ┊    1007 ┊ /dev/drbd1007 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ castle ┊ testvm8    ┊ pool_hdd    ┊     0 ┊    1008 ┊ /dev/drbd1008 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san5   ┊ testvm8    ┊ pool_hdd    ┊     0 ┊    1008 ┊ /dev/drbd1008 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san6   ┊ testvm8    ┊ pool_hdd    ┊     0 ┊    1008 ┊ /dev/drbd1008 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san7   ┊ testvm8    ┊ pool_hdd    ┊     0 ┊    1008 ┊ /dev/drbd1008 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ castle ┊ testvm9    ┊ pool_hdd    ┊     0 ┊    1009 ┊ /dev/drbd1009 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san5   ┊ testvm9    ┊ pool_hdd    ┊     0 ┊    1009 ┊ /dev/drbd1009 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san6   ┊ testvm9    ┊ pool_hdd    ┊     0 ┊    1009 ┊ /dev/drbd1009 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san7   ┊ testvm9    ┊ pool_hdd    ┊     0 ┊    1009 ┊ /dev/drbd1009 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ castle ┊ windows-wm ┊ pool_hdd    ┊     0 ┊    1001 ┊ /dev/drbd1001 ┊  49.16 MiB ┊ Unused ┊ UpToDate ┊
┊ san5   ┊ windows-wm ┊ pool_hdd    ┊     0 ┊    1001 ┊ /dev/drbd1001 ┊  49.16 MiB ┊ Unused ┊ UpToDate ┊
┊ san6   ┊ windows-wm ┊ pool_hdd    ┊     0 ┊    1001 ┊ /dev/drbd1001 ┊  49.16 MiB ┊ Unused ┊ UpToDate ┊
┊ san7   ┊ windows-wm ┊ pool_hdd    ┊     0 ┊    1001 ┊ /dev/drbd1001 ┊  49.16 MiB ┊ Unused ┊ UpToDate ┊
╰──────────────────────────────────────────────────────────────────────────────────────────────────────╯
So it would appear that it did actually deploy windows-wm to san7, and it
looks like it's all working again. I'm still rather concerned, though, that
with all my testing I might end up with an unstable system due to old
config/testing bits left over.
To completely "reset" linstor, I've so far found the following config and
state locations:
/etc/linstor
/etc/drbd.d
/var/lib/linstor
/var/lib/linstor.d
/var/log/linstor
Plus, I assume, whatever backing storage (the LVM volumes/pools) has been
configured. Is there anything else that should be wiped to ensure I am
starting with a clean slate? I'd rather not format the whole system and
re-install ...
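In case it makes the question clearer, the wipe I have in mind is roughly the
following (the LVM names are just examples for whatever was actually created):

  # stop LINSTOR everywhere first
  systemctl stop linstor-controller        # on the controller node
  systemctl stop linstor-satellite         # on every node

  # take down any remaining DRBD resources
  drbdadm down all

  # remove the state/config locations listed above
  # (being careful with /etc/drbd.d, since drbd-utils ships its own files there)
  rm -rf /etc/linstor /var/lib/linstor /var/lib/linstor.d /var/log/linstor

  # plus the backing LVM volumes that LINSTOR created, e.g.
  #   lvs
  #   lvremove vg_hdd/testvm1_00000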
Thanks,
Adam