[DRBD-user] linstor Error after adding resource to node
Adam Goryachev
mailinglists at websitemanagers.com.au
Tue Sep 29 17:14:27 CEST 2020
I'm still progressively testing linstor on a group of test servers, and
have now come across a new problem.
I had everything working nicely with 4 nodes: linstor n l showed all 4
online, machines would automatically come back after a reboot, and it
all looked good. I deleted all the storage-pools and re-created them
(with the same name across all servers), then followed the docs to
create my first resource-definition and volume-definition, and used
auto-placement on 3 nodes (roughly the commands sketched below).
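For reference, the commands I used were roughly the following (I'm
reconstructing this from memory, so the backend type, volume group name
and size below are placeholders rather than exact values):

  # storage pool, same name on every node (VG name/backend are placeholders)
  linstor storage-pool create lvm castle pool_hdd vg_hdd
  linstor storage-pool create lvm san5 pool_hdd vg_hdd
  linstor storage-pool create lvm san6 pool_hdd vg_hdd
  linstor storage-pool create lvm san7 pool_hdd vg_hdd

  # first resource: definition, volume definition, then auto-placement on 3 nodes
  linstor resource-definition create windows-wm
  linstor volume-definition create windows-wm 50M
  linstor resource create windows-wm --auto-place 3 --storage-pool pool_hdd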
I then decided to get clever and created another 9 test resources with
auto-placement on 4 nodes. All of this worked well, and I got a status
like this:
╭──────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node   ┊ Resource   ┊ StoragePool ┊ VolNr ┊ MinorNr ┊ DeviceName    ┊  Allocated ┊ InUse  ┊ State    ┊
╞══════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ castle ┊ testvm1    ┊ pool_hdd    ┊     0 ┊    1000 ┊ /dev/drbd1000 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san5   ┊ testvm1    ┊ pool_hdd    ┊     0 ┊    1000 ┊ /dev/drbd1000 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san6   ┊ testvm1    ┊ pool_hdd    ┊     0 ┊    1000 ┊ /dev/drbd1000 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san7   ┊ testvm1    ┊ pool_hdd    ┊     0 ┊    1000 ┊ /dev/drbd1000 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ castle ┊ testvm2    ┊ pool_hdd    ┊     0 ┊    1002 ┊ /dev/drbd1002 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san5   ┊ testvm2    ┊ pool_hdd    ┊     0 ┊    1002 ┊ /dev/drbd1002 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san6   ┊ testvm2    ┊ pool_hdd    ┊     0 ┊    1002 ┊ /dev/drbd1002 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san7   ┊ testvm2    ┊ pool_hdd    ┊     0 ┊    1002 ┊ /dev/drbd1002 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ castle ┊ testvm3    ┊ pool_hdd    ┊     0 ┊    1003 ┊ /dev/drbd1003 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san5   ┊ testvm3    ┊ pool_hdd    ┊     0 ┊    1003 ┊ /dev/drbd1003 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san6   ┊ testvm3    ┊ pool_hdd    ┊     0 ┊    1003 ┊ /dev/drbd1003 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san7   ┊ testvm3    ┊ pool_hdd    ┊     0 ┊    1003 ┊ /dev/drbd1003 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ castle ┊ testvm4    ┊ pool_hdd    ┊     0 ┊    1004 ┊ /dev/drbd1004 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san5   ┊ testvm4    ┊ pool_hdd    ┊     0 ┊    1004 ┊ /dev/drbd1004 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san6   ┊ testvm4    ┊ pool_hdd    ┊     0 ┊    1004 ┊ /dev/drbd1004 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san7   ┊ testvm4    ┊ pool_hdd    ┊     0 ┊    1004 ┊ /dev/drbd1004 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ castle ┊ testvm5    ┊ pool_hdd    ┊     0 ┊    1005 ┊ /dev/drbd1005 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san5   ┊ testvm5    ┊ pool_hdd    ┊     0 ┊    1005 ┊ /dev/drbd1005 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san6   ┊ testvm5    ┊ pool_hdd    ┊     0 ┊    1005 ┊ /dev/drbd1005 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san7   ┊ testvm5    ┊ pool_hdd    ┊     0 ┊    1005 ┊ /dev/drbd1005 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ castle ┊ testvm6    ┊ pool_hdd    ┊     0 ┊    1006 ┊ /dev/drbd1006 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san5   ┊ testvm6    ┊ pool_hdd    ┊     0 ┊    1006 ┊ /dev/drbd1006 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san6   ┊ testvm6    ┊ pool_hdd    ┊     0 ┊    1006 ┊ /dev/drbd1006 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san7   ┊ testvm6    ┊ pool_hdd    ┊     0 ┊    1006 ┊ /dev/drbd1006 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ castle ┊ testvm7    ┊ pool_hdd    ┊     0 ┊    1007 ┊ /dev/drbd1007 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san5   ┊ testvm7    ┊ pool_hdd    ┊     0 ┊    1007 ┊ /dev/drbd1007 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san6   ┊ testvm7    ┊ pool_hdd    ┊     0 ┊    1007 ┊ /dev/drbd1007 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san7   ┊ testvm7    ┊ pool_hdd    ┊     0 ┊    1007 ┊ /dev/drbd1007 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ castle ┊ testvm8    ┊ pool_hdd    ┊     0 ┊    1008 ┊ /dev/drbd1008 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san5   ┊ testvm8    ┊ pool_hdd    ┊     0 ┊    1008 ┊ /dev/drbd1008 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san6   ┊ testvm8    ┊ pool_hdd    ┊     0 ┊    1008 ┊ /dev/drbd1008 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san7   ┊ testvm8    ┊ pool_hdd    ┊     0 ┊    1008 ┊ /dev/drbd1008 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ castle ┊ testvm9    ┊ pool_hdd    ┊     0 ┊    1009 ┊ /dev/drbd1009 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san5   ┊ testvm9    ┊ pool_hdd    ┊     0 ┊    1009 ┊ /dev/drbd1009 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san6   ┊ testvm9    ┊ pool_hdd    ┊     0 ┊    1009 ┊ /dev/drbd1009 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san7   ┊ testvm9    ┊ pool_hdd    ┊     0 ┊    1009 ┊ /dev/drbd1009 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ castle ┊ windows-wm ┊ pool_hdd    ┊     0 ┊    1001 ┊ /dev/drbd1001 ┊  49.16 MiB ┊ Unused ┊ UpToDate ┊
┊ san5   ┊ windows-wm ┊ pool_hdd    ┊     0 ┊    1001 ┊ /dev/drbd1001 ┊  49.16 MiB ┊ Unused ┊ UpToDate ┊
┊ san6   ┊ windows-wm ┊ pool_hdd    ┊     0 ┊    1001 ┊ /dev/drbd1001 ┊  49.16 MiB ┊ Unused ┊ UpToDate ┊
╰──────────────────────────────────────────────────────────────────────────────────────────────────────╯
You can see at the end that windows-wm is only on 3 nodes, while the
other testvm resources are on 4. So I ran this command to add windows-wm
to the 4th node:
linstor resource create san7 windows-wm --storage-pool pool_hdd
SUCCESS:
Successfully set property key(s): StorPoolName
SUCCESS:
Description:
New resource 'windows-wm' on node 'san7' registered.
Details:
Resource 'windows-wm' on node 'san7' UUID is:
50e34cac-6702-45c4-b242-b644415a7644
SUCCESS:
Description:
Volume with number '0' on resource 'windows-wm' on node 'san7'
successfully registered
Details:
Volume UUID is: f6529463-03ac-4b31-af69-67b3153de355
SUCCESS:
Added peer(s) 'san7' to resource 'windows-wm' on 'san6'
SUCCESS:
Added peer(s) 'san7' to resource 'windows-wm' on 'castle'
SUCCESS:
Added peer(s) 'san7' to resource 'windows-wm' on 'san5'
SUCCESS:
Created resource 'windows-wm' on 'san7'
SUCCESS:
Description:
Resource 'windows-wm' on 'san7' ready
Details:
Node(s): 'san7', Resource: 'windows-wm'
This looked promising, and seemed to work, so I ran another "linstor
volume list" but got an error:
linstor volume list
ERROR:
Show reports:
linstor error-reports show 5F733CD9-00000-000004
The contents of that are below:
ERROR REPORT 5F733CD9-00000-000004
============================================================
Application: LINBIT® LINSTOR
Module: Controller
Version: 1.9.0
Build ID: 678acd24a8b9b73a735407cd79ca33a5e95eb2e2
Build time: 2020-09-23T10:27:49+00:00
Error time: 2020-09-30 00:13:34
Node: castle.websitemanagers.com.au
============================================================
Reported error:
===============
Category: RuntimeException
Class name: NullPointerException
Class canonical name: java.lang.NullPointerException
Generated at: Method 'deviceProviderKindAsString',
Source file 'Json.java', Line #73
Call backtrace:
Method Native Class:Line number
deviceProviderKindAsString N
com.linbit.linstor.api.rest.v1.serializer.Json:73
apiToVolume N
com.linbit.linstor.api.rest.v1.serializer.Json:664
lambda$apiToResourceWithVolumes$2 N
com.linbit.linstor.api.rest.v1.serializer.Json:477
accept N
java.util.stream.ReferencePipeline$3$1:195
forEachRemaining N
java.util.ArrayList$ArrayListSpliterator:1655
copyInto N
java.util.stream.AbstractPipeline:484
wrapAndCopyInto N
java.util.stream.AbstractPipeline:474
evaluateSequential N
java.util.stream.ReduceOps$ReduceOp:913
evaluate N
java.util.stream.AbstractPipeline:234
collect N
java.util.stream.ReferencePipeline:578
apiToResourceWithVolumes N
com.linbit.linstor.api.rest.v1.serializer.Json:506
lambda$listVolumesApiCallRcWithToResponse$1 N
com.linbit.linstor.api.rest.v1.View:112
accept N
java.util.stream.ReferencePipeline$3$1:195
forEachRemaining N
java.util.ArrayList$ArrayListSpliterator:1655
copyInto N
java.util.stream.AbstractPipeline:484
wrapAndCopyInto N
java.util.stream.AbstractPipeline:474
evaluateSequential N
java.util.stream.ReduceOps$ReduceOp:913
evaluate N
java.util.stream.AbstractPipeline:234
collect N
java.util.stream.ReferencePipeline:578
lambda$listVolumesApiCallRcWithToResponse$2 N
com.linbit.linstor.api.rest.v1.View:113
onNext N
reactor.core.publisher.FluxFlatMap$FlatMapMain:378
onNext N
reactor.core.publisher.FluxContextStart$ContextStartSubscriber:96
onNext N
reactor.core.publisher.MonoFlatMapMany$FlatMapManyInner:242
onNext N
reactor.core.publisher.FluxOnAssembly$OnAssemblySubscriber:385
onNext N
reactor.core.publisher.MonoFlatMapMany$FlatMapManyInner:242
request N
reactor.core.publisher.Operators$ScalarSubscription:2317
onSubscribeInner N
reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:143
onSubscribe N
reactor.core.publisher.MonoFlatMapMany$FlatMapManyInner:237
trySubscribeScalarMap N
reactor.core.publisher.FluxFlatMap:191
subscribeOrReturn N
reactor.core.publisher.MonoFlatMapMany:49
subscribe N
reactor.core.publisher.Flux:8311
onNext N
reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:188
request N
reactor.core.publisher.Operators$ScalarSubscription:2317
onSubscribe N
reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:134
subscribe N
reactor.core.publisher.MonoCurrentContext:35
subscribe N
reactor.core.publisher.Flux:8325
onNext N
reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:188
onNext N
reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber:121
complete N
reactor.core.publisher.Operators$MonoSubscriber:1755
onComplete N
reactor.core.publisher.MonoStreamCollector$StreamCollectorSubscriber:167
onComplete N
reactor.core.publisher.FluxOnAssembly$OnAssemblySubscriber:395
onComplete N
reactor.core.publisher.MonoFlatMapMany$FlatMapManyInner:252
checkTerminated N
reactor.core.publisher.FluxFlatMap$FlatMapMain:838
drainLoop N
reactor.core.publisher.FluxFlatMap$FlatMapMain:600
innerComplete N
reactor.core.publisher.FluxFlatMap$FlatMapMain:909
onComplete N
reactor.core.publisher.FluxFlatMap$FlatMapInner:1013
onComplete N
reactor.core.publisher.FluxMap$MapSubscriber:136
onComplete N
reactor.core.publisher.Operators$MultiSubscriptionSubscriber:1989
onComplete N
reactor.core.publisher.FluxSwitchIfEmpty$SwitchIfEmptySubscriber:78
complete N
reactor.core.publisher.FluxCreate$BaseSink:438
drain N
reactor.core.publisher.FluxCreate$BufferAsyncSink:784
complete N
reactor.core.publisher.FluxCreate$BufferAsyncSink:732
drainLoop N
reactor.core.publisher.FluxCreate$SerializedSink:239
drain N
reactor.core.publisher.FluxCreate$SerializedSink:205
complete N
reactor.core.publisher.FluxCreate$SerializedSink:196
apiCallComplete N
com.linbit.linstor.netcom.TcpConnectorPeer:455
handleComplete N
com.linbit.linstor.proto.CommonMessageProcessor:363
handleDataMessage N
com.linbit.linstor.proto.CommonMessageProcessor:287
doProcessInOrderMessage N
com.linbit.linstor.proto.CommonMessageProcessor:235
lambda$doProcessMessage$3 N
com.linbit.linstor.proto.CommonMessageProcessor:220
subscribe N
reactor.core.publisher.FluxDefer:46
subscribe N
reactor.core.publisher.Flux:8325
onNext N
reactor.core.publisher.FluxFlatMap$FlatMapMain:418
drainAsync N
reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:414
drain N
reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:679
onNext N
reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:243
drainFused N
reactor.core.publisher.UnicastProcessor:286
drain N
reactor.core.publisher.UnicastProcessor:322
onNext N
reactor.core.publisher.UnicastProcessor:401
next N
reactor.core.publisher.FluxCreate$IgnoreSink:618
next N
reactor.core.publisher.FluxCreate$SerializedSink:153
processInOrder N
com.linbit.linstor.netcom.TcpConnectorPeer:373
doProcessMessage N
com.linbit.linstor.proto.CommonMessageProcessor:218
lambda$processMessage$2 N
com.linbit.linstor.proto.CommonMessageProcessor:164
onNext N
reactor.core.publisher.FluxPeek$PeekSubscriber:177
runAsync N
reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:439
run N
reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:526
call N
reactor.core.scheduler.WorkerTask:84
call N
reactor.core.scheduler.WorkerTask:37
run N
java.util.concurrent.FutureTask:264
run N
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask:304
runWorker N
java.util.concurrent.ThreadPoolExecutor:1128
run N
java.util.concurrent.ThreadPoolExecutor$Worker:628
run N java.lang.Thread:834
END OF ERROR REPORT.
Since it looks relevant: error reports 1, 2 and 3 are all similar,
relating to nodes castle, san5 and san6 (note that san7 was the
4th/newest node I tried to add the resource to). One of those error
reports is:
ERROR REPORT 5F733CD9-00000-000001
============================================================
Application: LINBIT® LINSTOR
Module: Controller
Version: 1.9.0
Build ID: 678acd24a8b9b73a735407cd79ca33a5e95eb2e2
Build time: 2020-09-23T10:27:49+00:00
Error time: 2020-09-30 00:13:01
Node: castle.websitemanagers.com.au
Peer: Node: 'san6'
============================================================
Reported error:
===============
Category: Error
Class name: ImplementationError
Class canonical name: com.linbit.ImplementationError
Generated at: Method 'createStorageRscData',
Source file 'CtrlRscLayerDataMerger.java', Line #214
Error message: Received unknown storage resource
from satellite
Asynchronous stage backtrace:
Error has been observed at the following site(s):
|_ checkpoint ⇢ Execute single-stage API NotifyRscApplied
|_ checkpoint ⇢ Fallback error handling wrapper
Stack trace:
Call backtrace:
Method Native Class:Line number
createStorageRscData N
com.linbit.linstor.core.apicallhandler.CtrlRscLayerDataMerger:214
Suppressed exception 1 of 1:
===============
Category: RuntimeException
Class name: OnAssemblyException
Class canonical name:
reactor.core.publisher.FluxOnAssembly.OnAssemblyException
Generated at: Method 'createStorageRscData',
Source file 'CtrlRscLayerDataMerger.java', Line #214
Error message:
Error has been observed at the following site(s):
|_ checkpoint ⇢ Execute single-stage API NotifyRscApplied
|_ checkpoint ⇢ Fallback error handling wrapper
Stack trace:
Call backtrace:
Method Native Class:Line number
createStorageRscData N
com.linbit.linstor.core.apicallhandler.CtrlRscLayerDataMerger:214
createStorageRscData N
com.linbit.linstor.core.apicallhandler.CtrlRscLayerDataMerger:65
mergeStorageRscData N
com.linbit.linstor.core.apicallhandler.AbsLayerRscDataMerger:285
merge N
com.linbit.linstor.core.apicallhandler.AbsLayerRscDataMerger:138
merge N
com.linbit.linstor.core.apicallhandler.AbsLayerRscDataMerger:148
mergeLayerData N
com.linbit.linstor.core.apicallhandler.AbsLayerRscDataMerger:93
mergeLayerData N
com.linbit.linstor.core.apicallhandler.CtrlRscLayerDataMerger:82
updateVolume N
com.linbit.linstor.core.apicallhandler.controller.internal.RscInternalCallHandler:228
execute N
com.linbit.linstor.api.protobuf.internal.NotifyResourceApplied:45
executeNonReactive N
com.linbit.linstor.proto.CommonMessageProcessor:525
lambda$execute$13 N
com.linbit.linstor.proto.CommonMessageProcessor:500
doInScope N
com.linbit.linstor.core.apicallhandler.ScopeRunner:147
lambda$fluxInScope$0 N
com.linbit.linstor.core.apicallhandler.ScopeRunner:75
call N
reactor.core.publisher.MonoCallable:91
trySubscribeScalarMap N
reactor.core.publisher.FluxFlatMap:126
subscribeOrReturn N
reactor.core.publisher.MonoFlatMapMany:49
subscribe N
reactor.core.publisher.Flux:8311
onNext N
reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:188
request N
reactor.core.publisher.Operators$ScalarSubscription:2317
onSubscribe N
reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:134
subscribe N
reactor.core.publisher.MonoCurrentContext:35
subscribe N
reactor.core.publisher.InternalFluxOperator:62
subscribe N
reactor.core.publisher.FluxDefer:54
subscribe N
reactor.core.publisher.Flux:8325
onNext N
reactor.core.publisher.FluxFlatMap$FlatMapMain:418
drainAsync N
reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:414
drain N
reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:679
onNext N
reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:243
drainFused N
reactor.core.publisher.UnicastProcessor:286
drain N
reactor.core.publisher.UnicastProcessor:322
onNext N
reactor.core.publisher.UnicastProcessor:401
next N
reactor.core.publisher.FluxCreate$IgnoreSink:618
next N
reactor.core.publisher.FluxCreate$SerializedSink:153
processInOrder N
com.linbit.linstor.netcom.TcpConnectorPeer:373
doProcessMessage N
com.linbit.linstor.proto.CommonMessageProcessor:218
lambda$processMessage$2 N
com.linbit.linstor.proto.CommonMessageProcessor:164
onNext N
reactor.core.publisher.FluxPeek$PeekSubscriber:177
runAsync N
reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:439
run N
reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:526
call N
reactor.core.scheduler.WorkerTask:84
call N
reactor.core.scheduler.WorkerTask:37
run N
java.util.concurrent.FutureTask:264
run N
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask:304
runWorker N
java.util.concurrent.ThreadPoolExecutor:1128
run N
java.util.concurrent.ThreadPoolExecutor$Worker:628
run N java.lang.Thread:834
END OF ERROR REPORT.
So, questions:
1) Why did I end up in this state? I assume something was configured on
castle/san5/san6 but not on san7.
2) How can I fix it?
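In case it helps with diagnosis, I'm happy to post output from any of
the following (just guessing at what might be useful; the last two would
be run directly on san7, and lvs only applies on the assumption that the
pool is LVM-backed):

  linstor node list
  linstor storage-pool list
  linstor resource list
  linstor error-reports list
  drbdadm status windows-wm
  lvs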
Thanks,
Adam