DataStoreJobCoordinator java.lang.NullPointerException: at index 0
Description
Environment
Operating System: All
Platform: All
Attachments
- 04 Oct 2017, 12:20 AM
Activity
Michael Vorburger December 12, 2018 at 12:27 PM
FTR: Something related to this has just come up again in https://git.opendaylight.org/gerrit/#/c/78678/5/jobcoordinator/jobcoordinator-impl/src/main/java/org/opendaylight/infrautils/jobcoordinator/internal/JobCoordinatorImpl.java@416 ...
Michael Vorburger November 22, 2017 at 3:01 PM
> It looks like there is some mix up which makes no sense to me either; I'll propose something about this in 1-2 follow-up changes.
Michael Vorburger November 22, 2017 at 11:06 AM
PS: Not directly related to this issue (so to be followed up in separately raised future changes), but staring more at the code around this, I have to agree with this comment from @Tom Pantelis in the commit message of c/65821, quote:
The futures returned from elanInterfaceManager.removeElanInterface
were also added to the List but I don't see the purpose of this. removeElanInterface
waits for those tx's to complete. Maybe it was so the JC would retry them? If so, this
entire code path has a mix of sync and async transactions which is a bit confusing -
some are submitted to the JC, others aren't; some may be retried, others aren't.
It looks like there is some mix up which makes no sense to me either; I'll propose something about this in 1-2 follow-up changes.
Michael Vorburger November 22, 2017 at 11:06 AM
> I'll submit a patch to address this ...
==> https://git.opendaylight.org/gerrit/#/c/65821/
FTR to anyone else re-reviewing this in the future: The important fix is ditching the use of the outer List<ListenableFuture<Void>> futures inside the forEach in remove(), because that multi-threaded use was badly wrong. Replacing futures.add(writeConfigTxn.submit()) with return Collections.singletonList(writeConfigTxn.submit()) is perfectly fine, as the goal is for the JC to handle that list of futures, not to have such a list spanning across the forEach. However, the removal of the futures.addAll() of what elanInterfaceManager.removeElanInterface() returns is a kind of unrelated minor clean-up fallout; that part is not what actually fixes the concurrency issue here. I'm just writing this down because it initially confused me while reviewing it.
@Sam Hague can you confirm that you're good with @Tom Pantelis fixing the problem originally raised in this issue - OK to close?
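The corrected pattern described above can be sketched in plain Java. This is a hedged standalone illustration, not the actual ODL code: CompletableFuture stands in for the ListenableFuture returned by writeConfigTxn.submit(), and the class and variable names here are hypothetical.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;

public class PerJobFutureList {
    public static void main(String[] args) throws Exception {
        // Each enqueued job builds its own transaction future and returns it in a
        // fresh single-element list, so no list instance is shared across jobs.
        Callable<List<CompletableFuture<Void>>> job = () -> {
            // Stand-in for writeConfigTxn.submit() in the real code.
            CompletableFuture<Void> txFuture = CompletableFuture.completedFuture(null);
            return Collections.singletonList(txFuture);
        };

        // Two invocations yield two independent lists; neither can observe a
        // half-written element from the other.
        List<CompletableFuture<Void>> first = job.call();
        List<CompletableFuture<Void>> second = job.call();
        System.out.println(first.size() + " " + (first == second)); // prints "1 false"
    }
}
```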
Tom Pantelis November 21, 2017 at 10:32 PM (edited)
The futures List in remove is reused across the submitted jobs in the forEach loop:
DataStoreJobCoordinator dataStoreJobCoordinator = DataStoreJobCoordinator.getInstance();
ElanUtils.removeAndGetElanInterfaces(elanName).forEach(elanInterfaceName -> {
    dataStoreJobCoordinator.enqueueJob(ElanUtils.getElanInterfaceJobKey(elanInterfaceName), () -> {
        WriteTransaction writeConfigTxn = broker.newWriteOnlyTransaction();
        ...
        futures.add(writeConfigTxn.submit());
        return futures;
    }, ElanConstants.JOB_MAX_RETRIES);
});
This looks very suspicious and unsafe, as the List is accessed by multiple threads. I think this is what causes the strange behavior with a null element in the list; tx.submit() never returns null. Each job should return its own local futures List. I'll submit a patch to address this ...
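To see why a null slot in the shared list surfaces as the exception in the stack trace below: Futures.allAsList makes a defensive copy of the list via Guava's ImmutableList.copyOf, which rejects null elements index by index. A minimal standalone demo, using JDK List.copyOf (Java 9+) as a stand-in for the Guava copy, with the corrupted state simulated directly rather than raced:

```java
import java.util.ArrayList;
import java.util.List;

public class NullElementInFuturesList {
    public static void main(String[] args) {
        // An unsynchronized ArrayList mutated from several job threads can end up
        // with a null slot (a torn add); simulate that corrupted state directly.
        List<String> futures = new ArrayList<>();
        futures.add(null);

        try {
            // Futures.allAsList copies its input the same way ImmutableList.copyOf
            // does, checking every element; List.copyOf also rejects nulls here.
            List.copyOf(futures);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException on the defensive copy");
        }
    }
}
```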
https://logs.opendaylight.org/releng/jenkins092/netvirt-csit-1node-openstack-ocata-gate-stateful-carbon/105/odl1_karaf.log.gz
2017-10-03 20:15:53,092 | INFO | nPool-1-worker-2 | ElanInstanceManager | 341 - org.opendaylight.netvirt.elanmanager-impl - 0.4.2.SNAPSHOT | Deleting the elanInterface present under ConfigDS:97fe230c-b83b-4bb4-bfc8-af085a471e92
2017-10-03 20:15:53,092 | INFO | nPool-1-worker-2 | ElanInstanceManager | 341 - org.opendaylight.netvirt.elanmanager-impl - 0.4.2.SNAPSHOT | unbind the Interface:97fe230c-b83b-4bb4-bfc8-af085a471e92 service bounded to Elan:8f9ddf92-e04d-4e60-81c0-6bc8860d788b
2017-10-03 20:15:53,092 | ERROR | nPool-1-worker-1 | DataStoreJobCoordinator | 292 - org.opendaylight.genius.mdsalutil-api - 0.2.2.SNAPSHOT | Thread terminated due to uncaught exception: ForkJoinPool-1-worker-1
java.lang.NullPointerException: at index 0
at com.google.common.collect.ObjectArrays.checkElementNotNull(ObjectArrays.java:240)[38:com.google.guava:18.0.0]
at com.google.common.collect.ObjectArrays.checkElementsNotNull(ObjectArrays.java:231)[38:com.google.guava:18.0.0]
at com.google.common.collect.ObjectArrays.checkElementsNotNull(ObjectArrays.java:226)[38:com.google.guava:18.0.0]
at com.google.common.collect.ImmutableList.construct(ImmutableList.java:303)[38:com.google.guava:18.0.0]
at com.google.common.collect.ImmutableList.copyOf(ImmutableList.java:258)[38:com.google.guava:18.0.0]
at com.google.common.collect.ImmutableList.copyOf(ImmutableList.java:226)[38:com.google.guava:18.0.0]
at com.google.common.util.concurrent.Futures.allAsList(Futures.java:1027)[38:com.google.guava:18.0.0]
at org.opendaylight.genius.datastoreutils.DataStoreJobCoordinator$MainTask.run(DataStoreJobCoordinator.java:297)[292:org.opendaylight.genius.mdsalutil-api:0.2.2.SNAPSHOT]
at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402)[:1.8.0_141]
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)[:1.8.0_141]
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)[:1.8.0_141]
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)[:1.8.0_141]
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)[:1.8.0_141]