DataStoreJobCoordinator java.lang.NullPointerException: at index 0

Description

https://logs.opendaylight.org/releng/jenkins092/netvirt-csit-1node-openstack-ocata-gate-stateful-carbon/105/odl1_karaf.log.gz

2017-10-03 20:15:53,092 | INFO | nPool-1-worker-2 | ElanInstanceManager | 341 - org.opendaylight.netvirt.elanmanager-impl - 0.4.2.SNAPSHOT | Deleting the elanInterface present under ConfigDS:97fe230c-b83b-4bb4-bfc8-af085a471e92
2017-10-03 20:15:53,092 | INFO | nPool-1-worker-2 | ElanInstanceManager | 341 - org.opendaylight.netvirt.elanmanager-impl - 0.4.2.SNAPSHOT | unbind the Interface:97fe230c-b83b-4bb4-bfc8-af085a471e92 service bounded to Elan:8f9ddf92-e04d-4e60-81c0-6bc8860d788b
2017-10-03 20:15:53,092 | ERROR | nPool-1-worker-1 | DataStoreJobCoordinator | 292 - org.opendaylight.genius.mdsalutil-api - 0.2.2.SNAPSHOT | Thread terminated due to uncaught exception: ForkJoinPool-1-worker-1
java.lang.NullPointerException: at index 0
at com.google.common.collect.ObjectArrays.checkElementNotNull(ObjectArrays.java:240)[38:com.google.guava:18.0.0]
at com.google.common.collect.ObjectArrays.checkElementsNotNull(ObjectArrays.java:231)[38:com.google.guava:18.0.0]
at com.google.common.collect.ObjectArrays.checkElementsNotNull(ObjectArrays.java:226)[38:com.google.guava:18.0.0]
at com.google.common.collect.ImmutableList.construct(ImmutableList.java:303)[38:com.google.guava:18.0.0]
at com.google.common.collect.ImmutableList.copyOf(ImmutableList.java:258)[38:com.google.guava:18.0.0]
at com.google.common.collect.ImmutableList.copyOf(ImmutableList.java:226)[38:com.google.guava:18.0.0]
at com.google.common.util.concurrent.Futures.allAsList(Futures.java:1027)[38:com.google.guava:18.0.0]
at org.opendaylight.genius.datastoreutils.DataStoreJobCoordinator$MainTask.run(DataStoreJobCoordinator.java:297)[292:org.opendaylight.genius.mdsalutil-api:0.2.2.SNAPSHOT]
at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402)[:1.8.0_141]
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)[:1.8.0_141]
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)[:1.8.0_141]
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)[:1.8.0_141]
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)[:1.8.0_141]

Environment

Operating System: All
Platform: All

Attachments

1
  • 04 Oct 2017, 12:20 AM

Activity

Show:

Michael Vorburger November 22, 2017 at 3:01 PM

> It looks like there is some mix up which makes no sense to me either; I'll propose something about this in 1-2 follow-up changes.

https://git.opendaylight.org/gerrit/#/c/65839/

Michael Vorburger November 22, 2017 at 11:06 AM

PS: Not directly related to this issue, so to be followed-up in separately raised future changes, on staring more at code around this I have to agree with this comment from in the Commit Message of c/65821, quote:

The futures returned from elanInterfaceManager.removeElanInterface
were also added to the List but I don't see the purpose of this. removeElanInterface
waits for those tx's to complete. Maybe it was so the JC would retry them? If so, this
entire code path has a mix of sync and async transactions which is a bit confusing -
some are submitted to the JC, ohers aren't; some may be retried, others aren't.

It looks like there is some mix up which makes no sense to me either; I'll propose something about this in 1-2 follow-up changes.

Michael Vorburger November 22, 2017 at 11:06 AM

> I'll submit a patch to address this ...

==> https://git.opendaylight.org/gerrit/#/c/65821/

FTR to anyone else re-reviewing this in the future: The important fix is the ditching of the outside List<ListenableFuture<Void>> futures usage inside the forEach in remove(), because that multi-threaded use was badly wrong: Replacing of the futures.add(writeConfigTxn.submit()); by return Collections.singletonList(writeConfigTxn.submit()); which is perfectly fine, as the goal is for the JC to handle that list of futures, not to have such a list spanning accross the forEach. However the removal of the futures.addAll() of what elanInterfaceManager.removeElanInterface() returns is a kind of unrelated minor clean up fall out - that part is not what actually fixes the concurrency issue here. I'm just writing this down here because it initially confused me while reviewing it.

can you confirm that you're good with fixing the problem originally raised in this issue - OK to close?

Tom Pantelis November 21, 2017 at 10:32 PM
Edited

The futures List in remove is reused across the submitted jobs in the forEach loop:

DataStoreJobCoordinator dataStoreJobCoordinator = DataStoreJobCoordinator.getInstance(); ElanUtils.removeAndGetElanInterfaces(elanName).forEach(elanInterfaceName -> { dataStoreJobCoordinator.enqueueJob(ElanUtils.getElanInterfaceJobKey(elanInterfaceName), () -> { WriteTransaction writeConfigTxn = broker.newWriteOnlyTransaction(); ... futures.add(writeConfigTxn.submit()); return futures; }, ElanConstants.JOB_MAX_RETRIES); });

This looks very suspicious and unsafe as the List is accessed by multiple threads. I think this is what causes the strange behavior with a null element in the list - tx.submit() never returns null. Each job should return it's own local futures List. I'll submit a patch to address this ...

Done

Details

Assignee

Reporter

External issue ID

Priority (migrated 2)

High

Components

Affects versions

Priority

Created October 4, 2017 at 12:17 AM
Updated December 12, 2018 at 12:27 PM
Resolved November 28, 2017 at 5:46 PM