vpnmanager: OptimisticLockFailedException seen during rally scenario

Description

In a Clustered setup with 3 ODL Controllers and 3 Compute nodes, while running rally with a light load of concurrency of 2 and times set to 40, the following exception was seen in karaf.

ODL rpm used: opendaylight-6.2.0-0.1.20170913snap58.el7.noarch.rpm

2017-09-15 03:39:25,801 | WARN | CommitFutures-3 | DataStoreJobCoordinator | 319 - org.opendaylight.genius.mdsalutil-api - 0.2.2.SNAPSHOT | Job: JobEntry {key='VPNNODE-9681529007732', mainWorker=org.opendaylight.netvirt.vpnmanager.VpnNodeListener$1@7a2b1deb, rollbackWorker=null, retryCount=0, futures=[org.opendaylight.controller.cluster.databroker.ConcurrentDOMDataBroker$AsyncNotifyingSettableFuture@74ececde]} failed OptimisticLockFailedException {message=Optimistic lock failed., errorList=[RpcError [message=Optimistic lock failed., severity=ERROR, errorType=APPLICATION, tag=resource-denied, applicationTag=null, info=null, cause=org.opendaylight.yangtools.yang.data.api.schema.tree.ConflictingModificationAppliedException: Node was created by other transaction.]]} at org.opendaylight.controller.cluster.datastore.ShardDataTree.lambda$processNextPendingTransaction$0(ShardDataTree.java:751) at org.opendaylight.controller.cluster.datastore.ShardDataTree.processNextPending(ShardDataTree.java:789) at org.opendaylight.controller.cluster.datastore.ShardDataTree.processNextPendingTransaction(ShardDataTree.java:736) at org.opendaylight.controller.cluster.datastore.ShardDataTree.startCanCommit(ShardDataTree.java:819) at org.opendaylight.controller.cluster.datastore.SimpleShardDataTreeCohort.canCommit(SimpleShardDataTreeCohort.java:90) at org.opendaylight.controller.cluster.datastore.CohortEntry.canCommit(CohortEntry.java:97) at org.opendaylight.controller.cluster.datastore.ShardCommitCoordinator.handleCanCommit(ShardCommitCoordinator.java:236) at org.opendaylight.controller.cluster.datastore.ShardCommitCoordinator.handleBatchedModifications(ShardCommitCoordinator.java:170) at org.opendaylight.controller.cluster.datastore.Shard.handleBatchedModificationsLocal(Shard.java:604) at org.opendaylight.controller.cluster.datastore.Shard.handleBatchedModifications(Shard.java:627) at org.opendaylight.controller.cluster.datastore.Shard.handleNonRaftCommand(Shard.java:312) at org.opendaylight.controller.cluster.raft.RaftActor.handleCommand(RaftActor.java:270) at org.opendaylight.controller.cluster.common.actor.AbstractUntypedPersistentActor.onReceiveCommand(AbstractUntypedPersistentActor.java:44) at akka.persistence.UntypedPersistentActor.onReceive(PersistentActor.scala:170) at org.opendaylight.controller.cluster.common.actor.MeteringBehavior.apply(MeteringBehavior.java:104) at akka.actor.ActorCell$$anonfun$become$1.applyOrElse(ActorCell.scala:544) at akka.actor.Actor$class.aroundReceive(Actor.scala:502) at akka.persistence.UntypedPersistentActor.akka$persistence$Eventsourced$$super$aroundReceive(PersistentActor.scala:168) at akka.persistence.Eventsourced$$anon$1.stateReceive(Eventsourced.scala:727) at akka.persistence.Eventsourced$class.aroundReceive(Eventsourced.scala:183) at akka.persistence.UntypedPersistentActor.aroundReceive(PersistentActor.scala:168) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526) at akka.actor.ActorCell.invoke(ActorCell.scala:495) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257) at akka.dispatch.Mailbox.run(Mailbox.scala:224) at akka.dispatch.Mailbox.exec(Mailbox.scala:234) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) Caused by: org.opendaylight.yangtools.yang.data.api.schema.tree.ConflictingModificationAppliedException: Node was created by other transaction. at org.opendaylight.yangtools.yang.data.impl.schema.tree.SchemaAwareApplyOperation.checkWriteApplicable(SchemaAwareApplyOperation.java:174) at org.opendaylight.yangtools.yang.data.impl.schema.tree.SchemaAwareApplyOperation.checkApplicable(SchemaAwareApplyOperation.java:128) at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkChildPreconditions(AbstractNodeContainerModificationStrategy.java:305) at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkMergeApplicable(AbstractNodeContainerModificationStrategy.java:313) at org.opendaylight.yangtools.yang.data.impl.schema.tree.SchemaAwareApplyOperation.checkApplicable(SchemaAwareApplyOperation.java:131) at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkChildPreconditions(AbstractNodeContainerModificationStrategy.java:305) at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkMergeApplicable(AbstractNodeContainerModificationStrategy.java:313) at org.opendaylight.yangtools.yang.data.impl.schema.tree.SchemaAwareApplyOperation.checkApplicable(SchemaAwareApplyOperation.java:131) at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkChildPreconditions(AbstractNodeContainerModificationStrategy.java:305) at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkMergeApplicable(AbstractNodeContainerModificationStrategy.java:313) at org.opendaylight.yangtools.yang.data.impl.schema.tree.SchemaAwareApplyOperation.checkApplicable(SchemaAwareApplyOperation.java:131) at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkChildPreconditions(AbstractNodeContainerModificationStrategy.java:305) at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkMergeApplicable(AbstractNodeContainerModificationStrategy.java:313) at org.opendaylight.yangtools.yang.data.impl.schema.tree.SchemaAwareApplyOperation.checkApplicable(SchemaAwareApplyOperation.java:131) at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkChildPreconditions(AbstractNodeContainerModificationStrategy.java:305) at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkMergeApplicable(AbstractNodeContainerModificationStrategy.java:313) at org.opendaylight.yangtools.yang.data.impl.schema.tree.SchemaAwareApplyOperation.checkApplicable(SchemaAwareApplyOperation.java:131) at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkChildPreconditions(AbstractNodeContainerModificationStrategy.java:305) at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkMergeApplicable(AbstractNodeContainerModificationStrategy.java:313) at org.opendaylight.yangtools.yang.data.impl.schema.tree.SchemaAwareApplyOperation.checkApplicable(SchemaAwareApplyOperation.java:131) at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkChildPreconditions(AbstractNodeContainerModificationStrategy.java:305) at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkMergeApplicable(AbstractNodeContainerModificationStrategy.java:313) at org.opendaylight.yangtools.yang.data.impl.schema.tree.SchemaAwareApplyOperation.checkApplicable(SchemaAwareApplyOperation.java:131) at org.opendaylight.yangtools.yang.data.impl.schema.tree.StructuralContainerModificationStrategy.checkApplicable(StructuralContainerModificationStrategy.java:99) at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkChildPreconditions(AbstractNodeContainerModificationStrategy.java:305) at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkTouchApplicable(AbstractNodeContainerModificationStrategy.java:288) at org.opendaylight.yangtools.yang.data.impl.schema.tree.SchemaAwareApplyOperation.checkApplicable(SchemaAwareApplyOperation.java:125) at org.opendaylight.yangtools.yang.data.impl.schema.tree.RootModificationApplyOperation.checkApplicable(RootModificationApplyOperation.java:72) at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractDataTreeTip.validate(AbstractDataTreeTip.java:35) at org.opendaylight.controller.cluster.datastore.ShardDataTree.lambda$processNextPendingTransaction$0(ShardDataTree.java:743) ... 29 more

 

Environment

Operating System: All
Platform: All

Activity

Show:

Sam Hague April 5, 2018 at 9:22 PM

Michael Vorburger March 14, 2018 at 10:28 AM

can this one finally be closed now, or is there anything else I can do here?

Michael Vorburger February 26, 2018 at 6:27 PM

That (above) is correct. FTR c/66618 which changed the default retries in JobCoordinator from 0 to 3 globally for everything was discussed, accepted and merged on Jan 9th.

Perhaps we could finally close this bug (with Fix Version: Oxygen), and open new ones for other places?

Sam Hague December 21, 2017 at 5:16 AM

Ok so the fact we don't see any errors in the VpnNodeListener patch means somewhere in the 1-3 retries it passed - and failures would only be logged at debug, which is not on by default, so we don't know how many tries it took, but it did pass.

Then we need to see if we want to do the same for the new error. And in this one, it says retries =0 since that is the default - and not that is got through all retries and ended at 0.

Michael Vorburger December 19, 2017 at 4:51 PM
Edited

we only got a retryCount=0 - does that mean the patch did what it was supposed to and only log the final exception after the last retry? The bug is still there, but the number of logs was reduced?

Now wait a second, your new log above from JobCoordinatorImpl is re. ItmTepRemoveWorker.. whereas originally this issue was re. VpnNodeListener... so what is the "scope" of this bug? 😉 The change above only fixed VpnNodeListener, by adding a ,3 parameter for retries, which we proved helped.

Now you either have to do the same in ItmTepRemoveWorker, and everywhere else, or we change the default retries in JobCoordinator from 0 to 3 for EVERYTHING... I'm not sure that everyone will agree that is a good idea, but I can propose it, for discussion, in c/66618 ...

Also can you clarify that your patches here - they don't fix this bug, they just reduce the number of logs? Do I have that right?

Well, no careful, you mix up 2 things here: Those 3 retries added e.g. to VpnNodeListener, or maybe by changing the default in JobCoordinator, DO fix this bug. But then the https://lf-opendaylight.atlassian.net/browse/CONTROLLER-1802#icft=CONTROLLER-1802 improves logging to reduce where it does not make sense to log (because there are retries), that's what c/66362 does for controller (merged on Dec 13th but still appears as WARN from ConcurrentDOMDataBroker because your log above is from a Dec 12th run!) and c/66355 forJobCoordinator (finally just merged, but likely buggy; follow up in c/66634).

Done

Details

Assignee

Reporter

Labels

External issue ID

Components

Fix versions

Affects versions

Priority

Created September 18, 2017 at 12:38 PM
Updated April 5, 2018 at 9:22 PM
Resolved April 5, 2018 at 9:22 PM