Inject DataBroker only when all shards have leaders
Activity

Robert Varga July 31, 2020 at 1:29 PM
Most of the refactoring is done, such that core services no longer really rely on blueprint. Most notably, AbstractDataStore is started asynchronously and published to the Service Registry only after its shards have settled. One notable gap remains: Distributed EOS (Entity Ownership Service) does not wait for its shard to settle:
2020-07-31T15:18:16,998 | INFO | opendaylight-cluster-data-akka.actor.default-dispatcher-17 | OSGiDistributedEntityOwnershipService | 139 - org.opendaylight.controller.sal-distributed-eos - 2.0.4.SNAPSHOT | Distributed Entity Ownership Service starting
2020-07-31T15:18:17,009 | INFO | opendaylight-cluster-data-akka.actor.default-dispatcher-17 | OSGiDistributedEntityOwnershipService | 139 - org.opendaylight.controller.sal-distributed-eos - 2.0.4.SNAPSHOT | Distributed Entity Ownership Service started
2020-07-31T15:18:17,012 | INFO | opendaylight-cluster-data-shard-dispatcher-48 | EntityOwnershipShard | 136 - org.opendaylight.controller.sal-clustering-commons - 2.0.4.SNAPSHOT | Shard created : member-1-shard-entity-ownership-operational, persistent : false
2020-07-31T15:18:17,013 | INFO | opendaylight-cluster-data-akka.actor.default-dispatcher-28 | DistributedEntityOwnershipService | 139 - org.opendaylight.controller.sal-distributed-eos - 2.0.4.SNAPSHOT | Successfully created entity-ownership shard
2020-07-31T15:18:17,014 | INFO | opendaylight-cluster-data-akka.actor.default-dispatcher-28 | RoleChangeNotifier | 136 - org.opendaylight.controller.sal-clustering-commons - 2.0.4.SNAPSHOT | RoleChangeNotifier:akka.tcp://opendaylight-cluster-data@127.0.0.1:2550/user/shardmanager-operational/member-1-shard-entity-ownership-operational/member-1-shard-entity-ownership-operational-notifier#533402702 created and ready for shard:member-1-shard-entity-ownership-operational
2020-07-31T15:18:17,018 | INFO | opendaylight-cluster-data-akka.actor.default-dispatcher-17 | OSGiDOMDataBroker | 138 - org.opendaylight.controller.sal-distributed-datastore - 2.0.4.SNAPSHOT | DOM Data Broker starting
2020-07-31T15:18:17,039 | INFO | opendaylight-cluster-data-akka.actor.default-dispatcher-17 | ConcurrentDOMDataBroker | 185 - org.opendaylight.yangtools.util - 5.0.5 | ThreadFactory created: CommitFutures
2020-07-31T15:18:17,051 | INFO | opendaylight-cluster-data-akka.actor.default-dispatcher-17 | OSGiDOMDataBroker | 138 - org.opendaylight.controller.sal-distributed-datastore - 2.0.4.SNAPSHOT | DOM Data Broker started
2020-07-31T15:18:17,051 | INFO | opendaylight-cluster-data-shard-dispatcher-48 | EntityOwnershipShard | 136 - org.opendaylight.controller.sal-clustering-commons - 2.0.4.SNAPSHOT | Starting recovery for member-1-shard-entity-ownership-operational with journal batch size 1
2020-07-31T15:18:17,055 | INFO | opendaylight-cluster-data-shard-dispatcher-48 | EntityOwnershipShard | 136 - org.opendaylight.controller.sal-clustering-commons - 2.0.4.SNAPSHOT | member-1-shard-entity-ownership-operational: Recovery completed - Switching actor to Follower - last log index = -1, last log term = -1, snapshot index = -1, snapshot term = -1, journal size = 0
2020-07-31T15:18:17,059 | INFO | opendaylight-cluster-data-akka.actor.default-dispatcher-35 | RoleChangeNotifier | 136 - org.opendaylight.controller.sal-clustering-commons - 2.0.4.SNAPSHOT | RoleChangeNotifier for member-1-shard-entity-ownership-operational , received role change from null to Follower
2020-07-31T15:18:17,060 | INFO | opendaylight-cluster-data-shard-dispatcher-47 | EntityOwnershipShard | 136 - org.opendaylight.controller.sal-clustering-commons - 2.0.4.SNAPSHOT | member-1-shard-entity-ownership-operational (Candidate): Starting new election term 1
2020-07-31T15:18:17,061 | INFO | opendaylight-cluster-data-shard-dispatcher-47 | EntityOwnershipShard | 136 - org.opendaylight.controller.sal-clustering-commons - 2.0.4.SNAPSHOT | member-1-shard-entity-ownership-operational (Follower) :- Switching from behavior Follower to Candidate, election term: 1
i.e. DistributedEntityOwnershipService itself does not wait for settle before being published. This will need to be fixed in a follow-up patch.
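The "publish only after shards have settled" behaviour described above can be illustrated with plain JDK futures. This is a minimal sketch, not the controller's actual code: the class SettledPublisher, the method publishWhenSettled and the per-shard "leader known" futures are all hypothetical names, and the list standing in for the Service Registry is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical illustration only; not an OpenDaylight class.
public final class SettledPublisher {
    private SettledPublisher() {
    }

    /**
     * Runs the publish action once every per-shard future has completed,
     * i.e. once each shard reports that it knows its leader.
     */
    public static CompletableFuture<Void> publishWhenSettled(
            List<CompletableFuture<String>> shardLeaders, Runnable publish) {
        return CompletableFuture
            .allOf(shardLeaders.toArray(new CompletableFuture<?>[0]))
            .thenRun(publish);
    }

    public static void main(String[] args) {
        // One future per shard, completed with the (assumed) leader name.
        CompletableFuture<String> defaultConfig = new CompletableFuture<>();
        CompletableFuture<String> entityOwnership = new CompletableFuture<>();

        // Stand-in for the Service Registry.
        List<String> registry = new ArrayList<>();

        publishWhenSettled(List.of(defaultConfig, entityOwnership),
            () -> registry.add("DOMDataBroker"));

        defaultConfig.complete("member-1");
        System.out.println("published after one shard: " + registry.contains("DOMDataBroker"));

        entityOwnership.complete("member-1");
        System.out.println("published after all shards: " + registry.contains("DOMDataBroker"));
    }
}
```

With this shape, consumers never see the service while a shard is still electing a leader, so there is no activation deadline to tune; the cost is that a shard which never settles keeps the service unpublished indefinitely.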

Robert Varga April 28, 2020 at 10:24 AM
So the general approach we will take to this is that we will completely eliminate use of BP in mdsal/controller in favor of using OSGi DS. This work will result in the system being decomposed to smaller components (as opposed to BP containers). Each component can (and will) start as soon as it can. There will be no deadlines to activation, as OSGi DS is inherently asynchronous and aligned with OSGi lifecycle.
This will result in services being available as components, and thus the choice of DI framework will be left to downstreams – who can opt to switch to OSGi DS, too, eliminating the need to battle timeouts.

Robert Varga September 16, 2019 at 10:38 PM
There are different trade-offs here, depending on how the applications are wired and how BP is configured.

Faseela K January 11, 2019 at 4:25 PM
Sure, I am checking all of these. We will also work towards adding retries in application-side logic.
Please do check whether there would be a benefit in increasing this 2 min timeout. Or is it already configurable, so that we can pick a value based on the scale of the deployment?
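The application-side retries mentioned above could look roughly like the following. This is a sketch under stated assumptions, not genius code: RetryingRead, readWithRetry and NoLeaderException are hypothetical names, with NoLeaderException standing in for whatever "shard currently has no leader" failure the real read reports. Only failures the caller marks as retryable are retried, and the number of attempts is bounded.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

// Hypothetical illustration only; not an OpenDaylight class.
public final class RetryingRead {
    /** Hypothetical stand-in for a "shard has no leader" failure. */
    public static final class NoLeaderException extends Exception {
        public NoLeaderException(String message) {
            super(message);
        }
    }

    private RetryingRead() {
    }

    /**
     * Calls read up to maxAttempts times, sleeping delayMillis between
     * attempts. Failures rejected by the retryable predicate, and the
     * failure of the last attempt, are rethrown to the caller.
     */
    public static <T> T readWithRetry(Callable<T> read, Predicate<Exception> retryable,
            int maxAttempts, long delayMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return read.call();
            } catch (Exception e) {
                if (attempt >= maxAttempts || !retryable.test(e)) {
                    throw e;
                }
                Thread.sleep(delayMillis);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated read: fails twice with "no leader", then succeeds.
        String result = readWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new NoLeaderException("no leader yet");
            }
            return "id-pools";
        }, e -> e instanceof NoLeaderException, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Blocking in a sleep like this during blueprint initialization still eats into the container's startup timeout, so the attempt count and delay would need to stay well within it.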

Robert Varga January 11, 2019 at 2:26 PM
This is related to CDS, hence it is not an MD-SAL issue.
Details
Assignee: Robert Varga
Reporter: Faseela K
Priority: Medium

Description
We are hitting an issue in genius on stable/oxygen, where the idmanager-impl bundle randomly fails to come up because the datastore read in the blueprint initialization code fails with NoShardLeaderException.
https://lists.opendaylight.org/pipermail/genius-dev/2019-January/003554.html
While applications work on adding proper failure handling, I think it would be better if dataBroker were injected only when all shards have leaders, rather than relying on a 2 min timeout.
ODL :: genius :: idmanager-impl (260)
-------------------------------------
Status: Failure
Blueprint
1/8/19 10:47 AM
Exception:
org.osgi.service.blueprint.container.ComponentDefinitionException: Error when instantiating bean idManager of class org.opendaylight.genius.idmanager.IdManager
org.osgi.service.blueprint.container.ComponentDefinitionException: org.osgi.service.blueprint.container.ComponentDefinitionException: Error when instantiating bean idManager of class org.opendaylight.genius.idmanager.IdManager
    at org.apache.aries.blueprint.container.ServiceRecipe.createService(ServiceRecipe.java:310)
    at org.apache.aries.blueprint.container.ServiceRecipe.internalGetService(ServiceRecipe.java:252)
    at org.apache.aries.blueprint.container.ServiceRecipe.internalCreate(ServiceRecipe.java:149)
    at org.apache.aries.blueprint.di.AbstractRecipe$1.call(AbstractRecipe.java:79)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at org.apache.aries.blueprint.di.AbstractRecipe.create(AbstractRecipe.java:88)
    at org.apache.aries.blueprint.container.BlueprintRepository.createInstances(BlueprintRepository.java:255)
    at org.apache.aries.blueprint.container.BlueprintRepository.createAll(BlueprintRepository.java:186)
    at org.apache.aries.blueprint.container.BlueprintContainerImpl.instantiateEagerComponents(BlueprintContainerImpl.java:704)
    at org.apache.aries.blueprint.container.BlueprintContainerImpl.doRun(BlueprintContainerImpl.java:410)
    at org.apache.aries.blueprint.container.BlueprintContainerImpl.run(BlueprintContainerImpl.java:275)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at org.apache.aries.blueprint.container.ExecutorServiceWrapper.run(ExecutorServiceWrapper.java:106)
    at org.apache.aries.blueprint.utils.threading.impl.DiscardableRunnable.run(DiscardableRunnable.java:48)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.osgi.service.blueprint.container.ComponentDefinitionException: Error when instantiating bean idManager of class org.opendaylight.genius.idmanager.IdManager
    at org.apache.aries.blueprint.container.BeanRecipe.wrapAsCompDefEx(BeanRecipe.java:361)
    at org.apache.aries.blueprint.container.BeanRecipe.getInstanceFromType(BeanRecipe.java:351)
    at org.apache.aries.blueprint.container.BeanRecipe.getInstance(BeanRecipe.java:282)
    at org.apache.aries.blueprint.container.BeanRecipe.internalCreate2(BeanRecipe.java:830)
    at org.apache.aries.blueprint.container.BeanRecipe.internalCreate(BeanRecipe.java:811)
    at org.apache.aries.blueprint.di.AbstractRecipe$1.call(AbstractRecipe.java:79)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at org.apache.aries.blueprint.di.AbstractRecipe.create(AbstractRecipe.java:88)
    at org.apache.aries.blueprint.di.RefRecipe.internalCreate(RefRecipe.java:62)
    at org.apache.aries.blueprint.di.AbstractRecipe.create(AbstractRecipe.java:106)
    at org.apache.aries.blueprint.container.ServiceRecipe.createService(ServiceRecipe.java:285)
    ... 21 more
Caused by: ReadFailedException{message=Error executeRead ReadData for path /(urn:opendaylight:genius:idmanager?revision=2016-04-06)id-pools, errorList=[RpcError [message=Error executeRead ReadData for path /(urn:opendaylight:genius:idmanager?revision=2016-04-06)id-pools, severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, cause=org.opendaylight.controller.md.sal.common.api.data.DataStoreUnavailableException: Shard 192.168.70.1-shard-default-config currently has no leader. Try again later.]]}
    at org.opendaylight.controller.cluster.datastore.NoOpTransactionContext.executeRead(NoOpTransactionContext.java:71)
    at org.opendaylight.controller.cluster.datastore.TransactionProxy$1.invoke(TransactionProxy.java:98)
    at org.opendaylight.controller.cluster.datastore.TransactionContextWrapper.executePriorTransactionOperations(TransactionContextWrapper.java:194)
    at org.opendaylight.controller.cluster.datastore.AbstractTransactionContextFactory.onFindPrimaryShardFailure(AbstractTransactionContextFactory.java:109)
    at org.opendaylight.controller.cluster.datastore.AbstractTransactionContextFactory.access$100(AbstractTransactionContextFactory.java:37)
    at org.opendaylight.controller.cluster.datastore.AbstractTransactionContextFactory$1.onComplete(AbstractTransactionContextFactory.java:136)
    at org.opendaylight.controller.cluster.datastore.AbstractTransactionContextFactory$1.onComplete(AbstractTransactionContextFactory.java:130)
    at akka.dispatch.OnComplete.internal(Future.scala:260)
    at akka.dispatch.OnComplete.internal(Future.scala:258)
    at akka.dispatch.japi$CallbackBridge.apply(Future.scala:188)
    at akka.dispatch.japi$CallbackBridge.apply(Future.scala:185)
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
    at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
    at akka.dispatch.BatchingExecutor$BlockableBatch.$anonfun$run$1(BatchingExecutor.scala:91)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
    at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:81)
    at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:91)
    at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:43)
    at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: org.opendaylight.controller.md.sal.common.api.data.DataStoreUnavailableException: Shard 192.168.70.1-shard-default-config currently has no leader. Try again later.