java.lang.OutOfMemoryError: Java heap space
Description
Attachments
- 28 Nov 2017, 04:04 PM
- 28 Nov 2017, 04:04 PM
- 28 Nov 2017, 04:04 PM
- 15 Nov 2017, 04:50 PM
- 15 Nov 2017, 04:50 PM
- 15 Nov 2017, 04:50 PM
- 13 Nov 2017, 06:07 PM
- 13 Nov 2017, 05:14 PM
- 13 Nov 2017, 04:51 PM
Activity
Michael Vorburger November 30, 2017 at 12:44 PM
Closing this issue to avoid confusion; similar work continues elsewhere, in OVSDB-435.
Michael Vorburger November 29, 2017 at 3:08 PM
Testing of another scenario ("nova boot") has hit an OOM that again looks exactly like this one (another huge Map, 857 MB of 1.6 GB, inside the MD-SAL ShardDataTree), so plugging these TX leaks is likely going to be an ongoing, repetitive effort rather than a one-time action.
Basically, any time we test new scenarios at scale on a code path that hasn't been trodden before and hit an OOM that shows a blown-up ShardDataTree, we have to run trace:transactions and plug more non-closed TXs, ideally by using the ManagedTransactionRunner. This continues the repetitive pattern of fixes shown by the patches of the past 2 weeks on https://git.opendaylight.org/gerrit/#/q/topic:NETVIRT-985.
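The fix pattern above boils down to guaranteeing that every transaction is either submitted or cancelled, even when the code using it fails. The following is a minimal, single-threaded sketch of that idea only: `WriteTx`, `CountingBroker` and `ManagedRunner` are hypothetical stand-ins written for illustration, not the actual ManagedTransactionRunner or MD-SAL APIs, whose signatures differ.

```java
import java.util.function.Consumer;

public class ManagedTxSketch {

    /** Hypothetical stand-in for a datastore write transaction (not the MD-SAL API). */
    interface WriteTx {
        void put(String path, String value);
        void submit();
        void cancel();
    }

    /** Hypothetical broker that counts TXs opened but neither submitted nor cancelled. */
    static final class CountingBroker {
        private int open; // single-threaded sketch, no synchronization

        WriteTx newWriteTx() {
            open++;
            return new WriteTx() {
                private boolean done;

                @Override public void put(String path, String value) {
                    // a real TX would stage the change here; omitted in this sketch
                }
                @Override public void submit() { finish(); }
                @Override public void cancel() { finish(); }

                private void finish() {
                    if (!done) { // closing twice must not decrement twice
                        done = true;
                        open--;
                    }
                }
            };
        }

        int openCount() {
            return open;
        }
    }

    /** Hypothetical managed runner: callers never hold a TX they could forget to close. */
    static final class ManagedRunner {
        private final CountingBroker broker;

        ManagedRunner(CountingBroker broker) {
            this.broker = broker;
        }

        void runAndSubmit(Consumer<WriteTx> work) {
            WriteTx tx = broker.newWriteTx();
            boolean succeeded = false;
            try {
                work.accept(tx);
                succeeded = true;
            } finally {
                if (succeeded) {
                    tx.submit(); // close on success
                } else {
                    tx.cancel(); // close on any exception, which then propagates
                }
            }
        }
    }

    public static void main(String[] args) {
        CountingBroker broker = new CountingBroker();
        ManagedRunner runner = new ManagedRunner(broker);

        // Leaky pattern: TX opened and forgotten, never submitted or cancelled.
        WriteTx leaked = broker.newWriteTx();
        leaked.put("/interfaces/if-1", "up");

        // Managed pattern: the TX is closed on success and on failure.
        runner.runAndSubmit(tx -> tx.put("/interfaces/if-2", "up"));
        try {
            runner.runAndSubmit(tx -> { throw new IllegalStateException("simulated failure"); });
        } catch (IllegalStateException expected) {
            // the runner cancelled the TX before rethrowing
        }

        System.out.println("open TXs: " + broker.openCount()); // prints "open TXs: 1"
    }
}
```

Only the hand-managed TX is left open; both TXs that went through the runner were closed, including the one whose work threw. Putting the submit/cancel decision in a `finally` block, instead of relying on each caller, is what removes the whole class of leak.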
Michael Vorburger November 28, 2017 at 4:08 PM
Everything done here is now merged to master and carbon; nitrogen should follow in the coming days (stable/nitrogen was re-opened today). As the attached Controller[1-3]_open-transactions.txt files from the latest test (with a build including these fixes) show, all the "big" leaks (i.e. those with hundreds of open TXs) have been plugged, as expected. Preliminary early feedback indicates that the OOM is no longer being hit.
There ARE still some open TXs, but relatively few compared to where we started; I don't think it's worth following up on them in the short term. In case we ever want to look at this again in the future, see the new attachments.
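Telling the "big" leaks (hundreds of open TXs from one place) apart from the minor ones comes down to grouping still-open transactions by the site that opened them. The sketch below assumes a hypothetical tracker that is notified whenever a TX is opened and closed; it illustrates the idea only and is not how trace:transactions is actually implemented. The site names used in `main` are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.Collectors;

public class OpenTxTracker {

    // Hypothetical: open TX id -> site (e.g. a stack frame) where it was opened.
    private final Map<Long, String> openTxs = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong();

    /** Records a TX opened at an explicitly named site and returns its id. */
    public long opened(String site) {
        long id = nextId.incrementAndGet();
        openTxs.put(id, site);
        return id;
    }

    /** Records a TX, using the caller's stack frame as its site. */
    public long opened() {
        StackTraceElement[] stack = new Throwable().getStackTrace();
        // stack[0] is this method, stack[1] is the code that opened the TX
        return opened(stack.length > 1 ? stack[1].toString() : "unknown");
    }

    /** Call when the TX is submitted or cancelled. */
    public void closed(long id) {
        openTxs.remove(id);
    }

    /** Still-open TX count per site, largest leak first. */
    public Map<String, Long> openBySite() {
        Map<String, Long> counts = openTxs.values().stream()
                .collect(Collectors.groupingBy(site -> site, Collectors.counting()));
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        OpenTxTracker tracker = new OpenTxTracker();
        for (int i = 0; i < 300; i++) {
            tracker.opened("ListenerA.add");    // leaks every TX: a "big" leak
        }
        for (int i = 0; i < 3; i++) {
            tracker.opened("ListenerB.remove"); // leaks only a few: a minor leak
        }
        long ok = tracker.opened("ListenerC.update");
        tracker.closed(ok);                     // properly closed: not reported
        tracker.openBySite().forEach(
                (site, n) -> System.out.println(n + " open TXs from " + site));
    }
}
```

Sorting by count puts the sites worth fixing first, which matches the triage described above: plug the sites with hundreds of open TXs, and defer the ones with a handful.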
Michael Vorburger November 17, 2017 at 5:03 PM (edited)
Once everything that is open now, PLUS what @Stephen Kitt will help me with in https://lf-opendaylight.atlassian.net/browse/NETVIRT-1000#icft=NETVIRT-1000, is available in stable/carbon, @Sai Sindhur Malleni can retest. (Full disclosure: there are a few minor TX leaks, showing up with far fewer transactions than the big ones, which we'll fix only on master, not on carbon and nitrogen.)
During perf/scale testing, when a large number of Neutron resources are created and deleted, ODL is killed due to an OOM. The JVM's stdout at the time of the OOM crash shows:
Heap dump file created [3089813876 bytes in 19.232 secs]
Uncaught error from thread [opendaylight-cluster-data-shard-dispatcher-144] shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[opendaylight-cluster-data]
java.lang.OutOfMemoryError: Java heap space
at com.google.common.collect.RegularImmutableMap.createEntryArray(RegularImmutableMap.java:148)
at com.google.common.collect.RegularImmutableMap.<init>(RegularImmutableMap.java:81)
at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:294)
at org.opendaylight.controller.cluster.datastore.persisted.FrontendHistoryMetadata.<init>(FrontendHistoryMetadata.java:40)
at org.opendaylight.controller.cluster.datastore.FrontendHistoryMetadataBuilder.build(FrontendHistoryMetadataBuilder.java:54)
at org.opendaylight.controller.cluster.datastore.FrontendClientMetadataBuilder$$Lambda$431/741495460.apply(Unknown Source)
at com.google.common.collect.Iterators$8.transform(Iterators.java:799)
at com.google.common.collect.TransformedIterator.next(TransformedIterator.java:48)
at java.util.AbstractCollection.toArray(AbstractCollection.java:141)
at com.google.common.collect.ImmutableList.copyOf(ImmutableList.java:258)
at org.opendaylight.controller.cluster.datastore.persisted.FrontendClientMetadata.<init>(FrontendClientMetadata.java:38)
at org.opendaylight.controller.cluster.datastore.FrontendClientMetadataBuilder.build(FrontendClientMetadataBuilder.java:77)
at org.opendaylight.controller.cluster.datastore.FrontendMetadata$$Lambda$430/2026307982.apply(Unknown Source)
at com.google.common.collect.Iterators$8.transform(Iterators.java:799)
at com.google.common.collect.TransformedIterator.next(TransformedIterator.java:48)
at java.util.AbstractCollection.toArray(AbstractCollection.java:141)
at com.google.common.collect.ImmutableList.copyOf(ImmutableList.java:258)
at org.opendaylight.controller.cluster.datastore.persisted.FrontendShardDataTreeSnapshotMetadata.<init>(FrontendShardDataTreeSnapshotMetadata.java:71)
at org.opendaylight.controller.cluster.datastore.FrontendMetadata.toSnapshot(FrontendMetadata.java:72)
at org.opendaylight.controller.cluster.datastore.FrontendMetadata.toSnapshot(FrontendMetadata.java:33)
at org.opendaylight.controller.cluster.datastore.ShardDataTree.takeStateSnapshot(ShardDataTree.java:216)
at org.opendaylight.controller.cluster.datastore.ShardSnapshotCohort.createSnapshot(ShardSnapshotCohort.java:68)
at org.opendaylight.controller.cluster.raft.RaftActorSnapshotMessageSupport.lambda$new$0(RaftActorSnapshotMessageSupport.java:52)
at org.opendaylight.controller.cluster.raft.RaftActorSnapshotMessageSupport$$Lambda$123/1533883683.accept(Unknown Source)
at org.opendaylight.controller.cluster.raft.SnapshotManager$Idle.capture(SnapshotManager.java:295)
at org.opendaylight.controller.cluster.raft.SnapshotManager$Idle.capture(SnapshotManager.java:307)
at org.opendaylight.controller.cluster.raft.SnapshotManager.capture(SnapshotManager.java:91)
at org.opendaylight.controller.cluster.raft.behaviors.Follower.lambda$handleAppendEntries$0(Follower.java:254)
at org.opendaylight.controller.cluster.raft.behaviors.Follower$$Lambda$127/742332312.apply(Unknown Source)
at org.opendaylight.controller.cluster.raft.ReplicatedLogImpl.lambda$appendAndPersist$0(ReplicatedLogImpl.java:111)
at org.opendaylight.controller.cluster.raft.ReplicatedLogImpl$$Lambda$128/559701765.apply(Unknown Source)
at akka.persistence.UntypedPersistentActor$$anonfun$persist$1.apply(PersistentActor.scala:206)