Fail to serialize oversized message

Description

Found in openflowplugin-csit-3node-clustering-bulkomatic-only-sulfur/190.

Serialization of messages larger than the maximum allowed size (524288 bytes) fails and later causes the circuit breaker to time out:
https://s3-logs.opendaylight.org/logs/releng/vex-yul-odl-jenkins-1/openflowplugin-csit-3node-clustering-bulkomatic-only-sulfur/190/odl_2/odl2_karaf.log.gz

2022-04-04T15:18:46,456 | ERROR | opendaylight-cluster-data-akka.actor.default-dispatcher-28 | Encoder | 206 - org.opendaylight.controller.repackaged-akka - 5.0.1 | Failed to serialize oversized message [ActorSelectionMessage(org.opendaylight.controller.cluster.raft.messages.AppendEntries)].
akka.remote.OversizedPayloadException: Discarding oversized payload sent to Some(Actor[akka://opendaylight-cluster-data@10.30.171.35:2550/]): max allowed size 524288 bytes. Message type [ActorSelectionMessage(org.opendaylight.controller.cluster.raft.messages.AppendEntries)].
2022-04-04T15:18:46,979 | ERROR | opendaylight-cluster-data-akka.actor.default-dispatcher-41 | Encoder | 206 - org.opendaylight.controller.repackaged-akka - 5.0.1 | Failed to serialize oversized message [ActorSelectionMessage(org.opendaylight.controller.cluster.raft.messages.AppendEntries)].
akka.remote.OversizedPayloadException: Discarding oversized payload sent to Some(Actor[akka://opendaylight-cluster-data@10.30.171.35:2550/]): max allowed size 524288 bytes. Message type [ActorSelectionMessage(org.opendaylight.controller.cluster.raft.messages.AppendEntries)].
2022-04-04T15:18:47,314 | INFO | ForkJoinPool-9-worker-3 | FlowReader | 287 - org.opendaylight.openflowplugin.applications.bulk-o-matic - 0.14.0 | Total Flows read: 10000
2022-04-04T15:18:47,500 | ERROR | opendaylight-cluster-data-akka.actor.default-dispatcher-42 | Encoder | 206 - org.opendaylight.controller.repackaged-akka - 5.0.1 | Failed to serialize oversized message [ActorSelectionMessage(org.opendaylight.controller.cluster.raft.messages.AppendEntries)].
akka.remote.OversizedPayloadException: Discarding oversized payload sent to Some(Actor[akka://opendaylight-cluster-data@10.30.171.35:2550/]): max allowed size 524288 bytes. Message type [ActorSelectionMessage(org.opendaylight.controller.cluster.raft.messages.AppendEntries)].
2022-04-04T15:18:48,023 | ERROR | opendaylight-cluster-data-akka.actor.default-dispatcher-40 | Encoder | 206 - org.opendaylight.controller.repackaged-akka - 5.0.1 | Failed to serialize oversized message [ActorSelectionMessage(org.opendaylight.controller.cluster.raft.messages.AppendEntries)].
akka.remote.OversizedPayloadException: Discarding oversized payload sent to Some(Actor[akka://opendaylight-cluster-data@10.30.171.35:2550/]): max allowed size 524288 bytes. Message type [ActorSelectionMessage(org.opendaylight.controller.cluster.raft.messages.AppendEntries)].
2022-04-04T15:18:48,233 | ERROR | opendaylight-cluster-data-akka.actor.default-dispatcher-10 | Encoder | 206 - org.opendaylight.controller.repackaged-akka - 5.0.1 | Failed to serialize oversized message [ActorSelectionMessage(org.opendaylight.controller.cluster.raft.messages.AppendEntries)].
akka.remote.OversizedPayloadException: Discarding oversized payload sent to Some(Actor[akka://opendaylight-cluster-data@10.30.171.35:2550/]): max allowed size 524288 bytes. Message type [ActorSelectionMessage(org.opendaylight.controller.cluster.raft.messages.AppendEntries)].
2022-04-04T15:18:48,539 | ERROR | opendaylight-cluster-data-akka.actor.default-dispatcher-10 | Encoder | 206 - org.opendaylight.controller.repackaged-akka - 5.0.1 | Failed to serialize oversized message [ActorSelectionMessage(org.opendaylight.controller.cluster.raft.messages.AppendEntries)].
akka.remote.OversizedPayloadException: Discarding oversized payload sent to Some(Actor[akka://opendaylight-cluster-data@10.30.171.35:2550/]): max allowed size 524288 bytes. Message type [ActorSelectionMessage(org.opendaylight.controller.cluster.raft.messages.AppendEntries)].
2022-04-04T15:18:49,063 | ERROR | opendaylight-cluster-data-akka.actor.default-dispatcher-6 | Encoder | 206 - org.opendaylight.controller.repackaged-akka - 5.0.1 | Failed to serialize oversized message [ActorSelectionMessage(org.opendaylight.controller.cluster.raft.messages.AppendEntries)].
akka.remote.OversizedPayloadException: Discarding oversized payload sent to Some(Actor[akka://opendaylight-cluster-data@10.30.171.35:2550/]): max allowed size 524288 bytes. Message type [ActorSelectionMessage(org.opendaylight.controller.cluster.raft.messages.AppendEntries)].
2022-04-04T15:18:49,579 | ERROR | opendaylight-cluster-data-akka.actor.default-dispatcher-11 | Encoder | 206 - org.opendaylight.controller.repackaged-akka - 5.0.1 | Failed to serialize oversized message [ActorSelectionMessage(org.opendaylight.controller.cluster.raft.messages.AppendEntries)].
akka.remote.OversizedPayloadException: Discarding oversized payload sent to Some(Actor[akka://opendaylight-cluster-data@10.30.171.35:2550/]): max allowed size 524288 bytes. Message type [ActorSelectionMessage(org.opendaylight.controller.cluster.raft.messages.AppendEntries)].
2022-04-04T15:18:50,099 | ERROR | opendaylight-cluster-data-akka.actor.default-dispatcher-10 | Encoder | 206 - org.opendaylight.controller.repackaged-akka - 5.0.1 | Failed to serialize oversized message [ActorSelectionMessage(org.opendaylight.controller.cluster.raft.messages.AppendEntries)].
akka.remote.OversizedPayloadException: Discarding oversized payload sent to Some(Actor[akka://opendaylight-cluster-data@10.30.171.35:2550/]): max allowed size 524288 bytes. Message type [ActorSelectionMessage(org.opendaylight.controller.cluster.raft.messages.AppendEntries)].
2022-04-04T15:18:50,257 | INFO | opendaylight-cluster-data-shard-dispatcher-38 | Shard | 211 - org.opendaylight.controller.sal-clustering-commons - 5.0.1 | member-2-shard-inventory-config: Initiating snapshot capture CaptureSnapshot [lastAppliedIndex=279998, lastAppliedTerm=5, lastIndex=279999, lastTerm=5, installSnapshotInitiated=, replicatedToAllIndex=-1, replicatedToAllTerm=-1, unAppliedEntries size=319, mandatoryTrim=false]
2022-04-04T15:18:50,259 | INFO | opendaylight-cluster-data-shard-dispatcher-33 | Shard | 211 - org.opendaylight.controller.sal-clustering-commons - 5.0.1 | member-2-shard-inventory-config: Persisting of snapshot done: Snapshot [lastIndex=279999, lastTerm=5, lastAppliedIndex=279998, lastAppliedTerm=5, unAppliedEntries size=319, state=org.opendaylight.controller.cluster.datastore.persisted.ShardSnapshotState@9b0c2c5, electionTerm=5, electionVotedFor=member-2-shard-inventory-config, ServerConfigPayload=null]
2022-04-04T15:18:50,259 | INFO | opendaylight-cluster-data-shard-dispatcher-33 | Shard | 211 - org.opendaylight.controller.sal-clustering-commons - 5.0.1 | member-2-shard-inventory-config: Removed in-memory snapshotted entries, adjusted snaphsotIndex: 279998 and term: 5
2022-04-04T15:18:50,311 | INFO | opendaylight-cluster-data-shard-dispatcher-27 | Shard | 211 - org.opendaylight.controller.sal-clustering-commons - 5.0.1 | member-2-shard-inventory-config: SaveSnapshotSuccess received for snapshot, sequenceNr: 389120
2022-04-04T15:18:50,607 | INFO | opendaylight-cluster-data-shard-dispatcher-35 | Shard | 211 - org.opendaylight.controller.sal-clustering-commons - 5.0.1 | member-2-shard-inventory-config (Leader): Initiating install snapshot to follower member-1-shard-inventory-config: follower nextIndex: 240676, leader snapshotIndex: 279998, leader lastIndex: 281505, leader log size: 1507
2022-04-04T15:18:50,607 | INFO | opendaylight-cluster-data-shard-dispatcher-35 | Shard | 211 - org.opendaylight.controller.sal-clustering-commons - 5.0.1 | member-2-shard-inventory-config: Initiating snapshot capture CaptureSnapshot [lastAppliedIndex=281504, lastAppliedTerm=5, lastIndex=281505, lastTerm=5, installSnapshotInitiated=, replicatedToAllIndex=-1, replicatedToAllTerm=-1, unAppliedEntries size=1, mandatoryTrim=false] to install on member-1-shard-inventory-config
2022-04-04T15:18:50,643 | INFO | opendaylight-cluster-data-shard-dispatcher-35 | Shard | 211 - org.opendaylight.controller.sal-clustering-commons - 5.0.1 | member-2-shard-inventory-config: Persisting of snapshot done: Snapshot [lastIndex=281505, lastTerm=5, lastAppliedIndex=281504, lastAppliedTerm=5, unAppliedEntries size=1, state=org.opendaylight.controller.cluster.datastore.persisted.ShardSnapshotState@4add6906, electionTerm=5, electionVotedFor=member-2-shard-inventory-config, ServerConfigPayload=null]
2022-04-04T15:18:50,644 | INFO | opendaylight-cluster-data-shard-dispatcher-35 | Shard | 211 - org.opendaylight.controller.sal-clustering-commons - 5.0.1 | member-2-shard-inventory-config: Removed in-memory snapshotted entries, adjusted snaphsotIndex: 279998 and term: 5
2022-04-04T15:18:50,680 | INFO | opendaylight-cluster-data-shard-dispatcher-35 | Shard | 211 - org.opendaylight.controller.sal-clustering-commons - 5.0.1 | member-2-shard-inventory-config: SaveSnapshotSuccess received for snapshot, sequenceNr: 392649
2022-04-04T15:18:50,824 | INFO | opendaylight-cluster-data-shard-dispatcher-29 | Shard | 211 - org.opendaylight.controller.sal-clustering-commons - 5.0.1 | member-2-shard-inventory-config (Leader): Snapshot successfully installed on follower member-1-shard-inventory-config (last chunk 2) - matchIndex set to 281504, nextIndex set to 281505
2022-04-04T15:19:00,327 | WARN | opendaylight-cluster-data-akka.actor.default-dispatcher-40 | Shard | 206 - org.opendaylight.controller.repackaged-akka - 5.0.1 | Failed to deleteMessages toSequenceNr [389120] for persistenceId [member-2-shard-inventory-config] due to [null: Circuit Breaker Timed out.].
2022-04-04T15:19:00,329 | ERROR | opendaylight-cluster-data-akka.actor.default-dispatcher-28 | Shard | 206 - org.opendaylight.controller.repackaged-akka - 5.0.1 | Failed to persist event type [org.opendaylight.controller.cluster.raft.persisted.ApplyJournalEntries] with sequence number [392650] for persistenceId [member-2-shard-inventory-config].
akka.pattern.CircuitBreaker$$anon$13: Circuit Breaker Timed out.
2022-04-04T15:19:00,335 | INFO | opendaylight-cluster-data-shard-dispatcher-36 | Shard | 211 - org.opendaylight.controller.sal-clustering-commons - 5.0.1 | Stopping Shard member-2-shard-inventory-config
2022-04-04T15:19:33,600 | INFO | opendaylight-cluster-data-akka.actor.default-dispatcher-39 | ClientActorBehavior | 199 - org.opendaylight.controller.cds-access-client - 5.0.1 | member-2-frontend-datastore-config: connection ConnectedClientConnection{client=ClientIdentifier{frontend=member-2-frontend-datastore-config, generation=1}, cookie=1, backend=ShardBackendInfo{actor=Actor[akka://opendaylight-cluster-data/user/shardmanager-config/member-2-shard-inventory-config#190559700], sessionId=2, version=SODIUM_SR1, maxMessages=1000, cookie=1, shard=inventory, dataTree=present}} reconnecting as ReconnectingClientConnection{client=ClientIdentifier{frontend=member-2-frontend-datastore-config, generation=1}, cookie=1, backend=ShardBackendInfo{actor=Actor[akka://opendaylight-cluster-data/user/shardmanager-config/member-2-shard-inventory-config#190559700], sessionId=2, version=SODIUM_SR1, maxMessages=1000, cookie=1, shard=inventory, dataTree=present}}
2022-04-04T15:19:33,600 | INFO | opendaylight-cluster-data-akka.actor.default-dispatcher-39 | ClientActorBehavior | 199 - org.opendaylight.controller.cds-access-client - 5.0.1 | member-2-frontend-datastore-config: refreshing backend for shard 1
2022-04-04T15:19:33,854 | INFO | ForkJoinPool-9-worker-3 | AbstractClientConnection | 199 - org.opendaylight.controller.cds-access-client - 5.0.1 | Capping ReconnectingClientConnection{client=ClientIdentifier{frontend=member-2-frontend-datastore-config, generation=1}, cookie=1, backend=ShardBackendInfo{actor=Actor[akka://opendaylight-cluster-data/user/shardmanager-config/member-2-shard-inventory-config#190559700], sessionId=2, version=SODIUM_SR1, maxMessages=1000, cookie=1, shard=inventory, dataTree=present}} throttle delay from 47 to 5 seconds
java.lang.Throwable: null
    at org.opendaylight.controller.cluster.access.client.AbstractClientConnection.sendEntry(AbstractClientConnection.java:262) ~[?:?]
    at org.opendaylight.controller.cluster.access.client.SimpleReconnectForwarder.forwardEntry(SimpleReconnectForwarder.java:18) ~[?:?]
    at org.opendaylight.controller.cluster.access.client.TransmitQueue.enqueueOrForward(TransmitQueue.java:266) ~[?:?]
    at org.opendaylight.controller.cluster.access.client.AbstractClientConnection.enqueueOrForward(AbstractClientConnection.java:183) ~[?:?]
    at org.opendaylight.controller.cluster.access.client.AbstractClientConnection.sendEntry(AbstractClientConnection.java:257) ~[?:?]
    at org.opendaylight.controller.cluster.access.client.AbstractClientConnection.sendRequest(AbstractClientConnection.java:159) ~[?:?]
    at org.opendaylight.controller.cluster.databroker.actors.dds.ProxyHistory.sendRequest(ProxyHistory.java:570) ~[?:?]
    at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractProxyTransaction.sendRequest(AbstractProxyTransaction.java:328) ~[?:?]
    at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractProxyTransaction.abort(AbstractProxyTransaction.java:446) ~[?:?]
    at java.util.concurrent.ConcurrentHashMap$ValuesView.forEach(ConcurrentHashMap.java:4772) [?:?]
    at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractClientHandle.commonAbort(AbstractClientHandle.java:84) [bundleFile:?]
    at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractClientHandle.abort(AbstractClientHandle.java:70) [bundleFile:?]
    at org.opendaylight.controller.cluster.databroker.ClientBackedTransaction.close(ClientBackedTransaction.java:62) [bundleFile:?]
    at org.opendaylight.controller.cluster.databroker.ClientBackedReadTransaction.close(ClientBackedReadTransaction.java:52) [bundleFile:?]
    at org.opendaylight.controller.cluster.databroker.AbstractDOMBrokerTransaction.closeSubtransactions(AbstractDOMBrokerTransaction.java:87) [bundleFile:?]
    at org.opendaylight.controller.cluster.databroker.DOMBrokerReadOnlyTransaction.close(DOMBrokerReadOnlyTransaction.java:47) [bundleFile:?]
    at org.opendaylight.mdsal.binding.dom.adapter.BindingDOMReadTransactionAdapter.close(BindingDOMReadTransactionAdapter.java:46) [bundleFile:?]
    at org.opendaylight.openflowplugin.applications.bulk.o.matic.FlowReader.readFlowsX(FlowReader.java:101) [bundleFile:?]
    at org.opendaylight.openflowplugin.applications.bulk.o.matic.FlowReader.run(FlowReader.java:67) [bundleFile:?]
    at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1426) [?:?]
    at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290) [?:?]
    at java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020) [?:?]
    at java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656) [?:?]
    at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594) [?:?]
    at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183) [?:?]
2022-04-04T15:19:38,617 | WARN | ForkJoinPool.commonPool-worker-3 | AbstractShardBackendResolver | 213 - org.opendaylight.controller.sal-distributed-datastore - 5.0.1 | Failed to resolve shard
java.util.concurrent.TimeoutException: Connection attempt failed
    at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractShardBackendResolver.wrap(AbstractShardBackendResolver.java:151) ~[?:?]
    at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractShardBackendResolver.onConnectResponse(AbstractShardBackendResolver.java:168) ~[?:?]
    at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractShardBackendResolver.lambda$connectShard$4(AbstractShardBackendResolver.java:161) ~[?:?]
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859) [?:?]
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837) [?:?]
    at java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:479) [?:?]
    at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290) [?:?]
    at java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020) [?:?]
    at java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656) [?:?]
    at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594) [?:?]
    at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183) [?:?]
Caused by: akka.pattern.AskTimeoutException: Ask timed out on [ActorSelection[Anchor(akka://opendaylight-cluster-data/), Path(/user/shardmanager-config/member-2-shard-inventory-config#190559700)]] after [5000 ms]. Message of type [org.opendaylight.controller.cluster.access.commands.ConnectClientRequest]. A typical reason for `AskTimeoutException` is that the recipient actor didn't send a reply.
2022-04-04T15:19:38,857 | INFO | ForkJoinPool-9-worker-3 | AbstractClientConnection | 199 - org.opendaylight.controller.cds-access-client - 5.0.1 | Capping ReconnectingClientConnection{client=ClientIdentifier{frontend=member-2-frontend-datastore-config, generation=1}, cookie=1, backend=ShardBackendInfo{actor=Actor[akka://opendaylight-cluster-data/user/shardmanager-config/member-2-shard-inventory-config#190559700], sessionId=2, version=SODIUM_SR1, maxMessages=1000, cookie=1, shard=inventory, dataTree=present}} throttle delay from 43 to 5 seconds
java.lang.Throwable: null
    at org.opendaylight.controller.cluster.access.client.AbstractClientConnection.sendEntry(AbstractClientConnection.java:262) ~[?:?]
    at org.opendaylight.controller.cluster.access.client.SimpleReconnectForwarder.forwardEntry(SimpleReconnectForwarder.java:18) ~[?:?]
    at org.opendaylight.controller.cluster.access.client.TransmitQueue.enqueueOrForward(TransmitQueue.java:266) ~[?:?]
    at org.opendaylight.controller.cluster.access.client.AbstractClientConnection.enqueueOrForward(AbstractClientConnection.java:183) ~[?:?]
    at org.opendaylight.controller.cluster.access.client.AbstractClientConnection.sendEntry(AbstractClientConnection.java:257) ~[?:?]
    at org.opendaylight.controller.cluster.access.client.AbstractClientConnection.sendRequest(AbstractClientConnection.java:159) ~[?:?]
    at org.opendaylight.controller.cluster.databroker.actors.dds.ProxyHistory.sendRequest(ProxyHistory.java:570) ~[?:?]
    at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractProxyTransaction.sendRequest(AbstractProxyTransaction.java:328) ~[?:?]
    at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractProxyTransaction.abort(AbstractProxyTransaction.java:446) ~[?:?]
    at java.util.concurrent.ConcurrentHashMap$ValuesView.forEach(ConcurrentHashMap.java:4772) [?:?]
    at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractClientHandle.commonAbort(AbstractClientHandle.java:84) [bundleFile:?]
    at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractClientHandle.abort(AbstractClientHandle.java:70) [bundleFile:?]
    at org.opendaylight.controller.cluster.databroker.ClientBackedTransaction.close(ClientBackedTransaction.java:62) [bundleFile:?]
    at org.opendaylight.controller.cluster.databroker.ClientBackedReadTransaction.close(ClientBackedReadTransaction.java:52) [bundleFile:?]
    at org.opendaylight.controller.cluster.databroker.AbstractDOMBrokerTransaction.closeSubtransactions(AbstractDOMBrokerTransaction.java:87) [bundleFile:?]
    at org.opendaylight.controller.cluster.databroker.DOMBrokerReadOnlyTransaction.close(DOMBrokerReadOnlyTransaction.java:47) [bundleFile:?]
    at org.opendaylight.mdsal.binding.dom.adapter.BindingDOMReadTransactionAdapter.close(BindingDOMReadTransactionAdapter.java:46) [bundleFile:?]
    at org.opendaylight.openflowplugin.applications.bulk.o.matic.FlowReader.readFlowsX(FlowReader.java:101) [bundleFile:?]
    at org.opendaylight.openflowplugin.applications.bulk.o.matic.FlowReader.run(FlowReader.java:67) [bundleFile:?]
    at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1426) [?:?]
    at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290) [?:?]
    at java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020) [?:?]
    at java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656) [?:?]
    at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594) [?:?]
    at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183) [?:?]

Environment

None

Activity


Robert Varga May 13, 2022 at 12:24 PM

So investigating this a bit more, the size() here is meant to be an estimate of heap residency, not the persistent size – which does not have any expression at all.
I think this needs a thorough refactor to rein in the interfaces here, so that the two concerns can be readily distinguished.
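A minimal sketch of such a split, using hypothetical names (PayloadSizing and ByteArrayPayload are not the real ODL types, and the overhead constants are illustrative assumptions): one method answers the memory-trimming question, the other the network-batching question.

```java
// Hypothetical sketch, not the actual ODL API: separate the two size concerns
// that are currently conflated in Payload.size().
interface PayloadSizing {
    // Estimate of heap residency, used for in-memory log trimming decisions.
    int estimatedHeapSize();

    // Conservative size of the serialized form, used when batching entries
    // into a bounded network buffer.
    int serializedSize();
}

// Example implementation for a payload backed by a byte[].
class ByteArrayPayload implements PayloadSizing {
    private static final int PROXY_OVERHEAD = 64; // assumed per-object serialization framing
    private final byte[] data;

    ByteArrayPayload(byte[] data) {
        this.data = data;
    }

    @Override
    public int estimatedHeapSize() {
        return 16 + data.length; // rough estimate: object header plus array contents
    }

    @Override
    public int serializedSize() {
        return PROXY_OVERHEAD + 4 + data.length; // proxy framing + length field + bytes
    }
}
```

With this split, the batching path can budget against serializedSize() while trimming continues to use estimatedHeapSize(), and the two numbers are free to diverge.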

Robert Varga May 12, 2022 at 11:27 AM
Edited

This seems to be a problem with accounting for the serialized size. The current tuning is that we have 512KiB for buffers and select up to 480KiB in payloads – i.e. we leave 32KiB for wrapping and other serialization overheads. That might not be enough, as the format is generally this:

out.writeShort(appendEntries.leaderRaftVersion);
out.writeLong(appendEntries.getTerm());
out.writeObject(appendEntries.leaderId);
out.writeLong(appendEntries.prevLogTerm);
out.writeLong(appendEntries.prevLogIndex);
out.writeLong(appendEntries.leaderCommit);
out.writeLong(appendEntries.replicatedToAllIndex);
out.writeShort(appendEntries.payloadVersion);
out.writeInt(appendEntries.entries.size());
for (ReplicatedLogEntry e : appendEntries.entries) {
    out.writeLong(e.getIndex());
    out.writeLong(e.getTerm());
    out.writeObject(e.getData());
}
out.writeObject(appendEntries.leaderAddress);

Now the number of entries we select is dominated by

final int maxDataSize = context.getConfigParams().getSnapshotChunkSize();

and AbstractReplicatedLogImpl uses ReplicatedLogEntry.size() to count against that limit. That does not include the 16 bytes spent on each entry for term/index, nor does it include the encapsulation overhead of writeObject() – hence we can bite off a larger number of entries than we can actually serialize into the 512KiB buffer.
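The shape of an accounting that avoids this can be sketched as follows. This is not the actual AbstractReplicatedLogImpl code, and WRITE_OBJECT_OVERHEAD is a made-up illustrative constant; the point is that the 16 bytes of term/index and the writeObject() encapsulation are charged against the same budget as the payload itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch of batch selection that also charges per-entry framing.
// TERM_INDEX_BYTES matches the two writeLong() calls in the format above;
// WRITE_OBJECT_OVERHEAD is an assumed, illustrative constant.
class EntrySelector {
    static final int TERM_INDEX_BYTES = 16;
    static final int WRITE_OBJECT_OVERHEAD = 48;

    // Select a consecutive prefix of entries whose serialized footprint
    // (payload plus framing) stays within maxDataSize.
    static List<Integer> select(int[] payloadSizes, int maxDataSize) {
        List<Integer> selected = new ArrayList<>();
        int total = 0;
        for (int size : payloadSizes) {
            int cost = size + TERM_INDEX_BYTES + WRITE_OBJECT_OVERHEAD;
            if (total + cost > maxDataSize) {
                break; // log entries are consecutive, so stop at the first misfit
            }
            total += cost;
            selected.add(size);
        }
        return selected;
    }

    public static void main(String[] args) {
        // With a 500-byte budget only one 200-byte payload fits once framing is charged.
        System.out.println(select(new int[] {200, 200}, 500));
    }
}
```

Counting only the raw payload sizes, both 200-byte entries would appear to fit in 500 bytes; once framing is charged, only one does – which is exactly the overshoot described above.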

The second part is that Payload.size() does not say anything about its semantics. Looking at implementations, there is actually a major disconnect between them:

  • ServerConfigurationPayload performs mock serialization and reports accurate serialized size

  • NoopPayload says '0' and uses a singleton Proxy, so the size does not reflect actual serialized size

  • AbstractIdentifiablePayload reports the size of serialized byte[], without related proxy overhead (just as NoopPayload)

  • CommitTransactionPayload does the same thing – ignoring size of the proxy overhead and also discounting the 4 bytes used to hold the length
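The undercount in the last two bullets can be measured directly: plain Java serialization adds a stream header and an array type descriptor on top of the raw byte[] length. A small measurement sketch using generic JDK serialization (not ODL's actual Externalizable proxies, which add their own framing on top):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

// Measures the writeObject() footprint of a byte[] versus its raw length --
// the framing that size() implementations based on the raw array ignore.
class SerializedOverhead {
    static int serializedLength(byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.size();
    }

    public static void main(String[] args) {
        int raw = 480 * 1024;
        int total = serializedLength(new byte[raw]);
        System.out.println("raw=" + raw + " serialized=" + total
            + " overhead=" + (total - raw));
    }
}
```

The per-payload overhead here is small, but it is paid per entry, so a batch sized to exactly 480KiB of raw payloads can still spill past the buffer once every entry's framing is added.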

Robert Varga May 11, 2022 at 6:22 PM

The circuit breaker seems to be an environment problem, as tracked in https://lf-opendaylight.atlassian.net/browse/CONTROLLER-2043#icft=CONTROLLER-2043

Robert Varga May 11, 2022 at 6:19 PM

So this is just AppendEntries, other messages seem to be okay.

Sangwook Ha April 27, 2022 at 7:26 AM

This is still found with v5.0.3:

https://s3-logs.opendaylight.org/logs/releng/vex-yul-odl-jenkins-1/openflowplugin-csit-3node-clustering-bulkomatic-only-sulfur/222/odl_1/odl1_karaf.log.gz

2022-04-27T05:54:07,395 | ERROR | opendaylight-cluster-data-akka.actor.default-dispatcher-19 | Encoder | 206 - org.opendaylight.controller.repackaged-akka - 5.0.2 | Failed to serialize oversized message [ActorSelectionMessage(org.opendaylight.controller.cluster.raft.messages.AppendEntries)].
akka.remote.OversizedPayloadException: Discarding oversized payload sent to Some(Actor[akka://opendaylight-cluster-data@10.30.170.139:2550/]): max allowed size 524288 bytes. Message type [ActorSelectionMessage(org.opendaylight.controller.cluster.raft.messages.AppendEntries)].
2022-04-27T05:54:07,411 | INFO | opendaylight-cluster-data-akka.actor.default-dispatcher-19 | RemoteActorRef | 206 - org.opendaylight.controller.repackaged-akka - 5.0.2 | Message [org.opendaylight.controller.cluster.raft.messages.AppendEntries] wrapped in [akka.actor.ActorSelectionMessage] from Actor[akka://opendaylight-cluster-data/user/shardmanager-config/member-1-shard-inventory-config#878229052] to Actor[akka://opendaylight-cluster-data@10.30.170.139:2550/] was dropped. Discarding oversized payload sent to Some(Actor[akka://opendaylight-cluster-data@10.30.170.139:2550/]): max allowed size 524288 bytes. Message type [ActorSelectionMessage(org.opendaylight.controller.cluster.raft.messages.AppendEntries)].. [9] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
Done

Details

Created April 5, 2022 at 10:13 PM
Updated May 26, 2022 at 10:57 PM
Resolved May 26, 2022 at 10:57 PM