Follower can retain more than one copy of the same Snapshot in memory, leading to OOM
Activity

Robert Varga November 6, 2024 at 3:37 PM (edited)
We also have two bits in InstallSnapshot available, so … we kinda can piggy-back an alternative ‘still applying’ protocol on top of that.
I have just noticed the lifecycle on the follower could be improved quite a bit: rather than pulling the contents of the snapshot into memory before sending out ApplySnapshot to ourselves, we could just push the SnapshotTracker state, so that the data stays on disk and we pull it when we go for ApplyState… that way it does not end up lurking in memory. But then again, things could get skewed in a worse way… so that is something we will do separately in https://lf-opendaylight.atlassian.net/browse/CONTROLLER-2135.
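A minimal sketch of that idea, assuming a hypothetical toBackingFile() accessor on SnapshotTracker; the names here are illustrative, not the actual API:

    import java.io.IOException;
    import java.io.UncheckedIOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.function.Supplier;

    // Keep only a reference to the snapshot's backing file instead of its bytes;
    // toBackingFile() is a hypothetical accessor on SnapshotTracker.
    final Path snapshotFile = tracker.toBackingFile();

    // Defer the read until ApplyState actually needs the data, so the full
    // snapshot never lurks in the follower's memory while waiting to be applied.
    final Supplier<byte[]> lazyBytes = () -> {
        try {
            return Files.readAllBytes(snapshotFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    };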

Robert Varga November 4, 2024 at 10:32 AM
So the good news is that since https://lf-opendaylight.atlassian.net/browse/CONTROLLER-2058, we have 3 bits unused in the serialization format (raft.messages.IR), which are sent as 0 and ignored on receive – hence we can easily indicate ‘received, not processed’ in a backwards-compatible manner.
With the old serialization format this would require updating receivers: there is a single byte for the ‘success’ boolean, but anything non-zero is interpreted as ‘true’, which means old clients would mis-interpret the new flag.
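A minimal sketch of how those spare bits could carry the new indication in a backwards-compatible way; the bit assignments and names are assumptions, not the actual wire format:

    // Hypothetical flag layout for the reply's flags byte. Bit 0 is the existing
    // 'success' bit; bit 1 is a new 'received, still applying' indication.
    // Receivers that predate the flag send it as 0 and ignore it on receive.
    private static final byte SUCCESS    = 0x01;
    private static final byte INSTALLING = 0x02;

    static byte encodeFlags(final boolean success, final boolean installing) {
        byte flags = 0;
        if (success) {
            flags |= SUCCESS;
        }
        if (installing) {
            flags |= INSTALLING;
        }
        return flags;
    }

    static boolean isInstalling(final byte flags) {
        // Mask only the bit we understand; unknown bits stay ignored.
        return (flags & INSTALLING) != 0;
    }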

Robert Varga November 3, 2024 at 11:23 PM (edited)
So there are a couple of issues here.
1. We really should have an InstallSnapshotReply which indicates the chunk has been received but not installed yet, so that the leader does not re-transmit the last chunk but keeps soliciting installation. Not sure we can do that in InstallSnapshotReply without breaking compatibility.
2. Follower should also remember that it has received the last chunk and has released all the tracker state, but that it still needs to send a reply. Currently that state is in the system, but not reachable by the Follower class: it is the reply that is being sent after applySnapshotCallback completes.
So we need to do 2) first: create an explicit class implementing ApplySnapshot.Callback, holding the response and some metadata (leader/term/index). That should be stored in Follower as normal state and checked when we receive InstallSnapshot: if that ends up being the same install snapshot, we should just ignore it (or reply with the thing in 1)) and wait for the state to be applied. That way, without changing the protocol, Follower will stop doing the vicious loop; the leader will still be re-transmitting the last chunk, but with no ill effects on the follower.
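A minimal sketch of such a class, assuming ApplySnapshot.Callback exposes onSuccess()/onFailure() hooks; the class name, fields and matches() helper are illustrative, not the actual implementation:

    import akka.actor.ActorRef;

    // Illustrative only: retained by Follower while a snapshot is being applied,
    // so a re-sent InstallSnapshot for the same snapshot can be recognized.
    final class PendingInstallCallback implements ApplySnapshot.Callback {
        private final ActorRef leaderActor;
        private final long term;
        private final long lastIncludedIndex;
        private final InstallSnapshotReply reply;

        PendingInstallCallback(final ActorRef leaderActor, final long term,
                final long lastIncludedIndex, final InstallSnapshotReply reply) {
            this.leaderActor = leaderActor;
            this.term = term;
            this.lastIncludedIndex = lastIncludedIndex;
            this.reply = reply;
        }

        // Does an incoming InstallSnapshot refer to the snapshot being applied?
        boolean matches(final InstallSnapshot installSnapshot) {
            return term == installSnapshot.getTerm()
                && lastIncludedIndex == installSnapshot.getLastIncludedIndex();
        }

        @Override
        public void onSuccess() {
            // Snapshot applied and persisted: it is now safe to confirm the last chunk.
            leaderActor.tell(reply, ActorRef.noSender());
        }

        @Override
        public void onFailure() {
            // Left to the real implementation: e.g. let the leader time out and restart.
        }
    }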
Then we can look into how the Leader can know not to re-transmit the last chunk, i.e. 1).
Details
Assignee: Robert Varga
Reporter: Tibor Král
Priority: Highest
There is a bug in the current InstallSnapshot process caused by the delayed reply for the last chunk. The Follower sends the InstallSnapshotReply for the last chunk only after the Snapshot has been applied and persisted; the larger the snapshot, the longer that may take, and in the meantime the Leader can resend the entire Snapshot. Here is the flow leading to the issue:
1. Follower receives the last chunk of the Snapshot.
2. Follower wraps the Snapshot in ApplySnapshot and sends it to RaftActorSnapshotMessageSupport to be applied and persisted. Number of Snapshots in Follower's memory = 1.
3. Follower closes the SnapshotTracker.
4. During this time the Leader awaits the Reply for the last chunk. When the chunk-timeout runs out, the Leader re-sends the last chunk.
5. The Follower receives the last chunk again.
6. Follower creates a new SnapshotTracker (the previous one was closed).
7. Follower tries to add the last chunk to the SnapshotTracker, which ends in InvalidChunkException("expected chunk 1, adding chunk n") — see the sketch after this list.
8. Follower closes the SnapshotTracker and sends InstallSnapshotReply(chunkIndex = -1, success = false).
9. Leader receives InstallSnapshotReply(chunkIndex = -1) and resets its LeaderInstallSnapshotState.
10. Leader starts sending chunks of the same Snapshot again from chunk 1.
11. Follower starts collecting chunks.
12. Follower receives the last chunk of the Snapshot.
13. Follower wraps the Snapshot in ApplySnapshot and sends it to RaftActorSnapshotMessageSupport to be applied and persisted. Number of Snapshots in Follower's memory = 2.
14. If the previous Snapshot is still being applied/persisted, this ApplySnapshot message is simply dropped.
15. After the chunk-timeout the Leader sends the last chunk again and the whole cycle starts over.
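For illustration, a stripped-down sketch of why step 7 fails; the real SnapshotTracker tracks more state (e.g. chunk hashes), so this is a simplification:

    // Simplified: a freshly created tracker expects chunk 1, so the re-sent
    // last chunk n immediately fails the sequence check.
    final class SnapshotTracker {
        private int lastChunkIndex = 0;

        void addChunk(final int chunkIndex, final byte[] chunk) throws InvalidChunkException {
            if (chunkIndex != lastChunkIndex + 1) {
                throw new InvalidChunkException(
                    "expected chunk " + (lastChunkIndex + 1) + ", adding chunk " + chunkIndex);
            }
            lastChunkIndex = chunkIndex;
            // ... buffer the chunk data ...
        }
    }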
And here is the issue. If, for example:
- the GC takes longer to clear the large Snapshot object, or
- the ActorSystem gets overwhelmed and the ApplySnapshot message is stuck in RaftActorSnapshotMessageSupport's mailbox,
the Leader can start sending a third copy of the same Snapshot, and the Follower will attempt to collect it just as before.
A possible solution is to make the Follower reject the InstallSnapshot message from the Leader if a Snapshot is being applied at the same time. If the Follower just drops the InstallSnapshot message and does not reply, the Leader will simply re-send the last chunk after each chunk-timeout elapses. And when the Follower finally finishes the ApplySnapshot, it will send the InstallSnapshotReply for the last chunk and the Leader will close the LeaderInstallSnapshotState as if there was no extra delay.
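A minimal sketch of that check in the follower's InstallSnapshot handling; handleInstallSnapshot, pendingApply and matches() are hypothetical names, not the actual Follower code:

    import akka.actor.ActorRef;

    // Hypothetical guard at the top of the follower's InstallSnapshot handling;
    // pendingApply holds the callback retained while a snapshot is being applied.
    private void handleInstallSnapshot(final ActorRef sender, final InstallSnapshot installSnapshot) {
        if (pendingApply != null && pendingApply.matches(installSnapshot)) {
            // A copy of this snapshot is already being applied/persisted. Do not
            // reply: the leader keeps re-sending the last chunk on chunk-timeout,
            // and we send InstallSnapshotReply once the apply completes, letting
            // the leader close its LeaderInstallSnapshotState normally.
            return;
        }
        // ... existing chunk-tracking logic ...
    }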