Improve DataJournal.handleWriteMessages()

Description

The performance benchmark we have introduced shows that writes are dominated by JournalSegmentWriter.flush(), which forces entries to be durable in persistent storage.

In our current implementation this flush() call is invoked from two places:

  1. internally in atomix-storage, when a segment file becomes full

  2. in DataJournal.handleWriteMessages(), just before we return, but after the message has been acknowledged

The first case is inevitable, as we really want to be sure a particular segment is synced before we move on.

The second case is interesting, as sal-distributed-datastore uses [persistAsync()|https://doc.akka.io/api/akka/2.7/akka/persistence/PersistentActor.html#persistAsync[A](event:A)(handler:A=%3EUnit):Unit] and expects the write to be durable, i.e. to have been flush()ed by the time completion is signalled.
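For reference, a minimal sketch of a persistent actor relying on that contract (Akka classic persistence, Java API); this is illustrative only, not sal-distributed-datastore code:

{code:java}
import akka.persistence.AbstractPersistentActor;

// Illustrative only: the persistAsync() handler is expected to run only once the
// event has been made durable (i.e. flush()ed) by the journal plugin.
public class ExampleActor extends AbstractPersistentActor {
    @Override
    public String persistenceId() {
        return "example";
    }

    @Override
    public Receive createReceiveRecover() {
        return receiveBuilder().build();
    }

    @Override
    public Receive createReceive() {
        return receiveBuilder()
            .match(String.class, cmd -> persistAsync(cmd, event ->
                // If the journal acknowledges before flushing, this callback can run
                // while the event could still be lost on a crash.
                getSender().tell("persisted: " + event, getSelf())))
            .build();
    }
}
{code}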

In sal-akka-segmented-journal terms this means that DataJournalV0 currently does the sub-optimal/incorrect thing.
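A rough sketch of that write path follows; the identifiers are illustrative, not the exact DataJournalV0 source:

{code:java}
// Approximation of the current write path; identifiers are illustrative only.
private void handleWriteMessages(final WriteMessages message) {
    try {
        writer.append(message);
        // Completion is signalled here ...
        message.setSuccess();
    } catch (Exception e) {
        message.setFailure(e);
    }
    // ... but the data only becomes durable here.
    writer.flush();
}
{code}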

Note how we invoke message.setSuccess() before we issue writer.flush(). That does not match the expectation of the write being durable by the time completion is signalled.

I think we want to have another stage of processing here, which takes care of batching request synchronization – i.e. message.setSuccess() should really be called only after writer.flush().
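For contrast, the naive fix (again with illustrative identifiers) would simply flush before acknowledging, as sketched below; it restores the durability expectation, but forces a flush() for every WriteMessages, which is what the batching stage is meant to avoid:

{code:java}
// Correct ordering, but one flush() per WriteMessages: durability first, then ack.
private void handleWriteMessages(final WriteMessages message) {
    try {
        writer.append(message);
        writer.flush();
        // Only now is it safe to signal completion.
        message.setSuccess();
    } catch (Exception e) {
        message.setFailure(e);
    }
}
{code}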

We need an explicit unit test which, unlike PerformanceTest, uses asynchronous actor execution, where multiple write requests can be issued and processed in the background. I think that benchmark will show that, given async storage requests, we end up with a number of WriteMessages outstanding but a separate sync happening for every one of them – i.e. that even when the Shard submits a ton of async persistence requests which could be batched, we sync every such request separately.

If that really is the case, we want to have a separate 'flush thread', which has a reference to the writer and an incoming queue and essentially runs the following pseudo-loop:
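Something along the following lines, as a sketch under assumed names (the queue, its element type and the writer facade are illustrations, not existing code):

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch of the proposed flush thread: the actor appends entries and enqueues a
// completion callback here; a single flush() then covers however many writes have
// accumulated. All identifiers are assumptions for illustration purposes.
final class FlushThreadSketch {
    // Placeholder for whatever ends up wrapping JournalSegmentWriter.
    interface WriterFacade {
        void flush();
    }

    private final BlockingQueue<Runnable> completions = new LinkedBlockingQueue<>();
    private final WriterFacade writer;

    FlushThreadSketch(final WriterFacade writer) {
        this.writer = writer;
    }

    // Called from handleWriteMessages() after the entries have been appended.
    void enqueueCompletion(final Runnable completion) {
        completions.add(completion);
    }

    void flushLoop() throws InterruptedException {
        final List<Runnable> batch = new ArrayList<>();
        while (true) {
            // Block for the first outstanding write ...
            batch.add(completions.take());
            // ... then pick up everything queued while the previous flush was running.
            completions.drainTo(batch);

            // A single flush() makes the whole batch durable.
            writer.flush();

            // Only now signal completion (e.g. message.setSuccess()) for each batched write.
            batch.forEach(Runnable::run);
            batch.clear();
        }
    }
}
{code}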

The overall point being: if sal-distributed-datastore's Shard actor submits persistAsync() calls faster than writer.flush() can handle them one by one, we should end up with multiple persistAsync() calls being completed by a single writer.flush().

All of this is conjecture and needs digging into how Akka persistence batches AtomicWrites, if at all, and how the system behaves.

Activity

Robert Varga March 26, 2024 at 2:12 PM

So segmented journal has a new tuneable, max-unflushed-bytes, which specifies how many bytes of journal data may remain unflushed, i.e. by how much the flush can be delayed.
New deployments will default to 1MiB through factory-akka.conf; old deployments will continue to use immediate flushing.
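For illustration, the tuneable would surface in factory-akka.conf roughly as follows; the enclosing block shown here is an assumption, only the max-unflushed-bytes name and its 1MiB default come from the comment above:

{code}
# Hypothetical excerpt; the surrounding factory-akka.conf structure is assumed.
segmented-file {
  # Allow up to 1 MiB of appended journal data to remain unflushed before a
  # flush() is forced. Deployments without this setting keep immediate flushing.
  max-unflushed-bytes = 1MiB
}
{code}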

Robert Varga March 19, 2024 at 8:45 PM

Attached gerrit review has a prototype.

Done

Details

Created March 12, 2024 at 11:38 PM
Updated March 26, 2024 at 5:32 PM
Resolved March 26, 2024 at 5:32 PM