Refactor JournalSegmentDescriptor
Description
Activity

Ruslan Kashapov March 15, 2024 at 2:43 PMEdited
Analysis summary
JournalSegmentDescriptor represents a descriptor of a journal segment, serialized in a fixed length 64 byte array
allocated in a beginning of a data file. Contains following fields (ordered by allocation)
version (int) - expected to represent segment version but never used, value is always 1
id (long) - unique segment identifier, also part of filename
index (long) - represents the first entry index within current segment
maxSegmentSize (int) - inherited property of SegmentedJournal, only used to deliver property value to *SegmentWriter constructor
maxEntries (int) - inherited property of SegmentedJournal, never used
updated (long) - expected to represent the segment last update timestamp, in fact being artificialy set to an instance after being read from file, never used, never persisted
locked (boolean) - expected to represent that all entries in a segment are comitted, never used, value is always false
JournalSegmentDescriptor is managed by SegmentedJournal. New instance is created on new Segment creation, then the data is serialized into beginning of the file, then never modified.
Resuming: Only immutable id and index values are actually used, maxSegmentSize can be passed from SegmentedJournal to segment writer by passing property directly, no reason to persist it with segment data. Other properties are never used, so these are safe to remove
Suggested on JournalSegmentDescriptor refactoring:
convert to record
remove all properties except id and index
remove builder
convert constructor using ByteBuffer to static method fromBuffer(ByteBuffer) to static method reading id and index from known positions
convert copyTo(ByteBuffer) method to ByteBuffer asBuffer() method with buffer allocation within (simplifies caller logic)
maxSegmentSize to be processed directly as mentioned above
Also it makes sense to optimize SegmentedJournal.loadSegments() ->loadSegment(id) causing each file being opened twice to read same descriptor.
Details
Details
Assignee

Reporter

The layout of JournalSegmentDescriptor and its builder is weird.
The builder does not follow the Builder pattern at all assumes its methods will be called in the order in which the descriptor is to be layed out in the file – forcing users to specify withMaxEntries() even when that is not actually used.
Also UPDATED_LENGTH is not used, and SegmentDescriptor.update() checks 'locked' and is a no-op if locked is false.
All of this smells of rather huge trouble and lifecycle badness.
Analyze the ways the descriptor is actually used and bifurcate the mutable and non-mutable use cases.
The builder should cache set values, use some sane defaults or report ISE when something is not set and construct the ByteBuffer in proper order.
While we are at it, withMaxEntries() should be removed and a we should just write a (reasonably chosen) constant value.
Also, JournalSegmentDescritor.updated() seems to only reflect the last time the segment was open, running contrary to the JavaDoc of JournalSegmentDescriptor.updated(). At the end of the day, this information is largely unused. From semantics point of view, we certainly do not want to change this value every time we issue a write nor when we sync. This is largely a debug facility AFAICT, so perhaps it should be redefined as 'last time we cleanly closed this segment with modifications'? Needs further thought.