Refactor JournalSegmentDescriptor

Description

The layout of JournalSegmentDescriptor and its builder is weird.

The builder does not follow the Builder pattern at all assumes its methods will be called in the order in which the descriptor is to be layed out in the file – forcing users to specify withMaxEntries() even when that is not actually used.

Also UPDATED_LENGTH is not used, and SegmentDescriptor.update() checks 'locked' and is a no-op if locked is false.

All of this smells of rather huge trouble and lifecycle badness.

Analyze the ways the descriptor is actually used and bifurcate the mutable and non-mutable use cases.

The builder should cache set values, use some sane defaults or report ISE when something is not set and construct the ByteBuffer in proper order.

While we are at it, withMaxEntries() should be removed and a we should just write a (reasonably chosen) constant value.

Also, JournalSegmentDescritor.updated() seems to only reflect the last time the segment was open, running contrary to the JavaDoc of JournalSegmentDescriptor.updated(). At the end of the day, this information is largely unused. From semantics point of view, we certainly do not want to change this value every time we issue a write nor when we sync. This is largely a debug facility AFAICT, so perhaps it should be redefined as 'last time we cleanly closed this segment with modifications'? Needs further thought.

Activity

Show:

Ruslan Kashapov March 15, 2024 at 2:43 PM
Edited

Analysis summary

JournalSegmentDescriptor represents a descriptor of a journal segment, serialized in a fixed length 64 byte array
allocated in a beginning of a data file. Contains following fields (ordered by allocation)

  • version (int) - expected to represent segment version but never used, value is always 1

  • id (long) - unique segment identifier, also part of filename

  • index (long) - represents the first entry index within current segment

  • maxSegmentSize (int) - inherited property of SegmentedJournal, only used to deliver property value to *SegmentWriter constructor

  • maxEntries (int) - inherited property of SegmentedJournal, never used

  • updated (long) - expected to represent the segment last update timestamp, in fact being artificialy set to an instance after being read from file, never used, never persisted

  • locked (boolean) - expected to represent that all entries in a segment are comitted, never used, value is always false

JournalSegmentDescriptor is managed by SegmentedJournal. New instance is created on new Segment creation, then the data is serialized into beginning of the file, then never modified.

Resuming: Only immutable id and index values are actually used, maxSegmentSize can be passed from SegmentedJournal to segment writer by passing property directly, no reason to persist it with segment data. Other properties are never used, so these are safe to remove

Suggested on JournalSegmentDescriptor refactoring:

  • convert to record

  • remove all properties except id and index

  • remove builder

  • convert constructor using ByteBuffer to static method fromBuffer(ByteBuffer) to static method reading id and index from known positions

  • convert copyTo(ByteBuffer) method to ByteBuffer asBuffer() method with buffer allocation within (simplifies caller logic)

  • maxSegmentSize to be processed directly as mentioned above

Also it makes sense to optimize SegmentedJournal.loadSegments() ->loadSegment(id) causing each file being opened twice to read same descriptor. 

Done

Details

Assignee

Reporter

Labels

Components

Fix versions

Priority

Created March 12, 2024 at 10:12 PM
Updated May 5, 2024 at 10:12 PM
Resolved May 5, 2024 at 10:12 PM