We have accumulated quite a few deviations from the base RAFT protocol and have quite a bit of additional semantics built on top of the messages.
As an example, see InstallSnapshot: it carries the number of chunks and the current chunk, but it does not enforce any relationship between the two. ServerConfig may or may not be included, and the same goes for hashCode.
We really should have InstallSnapshot as an abstract base class (to tie it to RAFT) and have 4 separate subclasses to indicate:
a single-message install: does not have chunks, implies done=true, has hashCode, has optional ServerConfig
multi-message install start: indicates the number of chunks, implies done=false, has optional ServerConfig
multi-message middle: has a chunk number, implies done=false
multi-message end: has a chunk number == number of chunks, implies done=true
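The four variants above could be sketched roughly as follows. This is only an illustration of the proposed shape, not the existing message classes: the outer class SnapshotMessagesSketch, all field and accessor names, and the use of Optional<String> as a stand-in for ServerConfig are assumptions made for the example.

```java
import java.util.Optional;

/** Hypothetical sketch of the proposed hierarchy; every name here is illustrative. */
final class SnapshotMessagesSketch {
    private SnapshotMessagesSketch() {
    }

    /** Common base tying all variants to the RAFT InstallSnapshot message. */
    abstract static class InstallSnapshot {
        private final byte[] data;

        InstallSnapshot(byte[] data) {
            this.data = data.clone();
        }

        byte[] data() {
            return data.clone();
        }

        /** Whether this message completes the snapshot transfer. */
        abstract boolean isDone();
    }

    /** Whole snapshot in one message: no chunk fields, always done. */
    static final class SingleInstall extends InstallSnapshot {
        final int snapshotHash;
        final Optional<String> serverConfig; // stand-in for the real ServerConfig

        SingleInstall(byte[] data, int snapshotHash, Optional<String> serverConfig) {
            super(data);
            this.snapshotHash = snapshotHash;
            this.serverConfig = serverConfig;
        }

        @Override
        boolean isDone() {
            return true;
        }
    }

    /** First message of a multi-message install: announces the chunk count. */
    static final class MultiInstallStart extends InstallSnapshot {
        final int totalChunks;
        final Optional<String> serverConfig; // stand-in for the real ServerConfig

        MultiInstallStart(byte[] data, int totalChunks, Optional<String> serverConfig) {
            super(data);
            if (totalChunks < 2) {
                throw new IllegalArgumentException("multi-message install needs at least 2 chunks");
            }
            this.totalChunks = totalChunks;
            this.serverConfig = serverConfig;
        }

        @Override
        boolean isDone() {
            return false;
        }
    }

    /** Intermediate chunk: carries only its chunk number. */
    static final class MultiInstallMiddle extends InstallSnapshot {
        final int chunkNumber;

        MultiInstallMiddle(byte[] data, int chunkNumber) {
            super(data);
            this.chunkNumber = chunkNumber;
        }

        @Override
        boolean isDone() {
            return false;
        }
    }

    /** Last chunk: its chunk number equals the announced number of chunks. */
    static final class MultiInstallEnd extends InstallSnapshot {
        final int chunkNumber;

        MultiInstallEnd(byte[] data, int chunkNumber) {
            super(data);
            this.chunkNumber = chunkNumber;
        }

        @Override
        boolean isDone() {
            return true;
        }
    }
}
```

With this shape the done flag is implied by the message type rather than stored, so inconsistent combinations (for example a single-message install carrying chunk counters) cannot be constructed.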
As noted in , we probably need a ‘have data, applying’ follower state indication, which would be a modification of InstallSnapshotReply.
This is only a rough outline, though: InstallSnapshot also deviates from the specification in that it implies we know the number of chunks, i.e. that the snapshot has already been sliced up. The spec carries a byte offset instead (with 0 indicating the first message). Chunk numbering was presumably chosen to reduce message size, but that argument is moot: WritableObjects.writeLong() can give us a long offset plus 4 bits of storage, so we can carry an explicit offset and a done flag in 2-9 bytes, compared to the fixed 8 bytes we use now.
Those 4 bits could also be used to indicate the presence of a server config in the first message. We should also make ServerConfig a WritableObject (or hand-serialize it) for good measure, which improves efficiency further.
This task needs further analysis, essentially starting from a blank slate, to design messages which cover exactly what we need.
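To make the size argument concrete, here is a minimal, self-contained sketch of an offset-plus-flags encoding. It does not call WritableObjects and is not its implementation. It assumes a layout of one header byte (flags in the high 4 bits, value length in the low 4 bits) followed by 1 to 8 value bytes, which matches the 2-9 byte range quoted above. The class name, flag constants and helper methods are all hypothetical.

```java
/** Hypothetical sketch: byte offset plus 4 flag bits, encoded in 2 to 9 bytes. */
final class OffsetHeaderSketch {
    // Flag assignments are assumptions made for this example.
    static final int FLAG_DONE = 0x10;
    static final int FLAG_HAS_SERVER_CONFIG = 0x20;

    private OffsetHeaderSketch() {
    }

    /** Number of bytes needed to hold the value, between 1 and 8. */
    static int valueBytes(long value) {
        int bytes = 1;
        while (bytes < 8 && (value >>> (8 * bytes)) != 0) {
            bytes++;
        }
        return bytes;
    }

    /** Encodes a header byte (flags | length) followed by the big-endian value bytes. */
    static byte[] encode(long offset, int flags) {
        if ((flags & ~0xF0) != 0) {
            throw new IllegalArgumentException("flags must fit in the high 4 bits: " + flags);
        }
        int bytes = valueBytes(offset);
        byte[] result = new byte[1 + bytes];
        result[0] = (byte) (flags | bytes);
        for (int i = 0; i < bytes; i++) {
            // Most significant of the retained bytes goes first.
            result[1 + i] = (byte) (offset >>> (8 * (bytes - 1 - i)));
        }
        return result;
    }

    /** Reads the offset back from an encoded header. */
    static long decodeOffset(byte[] encoded) {
        int bytes = encoded[0] & 0x0F;
        long value = 0;
        for (int i = 1; i <= bytes; i++) {
            value = (value << 8) | (encoded[i] & 0xFF);
        }
        return value;
    }

    /** Reads the 4 flag bits back from an encoded header. */
    static int decodeFlags(byte[] encoded) {
        return encoded[0] & 0xF0;
    }
}
```

Under these assumptions a first message at offset 0 costs 2 bytes, and only offsets that need all 8 value bytes reach the 9-byte maximum.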