Circuit breaker timeout with BGP and tell-based protocol
Description
Activity
Robert Varga March 11, 2024 at 4:54 PM
For controller-6.0.x and controller-5.0.x, the workaround is to set "memory-mapped" to false.
Robert Varga March 10, 2024 at 7:31 PM
For comparison, the benchmark run with , and with this issue resolved, shows the following:
With DISK, the worst 99.9th percentile is 262 ms and the worst single write is 353 ms. While the absolute numbers are worse, standard deviation is improved across the board; the preparation times suggest jitter in the environment.
With MAPPED, there are no more major outliers. The worst 99.9th percentile is 92.78 ms and the worst single write is 162.5 ms. Standard deviation is in the ballpark of the mean in all cases.
Robert Varga March 10, 2024 at 7:11 PM (edited)
The benchmark run establishes the following baseline:
With DISK, write times are in the millisecond range. The worst 99.9th percentile is 61.70 ms and the worst single write is 198.9 ms. Standard deviation is below the mean, sometimes far below it.
With MAPPED, the times are similar, but there are major outliers. The worst 99.9th percentile is 146.9 ms and the worst single write is 6.8 seconds. Standard deviation fluctuates and is above the mean in three out of four cases.
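To show why a single multi-second stall pushes the standard deviation above the mean while the 99.9th percentile barely moves, here is a minimal Java sketch. The sample values are synthetic (not from the benchmark runs), the class name `LatencySummary` is invented, and the nearest-rank percentile and population standard deviation are assumptions; the benchmark's own method is not stated here.

```java
import java.util.Arrays;

// Illustrative summary of write latencies: mean, standard deviation, 99.9th percentile, max.
public final class LatencySummary {
    final double mean;
    final double stdDev;
    final double p999;
    final double max;

    LatencySummary(double[] samplesMs) {
        double[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int n = sorted.length;

        double sum = 0;
        for (double s : sorted) {
            sum += s;
        }
        mean = sum / n;

        double squares = 0;
        for (double s : sorted) {
            squares += (s - mean) * (s - mean);
        }
        stdDev = Math.sqrt(squares / n); // population standard deviation

        // Nearest-rank percentile: rank = ceil(n * 0.999), computed with integer arithmetic
        long rank = ((long) n * 999 + 999) / 1000;
        p999 = sorted[(int) Math.max(rank, 1) - 1];
        max = sorted[n - 1];
    }

    public static void main(String[] args) {
        // Synthetic data: 9,999 writes at 2 ms plus a single 6,800 ms stall
        double[] samples = new double[10_000];
        Arrays.fill(samples, 2.0);
        samples[samples.length - 1] = 6_800.0;

        LatencySummary s = new LatencySummary(samples);
        System.out.printf("mean=%.2f ms stddev=%.2f ms p99.9=%.2f ms max=%.2f ms%n",
            s.mean, s.stdDev, s.p999, s.max);
    }
}
```

With one outlier in 10,000 samples, the 99.9th percentile still lands on an ordinary 2 ms write, but the outlier alone dominates the variance, so the standard deviation ends up well above the mean, matching the shape of the MAPPED results.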
Robert Varga March 9, 2024 at 9:23 AM
removes a user of FileChannel global state.
Robert Varga March 9, 2024 at 8:33 AM
So the problem is caused by the handoff between FileChannelJournalSegmentWriter and MappedJournalSegmentWriter here.
While this looks like a simple hand-off, it is actually a heavyweight operation, because both constructors call reset(0). Here 0 is a magic value that causes traversal of all valid entries, which are then put into the index.
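The cost of that handoff can be seen in a heavily simplified model. This is a hypothetical Java sketch, not the actual journal code: `Segment`, `Index`, `FileChannelWriter`, `MappedWriter` and `indexedAfterFlips` are invented names, and only the magic value of `reset(0)` is modeled. Because every writer constructor rescans, each flip between implementations re-indexes every entry in the segment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the writer handoff; not the actual journal implementation.
public final class HandoffCost {
    // Stand-in for the on-disk segment: the entries already written to it.
    static final class Segment {
        final List<String> entries = new ArrayList<>();
    }

    // Stand-in for the segment index; it only counts how many entries were (re)indexed.
    static final class Index {
        int indexed;

        void add(long position) {
            indexed++;
        }
    }

    abstract static class Writer {
        final Segment segment;
        final Index index;
        long position;

        Writer(Segment segment, Index index) {
            this.segment = segment;
            this.index = index;
            reset(0); // every writer constructor does this, so each handoff pays for a full scan
        }

        // Only the magic value 0 is modeled: walk every valid entry and put it into the index.
        final void reset(long target) {
            if (target != 0) {
                throw new UnsupportedOperationException("only reset(0) is modeled here");
            }
            position = 0;
            for (int i = 0; i < segment.entries.size(); i++) {
                index.add(position++);
            }
        }
    }

    static final class FileChannelWriter extends Writer {
        FileChannelWriter(Segment segment, Index index) {
            super(segment, index);
        }
    }

    static final class MappedWriter extends Writer {
        MappedWriter(Segment segment, Index index) {
            super(segment, index);
        }
    }

    // Opens the segment once, then flips between implementations `flips` times.
    static int indexedAfterFlips(int entryCount, int flips) {
        Segment segment = new Segment();
        for (int i = 0; i < entryCount; i++) {
            segment.entries.add("entry-" + i);
        }
        Index index = new Index();
        Writer writer = new FileChannelWriter(segment, index);
        for (int flip = 0; flip < flips; flip++) {
            // Looks like a plain hand-off, yet each new writer rebuilds the whole index
            writer = (flip % 2 == 0)
                ? new MappedWriter(segment, index)
                : new FileChannelWriter(segment, index);
        }
        return index.indexed;
    }

    public static void main(String[] args) {
        System.out.println("entries indexed: " + indexedAfterFlips(10_000, 10));
    }
}
```

Running `main` prints `entries indexed: 110000`: the initial open plus ten flips, each walking all 10,000 entries.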
So this is a lifecycle problem: we are using the same codepath both to initially open the file (when we need to walk the entries and store them in the index) and to just flip implementations.
This needs to be fixed by moving the reset(0) call out of the constructors and calling it only from the MappableJournalSegmentWriter constructor, i.e. when we first load the segment.
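The proposed lifecycle can be sketched the same way, under the same simplifications. In this hypothetical Java model, `MappableWriter` stands in for MappableJournalSegmentWriter and is the only place that calls `reset(0)`, once, when the segment is first loaded; the delegate writers receive the current position instead of rescanning. The names `map()`, `unmap()`, `Index` and `indexedAfterFlips` are illustrative assumptions, not the real API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the fixed lifecycle: one scan on first load, cheap hand-offs afterwards.
public final class SingleScanLifecycle {
    // Stand-in for the segment index; it only counts how many entries were indexed.
    static final class Index {
        int indexed;

        void add(long position) {
            indexed++;
        }
    }

    // Delegate writers no longer scan in their constructors; they just take the position.
    abstract static class Writer {
        final long position;

        Writer(long position) {
            this.position = position;
        }
    }

    static final class FileChannelWriter extends Writer {
        FileChannelWriter(long position) {
            super(position);
        }
    }

    static final class MappedWriter extends Writer {
        MappedWriter(long position) {
            super(position);
        }
    }

    // Owns the delegate and performs the initial reset(0) exactly once.
    static final class MappableWriter {
        final List<String> entries;
        final Index index;
        Writer delegate;

        MappableWriter(List<String> entries, Index index) {
            this.entries = entries;
            this.index = index;
            delegate = new FileChannelWriter(reset(0)); // first load: walk and index all entries
        }

        // Only the magic value 0 is modeled: index every valid entry, return the end position.
        private long reset(long target) {
            if (target != 0) {
                throw new UnsupportedOperationException("only reset(0) is modeled here");
            }
            long position = 0;
            for (int i = 0; i < entries.size(); i++) {
                index.add(position++);
            }
            return position;
        }

        // Flipping implementations hands over the current position; no rescan.
        void map() {
            delegate = new MappedWriter(delegate.position);
        }

        void unmap() {
            delegate = new FileChannelWriter(delegate.position);
        }
    }

    static int indexedAfterFlips(int entryCount, int flips) {
        List<String> entries = new ArrayList<>();
        for (int i = 0; i < entryCount; i++) {
            entries.add("entry-" + i);
        }
        Index index = new Index();
        MappableWriter writer = new MappableWriter(entries, index);
        for (int flip = 0; flip < flips; flip++) {
            if (flip % 2 == 0) {
                writer.map();
            } else {
                writer.unmap();
            }
        }
        return index.indexed;
    }

    public static void main(String[] args) {
        System.out.println("entries indexed: " + indexedAfterFlips(10_000, 10));
    }
}
```

Here the index is built once no matter how many times the implementation flips, so the hand-off cost no longer grows with the number of entries in the segment.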
We are hitting this problem in the BGP test suite:
https://s3-logs.opendaylight.org/logs/releng/vex-yul-odl-jenkins-1/bgpcep-csit-1node-bgp-ingest-all-sulfur/216/odl_1/odl1_karaf.log.gz
Previous versions work fine, except that they end up saving a snapshot only at the 160K mark, because they use the ask-based protocol, which produces fewer persistence entries.