Reconciliation framework failure when starting cbench tool for the first time
Description
Activity
JamO Luhrsen February 27, 2020 at 6:10 PM
Actually, now that I think about it, I think I was able to reproduce this locally with Sodium; I just didn't spend much time double-checking. Maybe upstream CSIT on Sodium, in that shared cloud environment, is just slow enough compared to my local laptop that the race condition isn't hit.

Gobinath Suganthan February 27, 2020 at 4:50 PM
Yes. This has to be cherry-picked to Sodium too. Cherry-picked here https://git.opendaylight.org/gerrit/c/openflowplugin/+/87965
It is surprising, however, that cbench is passing in Sodium.
JamO Luhrsen February 27, 2020 at 4:39 PM
Great debugging, and thanks for the quick fix. The patch test job is passing now and I have merged the stable/magnesium patch.
Does this need to be cherry-picked to Sodium?
I assume it should be in master, so I already made that cherry-pick.

Gobinath Suganthan February 27, 2020 at 2:45 PM
I added some logging and reran the tests. The logs show that the "flownodereconciliation" service was throwing an uncaught exception (ConcurrentModificationException), which resulted in the reconciliation framework failure.
Logs:
2020-02-27T11:48:31,071 | ERROR | ofppool-2 | ContextChainHolderImpl | 310 - org.opendaylight.openflowplugin.impl - 0.10.0.SNAPSHOT | Reconciliation framework failure for device openflow:1 with error
java.util.ConcurrentModificationException: null
at java.util.HashMap.computeIfAbsent(HashMap.java:1134) ~[?:?]
at org.opendaylight.openflowplugin.applications.frm.impl.FlowNodeReconciliationImpl.startReconciliation(FlowNodeReconciliationImpl.java:302) ~[?:?]
at org.opendaylight.openflowplugin.applications.reconciliation.impl.ReconciliationManagerImpl.lambda$reconcileServices$4(ReconciliationManagerImpl.java:135) ~[?:?]
I have fixed this and added some logs to debug similar issues.
https://git.opendaylight.org/gerrit/c/openflowplugin/+/88084
Note:
The "flownodereconciliation" task completes fast when the switch connects initially (no flows are present in the inventory config for the node). So the race condition would be more probable in this case, which may be why we see the cbench failures only during the initial connection.
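For anyone following along, here is a minimal sketch of the exception class seen in the stack trace above. It is not the openflowplugin code: the class name, the string "tasks", and the node IDs are hypothetical stand-ins. It only shows that, on JDK 9 and later, `HashMap.computeIfAbsent` throws `ConcurrentModificationException` when the map is structurally modified while the mapping function runs, and that `ConcurrentHashMap` accepts `computeIfAbsent` calls from several threads at once. Whether the actual patch uses this approach is not stated in this thread.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; names do not come from openflowplugin.
public class ReconciliationMapSketch {

    // Returns true if HashMap.computeIfAbsent detects that the map was
    // structurally modified while the mapping function was running.
    // The modification is done in the same thread here so the result is
    // deterministic; in a real race another thread would do it.
    public static boolean hashMapDetectsModification() {
        Map<String, String> tasks = new HashMap<>();
        try {
            tasks.computeIfAbsent("openflow:1", node -> {
                tasks.put("openflow:2", "task-2"); // changes the map mid-compute
                return "task-1";
            });
            return false; // no check was triggered (e.g. on JDK 8)
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Registers one entry per device from several threads using a
    // ConcurrentHashMap, which is designed for concurrent access.
    // Returns the number of entries that ended up in the map.
    public static int registerConcurrently(int devices) {
        Map<String, String> tasks = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        CountDownLatch done = new CountDownLatch(devices);
        for (int i = 1; i <= devices; i++) {
            String node = "openflow:" + i;
            pool.execute(() -> {
                tasks.computeIfAbsent(node, n -> "task-for-" + n);
                done.countDown();
            });
        }
        try {
            done.await(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
        return tasks.size();
    }

    public static void main(String[] args) {
        System.out.println("HashMap detected modification: " + hashMapDetectsModification());
        System.out.println("ConcurrentHashMap entries: " + registerConcurrently(16));
    }
}
```

Note that the mapping function passed to `ConcurrentHashMap.computeIfAbsent` must not modify the same map either, so the sketch keeps that function free of side effects.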

Gobinath Suganthan February 27, 2020 at 8:24 AM
I have raised a review and triggered the job, but the job seems to have been waiting too long for an available executor.
https://git.opendaylight.org/gerrit/c/openflowplugin/+/88084
Job link:
https://jenkins.opendaylight.org/releng/job/openflowplugin-patch-test-cbench-magnesium/1/
Alternatively, could you please send the steps to set up cbench locally? (I couldn't find complete working steps anywhere.)
This issue was noticed because an old CSIT job using cbench had a consistent failure that is not seen in Sodium (technically a regression in that sense).
The cbench tool fails on starting with a "closed connection ... exiting" message.
Snippet from this karaf.log:
Only the first try fails though, and additional cbench tests are ok.
We should figure out the root cause of the failure before we talk about the few workarounds.
I know we can do:
start a dummy cbench test first to get past the initial failure
remove the cbench tests altogether if they no longer provide any value, although I do check from time to time the graphs that those jobs produce