Table Miss Entry failed to program in 3 node netvirt CSIT
Description
Environment
Activity
Arunprakash D August 7, 2018 at 5:15 AM
@Faseela K, thanks for the link via email.
We checked the sequence of events from openflowplugin to FRM, and the owner is now set before applications program any default flows.
@Somashekhar Javalagi, you can work on formalizing the review.
Arunprakash D August 7, 2018 at 5:01 AM
@Faseela K, could you please provide the latest CSIT run logs? We want to verify the sequence of events in them.
Arunprakash D August 6, 2018 at 12:55 PM
@Sam Hague, we are working on a simplified fix that can be used for the current situation, and we will move to a better solution later in Neon.
@Somashekhar Javalagi will raise a review tomorrow with the probable fix, and @Faseela K can run CSIT and confirm whether it works.
Sam Hague August 6, 2018 at 12:43 PM
Are there any current workarounds? Understood that this is difficult but this is looking like a very bad issue. Seems like you can hit this at any time. Restarting isn't a good workaround though.
Somashekhar Javalagi August 6, 2018 at 6:11 AM
Openflowplugin-impl decides the mastership of the device and communicates it to the forwardingrules-manager (FRM) through a YANG notification. This notification takes around 15 to 25 ms to reach FRM, and only then is mastership set for the device. Before the notification reaches FRM, flows already start arriving for reconciliation. Because the device is not yet mastered, these flows are rejected.
There is no direct way to fix this, as the fix may involve a lot of design changes.
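To make the ordering problem concrete, here is a minimal Python sketch of the race as described above. It is a toy model, not openflowplugin or FRM code: the `ForwardingRulesManager` class, the `replay` helper, the flow names, and the 5 ms / 30 ms flow arrival times are all assumptions for illustration. Only the 15 to 25 ms notification delay and the node id come from this ticket.

```python
from dataclasses import dataclass, field


@dataclass
class ForwardingRulesManager:
    """Toy stand-in for FRM: accepts flows only for devices it knows it masters."""
    mastered: set = field(default_factory=set)
    programmed: list = field(default_factory=list)
    rejected: list = field(default_factory=list)

    def on_mastership_notification(self, node_id):
        # Mastership becomes known only when the notification is delivered.
        self.mastered.add(node_id)

    def on_flow(self, node_id, flow_id):
        # Flows for a device that is not yet mastered are rejected.
        if node_id in self.mastered:
            self.programmed.append((node_id, flow_id))
        else:
            self.rejected.append((node_id, flow_id))


def replay(events):
    """Deliver (time_ms, kind, node_id, flow_id) events to a fresh FRM in time order."""
    frm = ForwardingRulesManager()
    for _, kind, node_id, flow_id in sorted(events, key=lambda e: e[0]):
        if kind == "mastership":
            frm.on_mastership_notification(node_id)
        else:
            frm.on_flow(node_id, flow_id)
    return frm


NODE = "openflow:80006780316000"

# Racy ordering: the notification lands at 20 ms, but a flow arrives at 5 ms.
racy = replay([
    (20, "mastership", NODE, None),
    (5, "flow", NODE, "table-43-miss"),      # hypothetical flow name
    (30, "flow", NODE, "table-17-example"),  # hypothetical flow name
])

# Desired ordering: the owner is recorded before any flow is delivered.
ordered = replay([
    (0, "mastership", NODE, None),
    (5, "flow", NODE, "table-43-miss"),
    (30, "flow", NODE, "table-17-example"),
])

print("racy rejected:", racy.rejected)
print("ordered rejected:", ordered.rejected)
```

In the racy replay, the flow that arrives before the notification is rejected, while the later one is programmed. In the ordered replay nothing is rejected, which is the property any fix needs to guarantee.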
L2 suites in netvirt are failing randomly in the 3-node setup, and in one such failed instance I noticed that the Table Miss entry for Table 43 is not programmed on the switch.
I can see the flow present in config/opendaylight-inventory, but the flow is missing on the switch.
https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/builder-copy-sandbox-logs/229/faseela-l2-netvirt-csit-3node-openstack-queens-upstream-stateful-fluorine/15/
Problematic node id - openflow:80006780316000
Model dumps can be found in the Tear Down dumps of the first failing TC, "Check If VMs got IP Address". Let me know if any more information is needed.