Table Miss Entry failed to program in 3 node netvirt CSIT

Description

L2 suites in netvirt are failing randomly in the 3-node setup, and in one such failed run I noticed that the table-miss entry for table 43 was not programmed on the switch.

I can see the flow present in config/opendaylight-inventory, but the flow is missing on the switch.

https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/builder-copy-sandbox-logs/229/faseela-l2-netvirt-csit-3node-openstack-queens-upstream-stateful-fluorine/15/

Problematic node id - openflow:80006780316000

Model dumps can be found in the teardown dumps of the first failing test case, "Check If VMs got IP Address". Let me know if any more information is needed.
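For anyone repeating the switch-side check, the following is a minimal sketch of scanning `ovs-ofctl dump-flows` text for a table-miss entry (priority 0, no match fields) in a given table. It assumes the usual dump-flows line layout; the function name `has_table_miss` and the sample lines are made up for illustration and are not taken from the CSIT logs.

```python
def has_table_miss(dump_flows_text, table_id):
    """Return True if the dump has a priority-0, match-all flow in table_id."""
    for line in dump_flows_text.splitlines():
        head, sep, _actions = line.strip().partition(" actions=")
        if not sep:
            continue  # reply header or blank line, not a flow entry
        fields = head.split(", ")
        match = fields[-1]  # last field holds priority plus match criteria
        if f"table={table_id}" in fields and match == "priority=0":
            return True
    return False


# Hypothetical sample, NOT real output from the failing run.
SAMPLE = """\
 cookie=0x0, duration=12.3s, table=17, n_packets=0, n_bytes=0, priority=0 actions=goto_table:19
 cookie=0x0, duration=12.3s, table=43, n_packets=0, n_bytes=0, priority=100,arp actions=drop
"""

print("table 17 miss entry present:", has_table_miss(SAMPLE, 17))
print("table 43 miss entry present:", has_table_miss(SAMPLE, 43))
```

In this sample, table 43 has a flow but no priority-0 catch-all, which is the kind of gap described in this report.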

Environment

None

Activity


Arunprakash D August 7, 2018 at 5:15 AM

Thanks for the link via email.

https://jenkins.opendaylight.org/sandbox/job/faseela-l2-netvirt-csit-3node-0cmb-1ctl-2cmp-openstack-queens-upstream-stateful-oxygen/

 

We were able to check the sequence of events from openflowplugin to FRM, and the owner is now being set before any default flows are programmed by applications.

You can work on formalizing the review.

Arunprakash D August 7, 2018 at 5:01 AM

Could you please provide the latest CSIT run logs? We want to verify the sequence of events in the logs.

https://git.opendaylight.org/gerrit/74715

Arunprakash D August 6, 2018 at 12:55 PM

We are working on a simplified fix that can be used for the current situation, and will move to a better solution later in Neon.

We will raise a review tomorrow with the probable fix, then run CSIT to confirm whether it works.

Sam Hague August 6, 2018 at 12:43 PM

Are there any current workarounds? Understood that this is difficult but this is looking like a very bad issue. Seems like you can hit this at any time. Restarting isn't a good workaround though.

Somashekhar Javalagi August 6, 2018 at 6:11 AM

openflowplugin-impl decides the mastership of the device and communicates it to the forwardingrules-manager (FRM) through a YANG notification. This notification takes some time (around 15 to 25 ms) to reach FRM, and only then is mastership set for the device. By the time the notification reaches FRM, flows have already started arriving for reconciliation. Because the device is not yet mastered, these flows are rejected.

 

There is no direct way to fix this, as the fix may involve a lot of design changes.
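The ordering problem described in this comment can be illustrated with a small deterministic model. This is only a sketch: `FrmModel`, `replay`, the flow names, and the 5 ms and 30 ms flow timestamps are hypothetical and are not openflowplugin or FRM APIs. The only figure taken from the report is the notification delay, chosen inside the 15 to 25 ms range.

```python
from dataclasses import dataclass, field


@dataclass
class FrmModel:
    """Toy stand-in for forwardingrules-manager; not the real FRM code."""
    mastered: set = field(default_factory=set)
    programmed: list = field(default_factory=list)
    rejected: list = field(default_factory=list)

    def on_mastership(self, node_id):
        self.mastered.add(node_id)

    def on_flow(self, node_id, flow):
        # Flows for a node that is not yet mastered are rejected.
        if node_id in self.mastered:
            self.programmed.append((node_id, flow))
        else:
            self.rejected.append((node_id, flow))


def replay(events):
    """Apply (time_ms, kind, node_id, payload) events in timestamp order."""
    frm = FrmModel()
    for _, kind, node_id, payload in sorted(events, key=lambda e: e[0]):
        if kind == "mastership":
            frm.on_mastership(node_id)
        else:
            frm.on_flow(node_id, payload)
    return frm


NODE = "openflow:80006780316000"
NOTIFICATION_DELAY_MS = 20  # inside the 15 to 25 ms range reported above

# Racy ordering: a flow reaches FRM before the mastership notification.
racy = replay([
    (NOTIFICATION_DELAY_MS, "mastership", NODE, None),
    (5, "flow", NODE, "table-43-miss"),
    (30, "flow", NODE, "later-flow"),
])

# Ordering where ownership is known before any flow arrives.
ordered = replay([
    (0, "mastership", NODE, None),
    (5, "flow", NODE, "table-43-miss"),
    (30, "flow", NODE, "later-flow"),
])

print("racy rejected:", racy.rejected)
print("ordered rejected:", ordered.rejected)
```

In the racy replay the early flow is rejected and never reaches the switch, while the later flow is programmed; when mastership is recorded first, nothing is rejected.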

Done


Created July 17, 2018 at 6:43 AM
Updated February 6, 2025 at 2:17 PM
Resolved August 20, 2018 at 4:57 AM