OFP RPC does not work from all instances in the cluster
Description
Activity
Luis Gomez Palacios August 13, 2019 at 7:55 PM
I think the issue is fixed now: https://jenkins.opendaylight.org/sandbox/job/openflowplugin-csit-3node-clustering-only-sodium/
Luis Gomez Palacios August 12, 2019 at 12:57 PM
Hi Emmett,
FYI, I started a verification on this patch, considering this is the candidate fix:
Emmett Cox August 8, 2019 at 1:58 PM
The root cause is the OpsRegistrar changes made as part of my commit.
Those changes dropped the logic that removed and closed old RPC registrations, so the RPCs no longer updated correctly and failed when a node was shut down.
I'm in the midst of making code changes to fix the bug.
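To illustrate the failure mode described above, here is a minimal sketch of a registry that closes the old registration whenever an RPC is re-registered or its owner goes away. This is not the OpsRegistrar code; `Registration`, `RpcRegistry`, and the owner strings are hypothetical names used only to show why skipping the close step leaves stale routes behind.

```python
class Registration:
    """Hypothetical handle for one RPC registration (not an ODL class)."""

    def __init__(self, rpc_name, owner):
        self.rpc_name = rpc_name
        self.owner = owner
        self.closed = False

    def close(self):
        self.closed = True


class RpcRegistry:
    """Hypothetical registry that keeps one live registration per RPC."""

    def __init__(self):
        self._active = {}

    def register(self, rpc_name, owner):
        # Remove and close the previous registration before adding the
        # new one; omitting this step is the kind of regression described.
        old = self._active.pop(rpc_name, None)
        if old is not None:
            old.close()
        new = Registration(rpc_name, owner)
        self._active[rpc_name] = new
        return new

    def unregister_owner(self, owner):
        # When a node shuts down, close and drop everything it owned.
        for name in [n for n, r in self._active.items() if r.owner == owner]:
            self._active.pop(name).close()

    def route(self, rpc_name):
        # Return the owner that should handle the RPC, or None.
        reg = self._active.get(rpc_name)
        return reg.owner if reg is not None and not reg.closed else None
```

In this model, a registry that never closed old registrations could keep routing to a node that has already shut down, which would be consistent with the timeouts reported here.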
Emmett Cox August 6, 2019 at 11:37 AM (edited)
I discovered that I was missing the debug option for remote RPC logs, so a little more is being logged now.
I'm going to include the logs from all 3 nodes; give me a minute to add them.
Emmett Cox August 1, 2019 at 12:35 PM
I wonder about the warning regarding Connection refused for one of the nodes. I've noticed the same warning in some of the other tests that succeed, but those tests take a few seconds longer to execute. Could it simply be Akka timing out when it would work if given a few extra seconds? Not that it should take so long, but...
Regression was detected here:
https://jenkins.opendaylight.org/releng/view/openflowplugin/job/openflowplugin-csit-3node-clustering-only-sodium/
To reproduce, connect an OVS switch to 3 controllers and send an RPC like this from all instances:
POST http://controller:8181/restconf/operations/sal-flow:add-flow

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<input xmlns="urn:opendaylight:flow:service">
  <node xmlns:inv="urn:opendaylight:inventory">/inv:nodes/inv:node[inv:id="openflow:1"]</node>
  <table_id>0</table_id>
  <priority>2</priority>
  <match>
    <ethernet-match>
      <ethernet-type>
        <type>2048</type>
      </ethernet-type>
    </ethernet-match>
    <ipv4-destination>10.0.1.0/24</ipv4-destination>
  </match>
  <instructions>
    <instruction>
      <order>0</order>
      <apply-actions>
        <action>
          <output-action>
            <output-node-connector>1</output-node-connector>
          </output-action>
          <order>0</order>
        </action>
      </apply-actions>
    </instruction>
  </instructions>
</input>
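The reproduction step above can be scripted so the same RPC goes to every controller in turn. The sketch below uses only the Python standard library. The `admin`/`admin` credentials, the `application/xml` headers, and the function names are assumptions, not taken from this issue; adjust them for your deployment.

```python
import base64
import urllib.error
import urllib.request

RPC_PATH = "/restconf/operations/sal-flow:add-flow"

# Same payload as in the reproduction step, compacted.
ADD_FLOW_XML = """<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<input xmlns="urn:opendaylight:flow:service">
<node xmlns:inv="urn:opendaylight:inventory">/inv:nodes/inv:node[inv:id="openflow:1"]</node>
<table_id>0</table_id><priority>2</priority>
<match><ethernet-match><ethernet-type><type>2048</type></ethernet-type></ethernet-match>
<ipv4-destination>10.0.1.0/24</ipv4-destination></match>
<instructions><instruction><order>0</order><apply-actions><action>
<output-action><output-node-connector>1</output-node-connector></output-action>
<order>0</order></action></apply-actions></instruction></instructions>
</input>"""


def build_add_flow_request(controller, user="admin", password="admin"):
    """Build (but do not send) the add-flow POST for one controller."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"http://{controller}:8181{RPC_PATH}",
        data=ADD_FLOW_XML.encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/xml",
            "Accept": "application/xml",
            "Authorization": f"Basic {token}",
        },
    )


def send_to_all(controllers, timeout=30):
    """POST the RPC to each controller; return {controller: (status, body)}."""
    results = {}
    for controller in controllers:
        request = build_add_flow_request(controller)
        try:
            with urllib.request.urlopen(request, timeout=timeout) as response:
                results[controller] = (response.status, response.read().decode())
        except urllib.error.HTTPError as err:
            # RESTCONF error responses arrive as HTTP errors with an XML body.
            results[controller] = (err.code, err.read().decode())
    return results
```

Calling `send_to_all` with the three controller addresses returns one status and body per instance. The client timeout defaults to 30 seconds so that it outlasts the 15000 ms ask timeout seen in the error, letting the server-side failure come back as a response rather than a client-side timeout.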
At least 1 instance will complain with this message:
<errors xmlns="urn:ietf:params:xml:ns:yang:ietf-restconf">
  <error>
    <error-type>application</error-type>
    <error-tag>operation-failed</error-tag>
    <error-message>The operation encountered an unexpected error while executing.</error-message>
    <error-info>Ask timed out on [Actor[akka.tcp://opendaylight-cluster-data@10.18.130.162:2550/user/rpc/broker#-516941188]] after [15000 ms]. Message of type [org.opendaylight.controller.remote.rpc.messages.ExecuteRpc]. A typical reason for `AskTimeoutException` is that the recipient actor didn't send a reply.</error-info>
  </error>
</errors>
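When checking several instances, it can help to pull the relevant fields out of an error response like the one above. The following is a minimal sketch using the Python standard library; the function name and the returned dictionary keys are hypothetical, and the regular expressions assume the `error-info` wording shown in this report.

```python
import re
import xml.etree.ElementTree as ET

RESTCONF_NS = {"rc": "urn:ietf:params:xml:ns:yang:ietf-restconf"}


def summarize_restconf_error(body):
    """Return a summary dict for a RESTCONF <errors> body, else None."""
    if not body.strip():
        return None  # an empty body carries no error
    root = ET.fromstring(body)
    error = root.find("rc:error", RESTCONF_NS)
    if error is None:
        return None  # not an <errors> document
    info = error.findtext("rc:error-info", default="", namespaces=RESTCONF_NS)
    timeout = re.search(r"after \[(\d+) ms\]", info)
    actor = re.search(r"Actor\[(akka\.tcp://[^\]]+)\]", info)
    return {
        "tag": error.findtext("rc:error-tag", namespaces=RESTCONF_NS),
        "ask_timeout": "Ask timed out" in info,
        "timeout_ms": int(timeout.group(1)) if timeout else None,
        # Remote actor the RPC was forwarded to, if present in the message.
        "actor": actor.group(1) if actor else None,
    }
```

The `actor` field shows which cluster member the failing instance tried to forward the RPC to, which is useful when comparing results across the three nodes.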