No status for rogue device in callhome server API
Description
Environment
Activity
@Manoj Chokka, per the conversation we had in the kernel call today, is the "fix" for this issue
going to happen entirely on the test side of things? If so, we can move this ticket to that project.
Let me know and I can take care of that.
Hi Luis,
Found a better fix for the testcase.
It looks like the entries in sources.list in the Docker container work only intermittently: sometimes they succeed and sometimes they fail.
I found the correct sources list at https://wiki.debian.org/LTS/Using
deb http://deb.debian.org/debian/ jessie main contrib non-free
deb-src http://deb.debian.org/debian/ jessie main contrib non-free
deb http://security.debian.org/ jessie/updates main contrib non-free
deb-src http://security.debian.org/ jessie/updates main contrib non-free
To apply these changes:
Create a sources.list file in ~/integration-test/csit/variables/netconf/callhome.
Add the sources above to the file.
Add a volume to the docker-compose.yaml file:
- /home/chokkma/integration-test/csit/variables/netconf/callhome/sources.list:/etc/apt/sources.list
The volume entry above mounts the new sources.list over /etc/apt/sources.list inside the container.
We also need to change the timeout to 300s in callhome.robot, to allow 'apt-get update' and 'apt-get install curl' to finish successfully.
If you approve these changes, I can make the changes and push them.
Please let me know.
BR
Manoj
I will fix this in the CSIT.
Update on the analysis done so far,
The code does not have any issues.
During manual testing we were not able to reproduce the issue. We were able to see the message FAILED_NOT_ALLOWED in the device status for rogue devices.
Upon examination of the callhome.robot test case, we found that "apt-get update" takes longer than expected, while the wait time for the test case is only 30s. This is the reason the test case is failing.
A simple timeout change would do the trick.
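To illustrate why the 30s wait fails while a longer one passes, here is a minimal Python sketch of a poll-until-success loop. It is not the code in callhome.robot; the function name wait_until_succeeds and its parameters are hypothetical and only model the general pattern of retrying a check until a deadline.

```python
import time
from typing import Callable


def wait_until_succeeds(check: Callable[[], bool],
                        timeout_s: float,
                        interval_s: float,
                        clock: Callable[[], float] = time.monotonic,
                        sleep: Callable[[float], None] = time.sleep) -> bool:
    """Hypothetical helper: retry check() until it returns True or timeout_s elapses.

    clock and sleep are injectable so the loop can be exercised without real waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            # The setup (for example a slow 'apt-get update') did not finish in time.
            return False
        sleep(interval_s)
```

If the container setup needs longer than the timeout, the loop gives up before the expected status can ever appear, even though the code under test is fine. Raising the timeout only changes the deadline, not the check itself.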
I've just quickly run through the implementation and it seems that the device should be reported as FAILED_NOT_ALLOWED.
We need to investigate further why it is not reported that way.
When a rogue device (one that has not been added to the whitelist) tries to connect to the controller (callhome server), it fails to connect (OK), but the device status "FAILED_NOT_ALLOWED" does not show up in the callhome server API (NOK):
GET /restconf/operational/odl-netconf-callhome-server:netconf-callhome-server
404
{"errors":{"error":[{"error-type":"application","error-tag":"data-missing","error-message":"Request could not be completed because the relevant data model content does not exist "}]}}
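For anyone scripting this check, the following is a rough Python sketch of how the error response above could be told apart from a successful read. The function name restconf_error_tag is hypothetical, and the code only relies on the JSON layout shown in the response above; it makes no assumption about what a successful response body looks like.

```python
import json
from typing import Optional


def restconf_error_tag(status_code: int, body: str) -> Optional[str]:
    """Hypothetical helper: return the first error-tag of a RESTCONF error body.

    Returns None for non-error status codes or for bodies that do not follow
    the {"errors": {"error": [...]}} layout seen in the response above.
    """
    if status_code < 400:
        return None
    try:
        errors = json.loads(body)["errors"]["error"]
    except (ValueError, KeyError, TypeError):
        # Not JSON, or JSON without the expected "errors" structure.
        return None
    if isinstance(errors, list) and errors and isinstance(errors[0], dict):
        return errors[0].get("error-tag")
    return None
```

A result of "data-missing" together with a 404 means the operational datastore has no callhome server data at all, which is different from a device entry being present with an unexpected status.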
In the karaf log it can be seen:
2018-10-09T09:34:13,471 | INFO | remote-connector-processing-executor-11 | NetconfDevice | 287 - org.opendaylight.netconf.sal-netconf-connector - 1.8.1.SNAPSHOT | RemoteDevice{netopeer}: Netconf connector initialized successfully
2018-10-09T09:34:13,474 | WARN | opendaylight-cluster-data-notification-dispatcher-48 | CallhomeStatusReporter | 268 - org.opendaylight.netconf.callhome-provider - 1.5.1.SNAPSHOT | No corresponding callhome device found - exiting.
So maybe this is expected behavior.
In any case, the failing test has been commented out in this patch:
https://git.opendaylight.org/gerrit/#/c/76807