NETCONF:Testing
This page contains information about NETCONF testing (scale, performance, etc.).
Scale tests
Scale tests for NETCONF in ODL.
NETCONF southbound scale test
The goal of this test is to measure how many NETCONF devices ODL can mount with a set amount of RAM.
Scenario
Start netconf-testtool, which starts the desired number of netconf servers
The testtool generates the initial configuration for ODL
ODL tries to connect to all of the simulated devices
Measure the number of devices connected (if not all were connected)
Measure the time until ODL connected to all the devices with a certain amount of RAM (2, 4, 8, 16 GB)
How to
Make sure the open file limit is set reasonably high relative to the number of devices started: https://wiki.opendaylight.org/view/OpenDaylight_Controller:Netconf:Testtool#Too_many_files_open
Unpack a clean ODL distribution; our scale utility will take care of feature installation and config generation
Download netconf scale-util : https://nexus.opendaylight.org/content/repositories/opendaylight.snapshot/org/opendaylight/netconf/netconf-testtool/1.1.0-SNAPSHOT/netconf-testtool-1.1.0-20160308.161039-64-scale-util.jar
Run the scale tool :
java -Xmx8G -jar scale-util-1.1.0-SNAPSHOT-scale-util.jar --distribution-folder ./distribution-karaf-0.4.0-Beryllium --device-count 8000 --ssh false --exi false --generate-configs-batch-size 1000
The scale util needs to be pointed at an unpacked distribution (--distribution-folder argument) and handles the karaf start and feature installation. While the test is running, the utility also checks Restconf periodically to see the current status of all netconf devices. After the test completes successfully, karaf is stopped, features are cleaned and the whole test is restarted with more devices (the increment is currently hardcoded to 1000; a parameter for this needs to be added). If you are starting with more RAM than 2GB, you should start with more devices than 8k right away. The test results are logged periodically into scale-results.log, which is located next to the util jar. If you are running with even more devices, the RAM for the testtool should be increased as well.
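Before running, the open-file limit mentioned in the first step can be checked and raised for the current shell. A minimal sketch (the exact number you need depends on the device count, and the hard limit may have to be raised first; see the wiki link above):

```shell
# Check the soft open-file limit for this shell
ulimit -n
# Try to raise it (the hard limit permitting); with e.g. 20k devices
# you need well above 20000 descriptors
ulimit -n 40000 2>/dev/null || echo "could not raise the limit; bump the hard limit first"
```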
To run the test over TCP, add the --ssh false argument when starting the scale-util.
To rerun the test with more RAM available to ODL, edit this line in the ${distribution-location}/bin/setenv script:
export JAVA_MAX_MEM="4G"
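The edit can also be scripted with sed. The sketch below demonstrates it on a throwaway copy in /tmp; in practice, point sed at ${distribution-location}/bin/setenv (the pre-existing value in your setenv may differ, so the pattern matches any value):

```shell
# Create a throwaway stand-in for bin/setenv
printf 'export JAVA_MAX_MEM="2048m"\n' > /tmp/setenv.demo
# Replace whatever value is set with 4G
sed -i 's/JAVA_MAX_MEM=".*"/JAVA_MAX_MEM="4G"/' /tmp/setenv.demo
cat /tmp/setenv.demo
# prints: export JAVA_MAX_MEM="4G"
```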
NOTE: the fastest way to find out how many devices ODL can handle at a given RAM size is to start the test with more devices than it can handle, set the config batch size to 1k and start the test. You can then analyze the result log to see where there is a drop-off in connected devices or where the test timed out.
Results
Beryllium
Environment:
OS: Ubuntu Linux 4.2.0-30-generic x86_64
CPU: Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz, 40 cores
RAM: 128GB
Network: Single VM for both ODL and simulated NETCONF device
JVM: Oracle 1.8.0_73
Other configuration:
No EXI
In this test, the simulated devices started by testtool had only the base capabilities that testtool includes by default (ietf-netconf-monitoring, ietf-inet-types, ietf-yang-types).
Netconf scale test

RAM for ODL - 2GB

| transport | devices | time needed | notes |
|---|---|---|---|
| ssh | 8000 | 3m 40s | 4k batches; starts having low-memory issues (most of the time spent in GC) around 8k devices |
| ssh | 9000 | 12m 16s | 1k batches; times out after 20 minutes |
| tcp | 20000 | 6m 03s | 4k batches; reached the 20min timeout, maybe can handle a bit more |
| tcp | 21000 | 18m 54s | 1k batches; reached the 20min timeout, maybe can handle a bit more |

RAM for ODL - 4GB

| transport | devices | time needed | notes |
|---|---|---|---|
| ssh | 14000 | 9m 28s | |
| ssh | 15000 | 17m 20s | times out after 20 minutes, hits the RAM limit |
| tcp | 24000 | 18m 31s | 1k batches |
| tcp | 28000 | 17m 27s | 2k batches; times out after 20min, should be able to get higher but needs more time |
With TCP we also noticed that after 15k devices there is a pretty big slowdown in pushing/handling the configs, as increasing gaps start to appear between the individual batches.
Beryllium SR3
Environment:
OS: Fedora 23 Linux 4.2.3-300.fc23.x86_64
CPU: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz, 88 cores
RAM: 64GB
Network: Both ODL and simulated NETCONF device were on the same test system
JVM: Oracle 1.8.0_101
Other configurations
No EXI
How to
Make sure the open file limit is set reasonably high relative to the number of devices started: https://wiki.opendaylight.org/view/OpenDaylight_Controller:Netconf:Testtool#Too_many_files_open
Unpack a clean ODL distribution, taken from https://nexus.opendaylight.org/content/repositories/public/org/opendaylight/integration/distribution-karaf/0.4.3-Beryllium-SR3/distribution-karaf-0.4.3-Beryllium-SR3.zip
Build the netconf scale-util and place it in the current working directory:
git clone https://github.com/opendaylight/netconf.git
cd netconf
mvn clean install
cp ~/netconf/netconf/tools/netconf-testtool/target/scale-util-1.2.0-SNAPSHOT.jar ~
Run the scale tool :
java -Xmx8G -jar scale-util-1.2.0-SNAPSHOT-scale-util.jar --distribution-folder ./distribution-karaf-0.4.0-Beryllium --device-count 8000 --ssh false --exi false --generate-configs-batch-size 1000
Results:
The NETCONF southbound test would repeatedly fail with a "java.lang.OutOfMemoryError: unable to create a new native thread" error message. The tester used the command set defined above, although the -Xmx8G flag was decreased to 4G and increased to 16G and 32G, all with the same result. At the same time, the tester increased the value of JAVA_MAX_MEM to 16G and 32G and still encountered the java.lang.OutOfMemoryError message. Log files of the failure and hs_err_pid log files were captured and saved for examination. These are available on request.
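For context, "unable to create new native thread" usually indicates an OS-level thread limit rather than heap exhaustion, which would be consistent with changing -Xmx having no effect (thread stacks live outside the heap). The relevant limits can be inspected as follows (Linux paths assumed):

```shell
# Per-user limit on processes/threads, a common cause of
# "unable to create new native thread"
ulimit -u
# System-wide thread cap on Linux
cat /proc/sys/kernel/threads-max
```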
Performance tests
Performance tests for NETCONF in ODL.
NETCONF northbound performance test
The goal of this test is to measure the performance of an external NETCONF client uploading information into ODL (global datastore) using just the NETCONF northbound server.
Scenario
ODL controller starts with simple l2fib models and NETCONF northbound for MD-SAL enabled
External fast netconf client writes lots of l2fib entries into MD-SAL's global DataStore using NETCONF northbound interface
The client measures time since sending out the 1st request until last response is received
After all the l2fibs are in ODL, performance is calculated in terms of requests and l2fibs written per second
How to
This how to will be split into multiple ordered sections:
ODL and tooling setup
Build or download the Beryllium-based ncmount distribution of ODL (the distro has MD-SAL NETCONF northbound enabled and contains l2fib models)
Download from: //TODO-add-link-to-prebuilt-distro ncmount l2fib distro, unzip and "cd ncmount-karaf-1.1.0-SNAPSHOT/"
Or build by:
git clone https://git.opendaylight.org/gerrit/coretutorials
cd coretutorials
git checkout stable/beryllium
git fetch https://git.opendaylight.org/gerrit/coretutorials refs/changes/16/35916/1 && git checkout FETCH_HEAD
cd ncmount
mvn clean install -DskipTests -Dcheckstyle.skip
cd karaf/target/assembly/
Download the Beryllium version of the NETCONF stress client tool and untar it
Download the rest of the testing resources and unzip them next to the client
Start ODL distribution:
./bin/karaf
Wait until you see in logs:
NETCONF Node: controller-config is fully connected
By default, only the SSH endpoint (port 2830) is opened for MD-SAL, but TCP (port 2831, which will also be tested) can be easily enabled by performing a REST call:
curl -u "admin:admin" -H "Accept: application/xml" -H "Content-type: application/xml" --request POST 'http://localhost:8181/restconf/config/network-topology:network-topology/topology/topology-netconf/node/controller-config/yang-ext:mount/config:modules' --data '<module xmlns="urn:opendaylight:params:xml:ns:yang:controller:config">
  <type xmlns:prefix="urn:opendaylight:params:xml:ns:yang:controller:netconf:northbound:tcp">prefix:netconf-northbound-tcp</type>
  <name>netconf-mdsal-tcp-server</name>
  <dispatcher xmlns="urn:opendaylight:params:xml:ns:yang:controller:netconf:northbound:tcp">
    <type xmlns:prefix="urn:opendaylight:params:xml:ns:yang:controller:config:netconf:northbound">prefix:netconf-server-dispatcher</type>
    <name>netconf-mdsal-server-dispatcher</name>
  </dispatcher>
</module>'
In logs, you should see:
Netconf TCP endpoint started successfully at /0.0.0.0:2831
Testing over TCP
Make sure ODL is up and running according to the previous section
Go into the folder, where stress client and other test resources have been unpacked. The folder should look like this:
edit-l2fib-1000.txt edit-l2fib-1.txt netconf-north-perf-test-files.zip stress-client-1.0.0-Beryllium-package edit-l2fib-100.txt edit-l2fib-delete-all.txt netconf-testtool-1.0.0-Beryllium-stress-client.tar.gz
Execute the client:
java -Xmx2G -XX:MaxPermSize=256M -jar stress-client-1.0.0-Beryllium-package/stress-client-1.0.0-Beryllium-stress-client.jar --ip 127.0.0.1 --port 2831 --edits 10000 --exi false --ssh false --username admin --password admin --thread-amount 1 --async false --edit-batch-size 10000 --edit-content edit-l2fib-1.txt
This execution configuration is:
10000 edit-config RPCs (--edits 10000)
1 l2fib entry per edit-config message (see edit-l2fib-1.txt)
since there is one l2fib in the file and 10k requests will be performed, the total number of l2fibs in ODL will be 10k
The input file contains a {PHYS_ADDR} placeholder that the client replaces with a real physical address. Each l2fib in each request gets a different physical address, so all the l2fibs are different and are really stored in ODL's global datastore. More placeholders are available in the client.
1 commit RPC per execution (--edit-batch-size)
over TCP (--ssh false)
from 1 thread (--thread-amount 1), synchronous (--async false) - sync means the thread waits for a reply after each edit, while async means edits are sent from a dedicated thread and responses are handled in a different one
No EXI (--exi false)
Towards ODL listening at 127.0.0.1:2831 (admin:admin) - the default config for MD-SAL NETCONF northbound
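The {PHYS_ADDR} substitution described above can be illustrated with a quick sketch. This is not the client's actual code, and the MAC-style format is an assumption; it only shows how one unique physical address per request index can be derived:

```shell
# Hypothetical illustration of {PHYS_ADDR} substitution: derive a unique
# MAC-style physical address from the request index, so every l2fib
# pushed into the datastore is distinct
i=5
printf '00:00:00:00:%02x:%02x\n' "$((i / 256))" "$((i % 256))"
# prints 00:00:00:00:00:05
```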
The client will run for a moment and, before it exits, print the results to stdout:
FINISHED. Execution time: 5.585 s
Requests per second: 1790.8309455587394
The requests-per-second number gives only the performance of edit-config execution. To calculate l2fibs/second, multiply that number by the number of l2fibs in the edit-content file, which is 1 in our case, so l2fibs/second equals requests/second = 1790.
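As a quick arithmetic check of that calculation (values taken from this page):

```shell
# l2fibs per second = requests per second * l2fibs per edit-content file
awk -v rps=1790.83 -v per_edit=1   'BEGIN { print rps * per_edit }'  # 1 l2fib per edit
awk -v rps=233     -v per_edit=100 'BEGIN { print rps * per_edit }'  # 100 l2fibs per edit
```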
L2fib count in MD-SAL can be verified with:
curl -u "admin:admin" http://localhost:8181/restconf/config/ncmount-l2fib:bridge-domains | grep -o forward | wc -l
Make sure to delete the l2fibs from the datastore by executing the client with the delete input file before running further tests (repeat from the "Execute the client" step):
java -jar stress-client-1.0.0-Beryllium-package/stress-client-1.0.0-Beryllium-stress-client.jar --ip 127.0.0.1 --port 2831 --edits 1 --exi false --ssh false --username admin --password admin --thread-amount 1 --async false --edit-content edit-l2fib-delete-all.txt
It is advised to repeat the test before reading final performance results.
Note: The client has many configuration options. Use -h to see all of them.
Note: There are 3 edit-content files in the resources, each containing a different number of l2fibs per edit-config: 1, 100 and 1000. Files with different amounts can be produced and used.
Testing over SSH
Testing over SSH is almost identical to TCP, just make sure to change the port and ssh flag when executing the client:
java -Xmx2G -XX:MaxPermSize=256M -jar stress-client-1.0.0-Beryllium-package/stress-client-1.0.0-Beryllium-stress-client.jar --ip 127.0.0.1 --port 2830 --edits 10000 --exi false --ssh true --username admin --password admin --thread-amount 1 --async false --edit-batch-size 10000 --edit-content edit-l2fib-1.txt
The results will most likely be worse compared to TCP:
FINISHED. Execution time: 11.64 s
Requests per second: 859.106529209622
The delete can be performed the same way as with TCP, since it doesn't matter whether the delete is executed over TCP or SSH.
Results
Beryllium
Environment:
OS: Ubuntu Linux 4.2.0-30-generic x86_64
CPU: Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz, 40 cores
RAM: 128GB
Network: Single VM for both ODL and simulated NETCONF device
JVM: Oracle 1.8.0_73
Base configuration:
Single client thread
Synchronous client thread
No SSH
No EXI
ODL heap: 8 GB (edit in bin/setenv before executing karaf)
Measured numbers with a single client:
Netconf northbound single client performance

| Client type | l2fib per request | TCP performance | SSH performance | Total l2fibs |
|---|---|---|---|---|
| Sync | 1 | 1 730 edits/s | 1 474 edits/s | 100k |
| Async | 1 | 7 063 edits/s | 6 600 edits/s | 100k |
| Sync | 100 | 233 edits/s | 148 edits/s | 500k |
| Async | 100 | 421 edits/s | 386 edits/s | 500k |
| Sync | 500 | 61 edits/s | 13 edits/s | 1M |
| Async | 500 | 81 edits/s | 69 edits/s | 1M |
| Sync | 1000 | 35 edits/s | 13 edits/s | 1M |
| Async | 1000 | 38 edits/s | 19 edits/s | 1M |
Multiple clients:

Netconf northbound multiple client performance

| Clients | Client type | l2fib per request | TCP performance | SSH performance | Total l2fibs |
|---|---|---|---|---|---|
| 8 | Sync | 1 | 23 010 edits/s | 13 847 edits/s | 400k |
| 8 | Async | 1 | 41 114 edits/s | 12 527 edits/s | 400k |
| 16 | Sync | 1 | 31 743 edits/s | 15 879 edits/s | 400k |
| 16 | Async | 1 | 43 252 edits/s | 12 496 edits/s | 400k |
| 8 | Sync | 100 | 852 edits/s | 769 edits/s | 1,6M |
| 8 | Async | 100 | 984 edits/s | 869 edits/s | 1,6M |
| 16 | Sync | 100 | 808 edits/s | 723 edits/s | 1,6M |
| 16 | Async | 100 | 852 edits/s | 749 edits/s | 1,6M |
| 8 | Sync | 500 | | | |
| 8 | Async | 500 | | | |
| 16 | Sync | 500 | | | |
| 16 | Async | 500 | | | |
| 8 | Sync | 1000 | | | |
| 8 | Async | 1000 | | | |

No values were recorded for the 500 and 1000 l2fib-per-request rows.