Unable to remove 800 mountpoints from the datastore

Description

This issue is present only in the Calcium branch. The current Master branch does not seem to have this problem. When around 800 or more devices are connected to ODL, there is an issue with removing them all at once. They continue to appear as connected devices, but it is not possible to remove them or retrieve data from them.

Steps to reproduce:
1) Start ODL:
feature:install odl-netconf-topology odl-restconf-all
2) Start testtool:
java -jar netconf-testtool-7.0.11-SNAPSHOT-executable.jar --device-count 1 --ssh true --md-sal true
3) Connect 800 mountpoints.
4) Verify devices are connected.
5) Remove all devices.
6) Send a GET request to network-topology and observe that some devices have not been removed:
http://127.0.0.1:8181/rests/data/network-topology:network-topology/topology=topology-netconf

To reproduce steps 3-5, you can use the testBug.sh script provided in the attachments.
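
For readers without access to the attachment, step 5 can be approximated by sending one RESTCONF DELETE per node. The sketch below is not the attached testBug.sh. The class name and the "device-<n>" node naming are hypothetical, and authentication headers are omitted, although a real ODL instance will likely require them. Only the topology URL comes from the report.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.ArrayList;
import java.util.List;

final class RemoveMountpointsSketch {
    // Topology URL taken from step 6 of the report.
    static final String TOPOLOGY_URL =
        "http://127.0.0.1:8181/rests/data/network-topology:network-topology/topology=topology-netconf";

    // Builds one DELETE request per node; "device-<n>" is a hypothetical naming scheme.
    static List<HttpRequest> buildDeleteRequests(int deviceCount) {
        List<HttpRequest> requests = new ArrayList<>(deviceCount);
        for (int i = 0; i < deviceCount; i++) {
            URI nodeUri = URI.create(TOPOLOGY_URL + "/node=device-" + i);
            requests.add(HttpRequest.newBuilder(nodeUri).DELETE().build());
        }
        return requests;
    }
}
```

Each request could then be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.discarding()). Sending them in quick succession is what approximates removing all devices at once.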

See the attached ODL and testtool logs for the errors.


Environment

None

Activity

Peter Šuňa December 17, 2024 at 11:19 AM

A workaround that suppresses this issue in NETCONF was merged:
https://lf-opendaylight.atlassian.net/browse/NETCONF-1432

As suggested, I tried changing this to STATE.setVolatile(this, STATE_SEALED). However, this did not resolve YT-1651. The suggested fix is invoked within transaction.commit(), but the problematic part starts earlier, with transaction.delete(LogicalDatastoreType.OPERATIONAL, nodePath()).

Robert Varga December 9, 2024 at 7:46 PM

Yup, and that is weird, as nothing should be happening to ModifiedNode once we seal it, which is when we run resolveModificationType() and all that jazz.

The only thing that comes to mind is that we are dealing with memory ordering effects, i.e. a rehash of https://lf-opendaylight.atlassian.net/browse/YANGTOOLS-1537, in which case we need to promote https://github.com/opendaylight/yangtools/blob/08f0b5e8eb71c9e37f7912736c624d19f288d552/data/yang-data-tree-ri/src/main/java/org/opendaylight/yangtools/yang/data/tree/impl/InMemoryDataTreeModification.java#L333 to a setVolatile()…
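
To make the suggestion concrete, here is a minimal sketch of a state field published through a VarHandle. It is not the InMemoryDataTreeModification code: the class SealableState, its fields, and the choice of setRelease as the weaker alternative are assumptions made for illustration. Only the java.lang.invoke.VarHandle methods are real JDK API.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical stand-in for a modification object that is sealed once.
final class SealableState {
    static final int STATE_OPEN = 0;
    static final int STATE_SEALED = 1;

    private static final VarHandle STATE;

    static {
        try {
            STATE = MethodHandles.lookup()
                .findVarHandle(SealableState.class, "state", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Accessed only through the STATE VarHandle.
    private int state = STATE_OPEN;
    private String payload;

    void write(String value) {
        if ((int) STATE.getAcquire(this) != STATE_OPEN) {
            throw new IllegalStateException("already sealed");
        }
        payload = value;
    }

    // Release store: earlier writes are visible to a thread that acquires the state.
    void sealWithRelease() {
        STATE.setRelease(this, STATE_SEALED);
    }

    // Volatile store: same guarantee, plus sequentially consistent ordering.
    void sealWithVolatile() {
        STATE.setVolatile(this, STATE_SEALED);
    }

    boolean isSealed() {
        return (int) STATE.getAcquire(this) == STATE_SEALED;
    }

    String payload() {
        return payload;
    }
}
```

Either store only helps if every reader goes through a matching acquire or volatile load of the same field. A reader that touches the payload without first checking the state gets no ordering guarantee from either variant.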

Sangwook Ha December 8, 2024 at 1:41 AM

requireDataAfter() is called only when node.modificationType() is WRITE like this:

case WRITE -> { out.writeByte(WRITE); out.writeNormalizedNode(requireDataAfter(node)); }

But the IOException says that it’s UNMODIFIED:

"java.io.IOException: Candidate for (urn:opendaylight:netconf-node-topology?revision=2023-11-21)unavailable-capabilities (UNMODIFIED) does not have after-image"

How is this possible? Does that mean modificationType was changed between writeNode() and requireDataAfter() by another thread?
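
The scenario asked about here can be reproduced deterministically in a simplified model. This is a sketch under stated assumptions, not the yangtools serialization code: the Node class, the string payload, and the betweenReads hook (which stands in for a concurrent thread) are all hypothetical. It shows only how a second read of the type can report UNMODIFIED after the first read matched WRITE.

```java
import java.io.IOException;

final class StaleTypeSketch {
    enum ModificationType { UNMODIFIED, WRITE }

    // Hypothetical, simplified candidate node with mutable state.
    static final class Node {
        volatile ModificationType type = ModificationType.WRITE;
        volatile String dataAfter = "data";

        ModificationType modificationType() {
            return type;
        }
    }

    // Re-reads the node, so it sees whatever state the node has now.
    static String requireDataAfter(Node node) throws IOException {
        String after = node.dataAfter;
        if (after == null) {
            throw new IOException("Candidate (" + node.modificationType()
                + ") does not have after-image");
        }
        return after;
    }

    // betweenReads stands in for another thread running between the two reads.
    static String writeNode(Node node, Runnable betweenReads) throws IOException {
        switch (node.modificationType()) { // first read of the type
            case WRITE:
                betweenReads.run();
                return requireDataAfter(node); // second read of the node
            default:
                return "skipped";
        }
    }
}
```

If the node is reset to UNMODIFIED inside the hook, the WRITE branch is still taken but the exception message reports UNMODIFIED, which matches the shape of the logged error. Whether this is what actually happens in the reported case is not established by the sketch.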

Peter Šuňa November 28, 2024 at 6:04 PM

This issue appears to persist in the current Calcium branch. When using the Netopeer device, I encountered several devices that could not be removed. The following error is present in the log:
"java.io.IOException: Candidate for (urn:opendaylight:netconf-node-topology?revision=2023-11-21)unavailable-capabilities (UNMODIFIED) does not have after-image"


Other branches are not affected after encapsulating netconf-node:
https://github.com/opendaylight/netconf/commit/1f8be4f6ee9c174e2bb796a3d4fd6242dc12d04b

Done

Created October 16, 2024 at 1:39 PM
Updated December 17, 2024 at 11:19 AM
Resolved October 17, 2024 at 9:34 PM