Define NormalizedBody and specializations

Description

Analysis of a netvirt heap dump shows that the data tree involved is around 640MiB, with 147MiB (~23%) retained by NormalizedNode implementations (6.5M objects). Each of these typically costs 24 bytes, including a 4-byte alignment shadow, and typically contains the identifier and the map of children.

If we eliminate the idea that a NormalizedNode has an identifier, we would save ~49MiB of heap (33% shallow, 13% overall) by making these typically cost 16 bytes.

There are a few places where the identifier is required. These would have to be changed to carry Map.Entry<PathArgument, NormalizedNode> instead, a change cascading through quite a few interfaces. The most problematic will probably be NormalizedNodeStreamWriter.
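A minimal Java sketch of the shape described above, not the actual yangtools API. NodeId stands in for PathArgument, and Body, ValueBody, ContainerBody and EntryWriter are hypothetical names invented for illustration. The point it shows is that the identifier lives only as a key in the parent's map, and a consumer receives it through the Map.Entry it is handed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public final class BodySketch {
    /** Hypothetical stand-in for PathArgument. */
    public record NodeId(String localName) {}

    /** Hypothetical identifier-less node body: it does not know its own name. */
    public sealed interface Body permits ValueBody, ContainerBody {}

    /** A leaf-like body carrying only its value. */
    public record ValueBody(Object value) implements Body {}

    /** A container-like body; child identifiers exist only as map keys. */
    public record ContainerBody(Map<NodeId, Body> children) implements Body {}

    /** Hypothetical consumer that needs the identifier and so takes an entry. */
    @FunctionalInterface
    public interface EntryWriter {
        void write(Map.Entry<NodeId, Body> entry);
    }

    /** Passes on the entries the map already exposes, without re-wrapping them. */
    public static void stream(ContainerBody parent, EntryWriter writer) {
        for (Map.Entry<NodeId, Body> entry : parent.children().entrySet()) {
            writer.write(entry);
        }
    }

    /** Example consumer: collects child names in the map's iteration order. */
    public static List<String> childNames(ContainerBody parent) {
        List<String> names = new ArrayList<>();
        stream(parent, entry -> names.add(entry.getKey().localName()));
        return names;
    }
}
```

Every interface that currently reads the identifier off the node would need a signature change along the lines of EntryWriter.write, which is the cascade mentioned above.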

Aside from memory savings, this would solve the interesting problem of the DataTree root node (i.e. the one corresponding to SchemaContext) needing an identifier, which is currently wedged to use SchemaContext.NAME, which can lead to problems with revisions in some cases.

Consider dropping Identifiable from NormalizedNode, adjusting all users to cope with that.

Activity

Robert Varga November 4, 2022 at 11:07 PM

There is an interesting conflict between LeafNode and AnydataNode: how do we know an AnydataNode is not represented as a byte[], which is the same as a leaf for 'type binary'?
It seems this issue is a bit more thorny: we certainly want to solve ContainerNode (due to identifier) and LeafNode/LeafSetEntryNode (for memory efficiency), but others perhaps can be left alone for later.

If we scope out those two, they boil down to:

ValueNode, which covers LeafNode and LeafSetEntryNode. When dealing with reconstitution, LeafSetEntryNode can be known to be allowed only in LeafSetNodes and LeafNode is a DataContainerChild which implies its parent is a DataContainerNode
ContainerNode is a DataContainerNode (and DataContainerChild which adds guidance from ValueNode)
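The reconstitution rule for ValueNode can be sketched roughly as follows, assuming the parent kind is known at the point where an identifier-less value body is turned back into a node. All names here (ParentKind, ValueBody, LeafView, LeafSetEntryView) are hypothetical, and a plain String stands in for the real identifier types.

```java
import java.util.Map;

public final class ReconstituteSketch {
    /** Hypothetical: the two kinds of parent that can hold a value body. */
    public enum ParentKind { LEAF_SET, DATA_CONTAINER }

    /** Hypothetical identifier-less body shared by both value node kinds. */
    public record ValueBody(Object value) {}

    /** Hypothetical reconstituted views that carry an identifier again. */
    public sealed interface View permits LeafView, LeafSetEntryView {}
    public record LeafView(String name, Object value) implements View {}
    public record LeafSetEntryView(String name, Object value) implements View {}

    /**
     * Picks the node kind from the parent alone: a value body under a leaf set
     * can only be a leaf-set entry, while under a data container it is a leaf.
     */
    public static View reconstitute(ParentKind parent, Map.Entry<String, ValueBody> entry) {
        Object value = entry.getValue().value();
        return switch (parent) {
            case LEAF_SET -> new LeafSetEntryView(entry.getKey(), value);
            case DATA_CONTAINER -> new LeafView(entry.getKey(), value);
        };
    }
}
```

The body itself carries no discriminator, so the sketch relies entirely on the caller supplying the correct parent kind.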

Looking at DataContainerNode, it is a DistinctNodeContainer (implying child addressability) and has three specializations:

ContainerNode
LeafSetNode (note: addressability is wild – either linear position for ordered, or identifier for unordered, this is still subject to evolution)
MapNode (addressability, but shares ordering w.r.t. UserMapNode/UserLeafSetNode)

DataContainerNode itself covers AugmentationNode (which is going away) and ChoiceNode, ContainerNode, MapEntryNode, MountPointNode, UnkeyedListEntryNode – not all of which reap the memory benefits.

Robert Varga November 13, 2020 at 11:38 PM

We need the equivalent of ContainerBody for every NormalizedNode so that nodes can be reconstructed easily. ForeignDataNodes are an exception: their body is a foreign representation, which we know nothing about (it can be a DOMSource).

Robert Varga November 13, 2020 at 10:24 PM

The entry part of this proposal needs to be provided separately, as it would end up propagating type. The idea is that NormalizedNode is semantically equivalent to Entry<PathArgument, ? extends Object>, but we do not express that as a generic parameter but rather through an

We then override this method to provide a more precise approximation, so that:

Robert Varga February 21, 2020 at 4:34 PM

The idea is to have:

which would be a compat view. Except the naming sucks: it is much better not to lug around the "State" suffix in most places. It also has performance implications for the streaming API, where we really want to propagate these from a Map.entrySet(); otherwise we could end up hurting the GC.

I do not believe we can deliver this in Aluminium.

Details

Assignee

Unassigned

Reporter

Robert Varga

Components

Fix versions

15.0.0

Priority

High

Parent

YANGTOOLS-1022 Rework NormalizedNode class hierarchy and design

Created January 22, 2020 at 9:25 AM

Updated February 6, 2025 at 2:13 PM
