Define NormalizedBody and specializations
Description
Activity
Robert Varga November 4, 2022 at 11:07 PM
There is an interesting conflict between LeafNode and AnydataNode: how do we know that an AnydataNode is not represented as a byte[], which is also how a leaf of 'type binary' is represented?
It seems this issue is a bit more thorny: we certainly want to solve ContainerNode (due to identifier) and LeafNode/LeafSetEntryNode (for memory efficiency), but others perhaps can be left alone for later.
If we scope out those two, they boil down to:
ValueNode, which covers LeafNode and LeafSetEntryNode. When dealing with reconstitution, a LeafSetEntryNode is known to be allowed only in LeafSetNodes, and a LeafNode is a DataContainerChild, which implies its parent is a DataContainerNode.
ContainerNode is a DataContainerNode (and DataContainerChild which adds guidance from ValueNode)
Looking at DataContainerNode, it is a DistinctNodeContainer (implying child addressability) and has three specializations:
ContainerNode
LeafSetNode (note: addressability is wild – either linear position for ordered, or identifier for unordered, this is still subject to evolution)
MapNode (addressability, but shares ordering w.r.t. UserMapNode/UserLeafSetNode)
DataContainerNode itself covers AugmentationNode (which is going away) and ChoiceNode, ContainerNode, MapEntryNode, MountPointNode, UnkeyedListEntryNode, not all of which reap the memory benefits.
Robert Varga November 13, 2020 at 11:38 PM
We need the equivalent of ContainerBody for every NormalizedNode so that nodes can be reconstructed easily. ForeignDataNodes are an exception: their body is a foreign representation, which we know nothing about (it can be a DOMSource).
Robert Varga November 13, 2020 at 10:24 PM (edited)
The entry part of this proposal needs to be provided separately, as it would end up propagating type. The idea is that NormalizedNode is semantically equivalent to Entry<PathArgument, ? extends Object>, but we do not express that as a generic parameter but rather through an
We then override this method to provide a more precise approximation, so that:
Robert Varga February 21, 2020 at 4:34 PM
The idea is to have:
which would be a compat view. Except the naming sucks: it is much better not to lug around the "State" suffix in most places. It also has performance implications for the streaming API, where we really want to propagate these from a Map.entrySet(), otherwise we could end up hurting the GC.
I do not believe we can deliver this in Aluminium.
Analysis of a netvirt heap dump shows that the data tree involved is around 640MiB, with 147MiB (~23%) retained by NormalizedNode implementations (6.5M objects). Each of these typically costs 24 bytes, with a 4-byte alignment shadow, and typically contains the identifier and a map of children.
If we eliminate the idea that a NormalizedNode has an identifier, we would save ~49MiB (33% shallow, 13% overall) heap by making these cost typically 16 bytes.
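The shallow-size arithmetic above can be checked with a minimal sketch. The class and method names are made up for illustration; the only inputs are the figures quoted from the heap dump (6.5M objects, 24 bytes today, 16 bytes without the identifier).

```java
public class HeapSavingsEstimate {
    // Figures quoted from the heap dump analysis above
    static final long NODE_COUNT = 6_500_000L;   // ~6.5M NormalizedNode objects
    static final long BYTES_WITH_ID = 24;        // typical shallow size today
    static final long BYTES_WITHOUT_ID = 16;     // typical shallow size without the identifier
    static final double MIB = 1024.0 * 1024.0;

    /** Heap saved across all nodes, in MiB. */
    static double savedMiB() {
        return NODE_COUNT * (BYTES_WITH_ID - BYTES_WITHOUT_ID) / MIB;
    }

    /** Saving as a fraction of the current shallow size of one node. */
    static double shallowSavingFraction() {
        return (double) (BYTES_WITH_ID - BYTES_WITHOUT_ID) / BYTES_WITH_ID;
    }

    public static void main(String[] args) {
        System.out.printf("saved: %.1f MiB%n", savedMiB());
        System.out.printf("shallow saving: %.0f%%%n", 100 * shallowSavingFraction());
    }
}
```

Dropping 8 bytes from each of 6.5M objects gives roughly the ~49MiB and 33% shallow figures quoted above.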
There are a few places where the identifier is required, which would have to be changed to carry Map.Entry<PathArgument, NormalizedNode> instead, a change cascading through quite a few interfaces. The most problematic will probably be NormalizedNodeStreamWriter.
Aside from memory savings, this would solve the interesting problem of the DataTree root node (i.e. corresponding to SchemaContext) needing an identifier, which is currently wedged to use SchemaContext.NAME and can lead to problems with revisions in some cases.
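To make the Map.Entry idea concrete, here is a minimal sketch, not the actual yangtools API. Body, ValueBody, ContainerBody and Writer are hypothetical stand-ins, and String stands in for PathArgument. Nodes hold no identifier; the identifier lives only as the key in the parent's child map, and consumers that need it receive the map's entry.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class IdentifierlessNodes {
    /** Hypothetical stand-in for a node without an identifier field. */
    interface Body { }

    /** Leaf-like body: carries only the value. */
    static final class ValueBody implements Body {
        final Object value;
        ValueBody(Object value) { this.value = value; }
    }

    /** Container-like body: children are addressed by the key of this map. */
    static final class ContainerBody implements Body {
        final Map<String, Body> children = new LinkedHashMap<>();
    }

    /** Hypothetical stand-in for a stream writer that needs the identifier. */
    interface Writer {
        void node(Map.Entry<String, ? extends Body> entry, int depth);
    }

    /** Walks a subtree, handing the writer (identifier, body) entries. */
    static void stream(Map.Entry<String, ? extends Body> entry, int depth, Writer writer) {
        writer.node(entry, depth);
        if (entry.getValue() instanceof ContainerBody) {
            ContainerBody container = (ContainerBody) entry.getValue();
            // Pass along the entries exposed by the map instead of building a wrapper per child
            for (Map.Entry<String, Body> child : container.children.entrySet()) {
                stream(child, depth + 1, writer);
            }
        }
    }

    static List<String> demo() {
        ContainerBody root = new ContainerBody();   // the root itself needs no identifier
        ContainerBody top = new ContainerBody();
        top.children.put("name", new ValueBody("foo"));
        root.children.put("top", top);

        List<String> seen = new ArrayList<>();
        for (Map.Entry<String, Body> e : root.children.entrySet()) {
            stream(e, 0, (entry, depth) -> seen.add(depth + ":" + entry.getKey()));
        }
        return seen;
    }
}
```

In this sketch the root container is walked without ever being given a name, which is the property that would remove the need for a placeholder root identifier.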
Consider dropping Identifiable from NormalizedNode, adjusting all users to cope with that.