[API Proposal]: Mergeable dictionary (mutable and persistent) #80339

zvrba · 2023-01-08T10:14:44Z

Background and motivation

For a long time I've been annoyed at the fact that a dictionary is an ISet<KeyValuePair<K, V>> yet we cannot use set operations over them, except... by manual foreach. At the same time, I often wanted persistent (immutable) dictionaries. So I've created a prototype implementation that supports both from the same code-base; description and benchmarks are available here: https://github.com/zvrba/Pfm

The code that implements the APIs that I propose to merge is here: https://github.com/zvrba/Pfm/tree/master/Pfm.Collections/JoinTree

Features:

Provides set operations (equality, union, difference, intersection) with customizable merging of equivalent values
Provides iterator for backward/forward navigation
Provides element lookup by index (n'th element in sorted order, 0 being the smallest)
Traits-based ("static polymorphism" pattern widely used in C++) which enables devirtualization and inlining [1]. Tree mechanics (AVL or weight-balanced) and mutability/persistence are configurable at compile-time. Traits also define how equivalent values are merged for intersection and difference. See example below.

The implementation is based on the paper: Guy Blelloch, Daniel Ferizovic, and Yihan Sun. 2022. Joinable Parallel Balanced Binary Trees. ACM Trans.
Parallel Comput. 9, 2, Article 7 (April 2022), 41 pages.
https://doi.org/10.1145/3512769

All in all, a set of building blocks for various kinds of sorted sets and dictionaries.
I was able to find the following related proposals (there are perhaps more) that are directly solvable by JoinTree:

[API Proposal]: Adding ToSortedSet and ToSortedDictionary extension methods to the Enumerable class #79780 : which concrete collection to create?
[API Proposal]: Add GetViewBetween method to SortedDictionary #77645 : easily added to JoinTree and wrapper classes. Also addressed by Iterator<>
Add unbounded SortedSet<T>.TreeSubSet API #55510 : easily added to JoinTree
[API Proposal]: Add overloaded methods for GetOrAdd, adding out parameters indicating if the key was added to the dictionary #79934 : already supported by JoinTree
[Proposal] Use B+-trees instead of AVL trees for Immutable Collections #14477 : JoinTree improves performance in the immutable case, without concerns of copying larger nodes of B+-trees

[1] See benchmarks at the linked page. SortedSet (RB-tree) and ImmutableSortedSet (AVL tree) have surprisingly slow lookup performance, which I suspect is due to using delegates or comparers for comparisons. Joinable tree is slightly slower than SortedSet for insertions and deletions, but the implementation can probably improved by replacing recursions with iteration. The immutable versions are just faster than ImmutableSortedSet.

API Proposal

The main user-facing class would be MergeableDictionary, which is a wrapper around AVL joinable tree.
Otherwise, everything in https://github.com/zvrba/Pfm/tree/master/Pfm.Collections/JoinTree should be in the API.

To benefit from performance improvements, a similar wrapper class should be provided for sets, but I'm lacking a good name.

using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics.CodeAnalysis;

namespace Pfm.Collections.JoinTree;

/// <summary>
/// Common implementation of mutable or immutable dictionaries.  Mutability (<see cref="IsPersistent"/>) is determined
/// by how <typeparamref name="TNodeTraits"/> implement
/// <see cref="INodeTraits{TValue}.Clone(Node{TValue})"/>.  In any case, the implementation gives an illusion of
/// a mutable data structure.  To preserve previous versions, use <see cref="Copy(bool)"/> method which default
/// implementation is extremely efficient for immutable dictionaries (single allocation).
/// </summary>
/// <typeparam name="K">Key type.</typeparam>
/// <typeparam name="V">Value type.</typeparam>
/// <typeparam name="TNodeTraits">
/// Node traits determine how key/value pairs are compared, merged, cloned and whether the tree is persistent.
/// </typeparam>
public class MergeableDictionary<K, V, TNodeTraits> :
    ICloneable,
    ISet<KeyValuePair<K, V>>,
    IReadOnlyList<KeyValuePair<K, V>>,
    IDictionary<K, V>
    where TNodeTraits : struct, INodeTraits<KeyValuePair<K, V>>
{
    private JoinTree<KeyValuePair<K, V>, TNodeTraits, AvlTree<KeyValuePair<K, V>, TNodeTraits>> tree;

    public MergeableDictionary() => tree = default;
    private MergeableDictionary(JoinTree<KeyValuePair<K, V>, TNodeTraits, AvlTree<KeyValuePair<K, V>, TNodeTraits>> other)
        => tree = other;

    /// <summary>
    /// If true, the collection is thread-safe without external locking.  This instance will nevertheless
    /// give an illusion of a mutable data structure by replacing the current version with the new.  To
    /// preserve previous versions, use <see cref="Copy(bool)"/>.
    /// </summary>
    public bool IsPersistent => TNodeTraits.IsPersistent;

    /// <summary>
    /// Number of elements in this dictionary.
    /// </summary>
    public int Count => tree.Count;

    /// <summary>
    /// Always mutable.
    /// </summary>
    public bool IsReadOnly => false;

    /// <summary>
    /// Creates a deep copy of elements stored in <c>this</c>.
    /// </summary>
    /// <param name="force">
    /// If true, and the dictionary is persistent, it will force creation of a deep copy instead of 
    /// reusing the same storage.  This may be desirable if <typeparamref name="TNodeTraits"/> also
    /// clones the stored value.
    /// </param>
    /// <returns>
    /// A new instance containing the same elements as <c>this</c>.
    /// </returns>
    public MergeableDictionary<K, V, TNodeTraits> Copy(bool force = false) =>
        IsPersistent && !force ? new(tree) : new(tree.Copy(force));

    /// <summary>
    /// Always makes a deep copy of the dictionary, regardless of persistence.
    /// </summary>
    /// <returns>
    /// A new, deeply-copied instance.  Whether the values are cloned is determined by the implementation
    /// of <typeparamref name="TNodeTraits"/>.
    /// </returns>
    public object Clone() => Copy(true);

    /// <summary>
    /// Gets an iterator for two-way traversal of the dictionary in key order.
    /// </summary>
    /// <returns>
    /// A new iterator instance.
    /// </returns>
    public Iterator<KeyValuePair<K, V>> GetIterator() => tree.GetIterator();

    /// <summary>
    /// Returns the n'th element in the sorted order.
    /// </summary>
    /// <param name="index">
    /// Element index to fetch; smallest is at index 0.
    /// </param>
    /// <returns>
    /// The found key-value pair.
    /// </returns>
    /// <exception cref="IndexOutOfRangeException">When index is out of range <c>[0, Count)</c>.</exception>
    public KeyValuePair<K, V> Nth(int index) => tree.Nth(index);

    // Additional set operations: the algorithms are more efficient than those taking IEnumerable.
    // NB! They are destructive to "other" argument if the tree is not persistent.

    public void UnionWith(MergeableDictionary<K, V, TNodeTraits> other) =>
        tree = JoinTree<KeyValuePair<K, V>, TNodeTraits, AvlTree<KeyValuePair<K, V>, TNodeTraits>>.SetUnion(tree, other.tree);

    public void IntersectWith(MergeableDictionary<K, V, TNodeTraits> other) =>
        tree = JoinTree<KeyValuePair<K, V>, TNodeTraits, AvlTree<KeyValuePair<K, V>, TNodeTraits>>.SetIntersection(tree, other.tree);

    public void ExceptWith(MergeableDictionary<K, V, TNodeTraits> other) =>
        tree = JoinTree<KeyValuePair<K, V>, TNodeTraits, AvlTree<KeyValuePair<K, V>, TNodeTraits>>.SetDifference(tree, other.tree);



    /// <summary>
    /// Explicit implementation, would clash with ordinary indexer for <c>int</c> keys.
    /// </summary>
    /// <seealso cref="Nth(int)"/>
    KeyValuePair<K, V> IReadOnlyList<KeyValuePair<K, V>>.this[int index] => tree.Nth(index);

    bool ISet<KeyValuePair<K, V>>.Add(KeyValuePair<K, V> item) => throw new NotImplementedException();
    void ISet<KeyValuePair<K, V>>.ExceptWith(IEnumerable<KeyValuePair<K, V>> other) => throw new NotImplementedException();
    void ISet<KeyValuePair<K, V>>.IntersectWith(IEnumerable<KeyValuePair<K, V>> other) => throw new NotImplementedException();
    bool ISet<KeyValuePair<K, V>>.IsProperSubsetOf(IEnumerable<KeyValuePair<K, V>> other) => throw new NotImplementedException();
    bool ISet<KeyValuePair<K, V>>.IsProperSupersetOf(IEnumerable<KeyValuePair<K, V>> other) => throw new NotImplementedException();
    bool ISet<KeyValuePair<K, V>>.IsSubsetOf(IEnumerable<KeyValuePair<K, V>> other) => throw new NotImplementedException();
    bool ISet<KeyValuePair<K, V>>.IsSupersetOf(IEnumerable<KeyValuePair<K, V>> other) => throw new NotImplementedException();
    bool ISet<KeyValuePair<K, V>>.Overlaps(IEnumerable<KeyValuePair<K, V>> other) => throw new NotImplementedException();
    bool ISet<KeyValuePair<K, V>>.SetEquals(IEnumerable<KeyValuePair<K, V>> other) => throw new NotImplementedException();
    void ISet<KeyValuePair<K, V>>.SymmetricExceptWith(IEnumerable<KeyValuePair<K, V>> other) => throw new NotImplementedException();
    void ISet<KeyValuePair<K, V>>.UnionWith(IEnumerable<KeyValuePair<K, V>> other) => throw new NotImplementedException();
    
    void ICollection<KeyValuePair<K, V>>.Add(KeyValuePair<K, V> item) => throw new NotImplementedException();
    void ICollection<KeyValuePair<K, V>>.Clear() => throw new NotImplementedException();
    bool ICollection<KeyValuePair<K, V>>.Contains(KeyValuePair<K, V> item) => throw new NotImplementedException();
    void ICollection<KeyValuePair<K, V>>.CopyTo(KeyValuePair<K, V>[] array, int arrayIndex) => throw new NotImplementedException();
    bool ICollection<KeyValuePair<K, V>>.Remove(KeyValuePair<K, V> item) => throw new NotImplementedException();
    
    IEnumerator<KeyValuePair<K, V>> IEnumerable<KeyValuePair<K, V>>.GetEnumerator() => throw new NotImplementedException();
    IEnumerator IEnumerable.GetEnumerator() => throw new NotImplementedException();

    public ICollection<K> Keys => throw new NotImplementedException();
    public ICollection<V> Values => throw new NotImplementedException();
    public V this[K key] { get => throw new NotImplementedException(); set => throw new NotImplementedException(); }

    public void Add(K key, V value) => throw new NotImplementedException();
    public bool ContainsKey(K key) => throw new NotImplementedException();
    public bool Remove(K key) => throw new NotImplementedException();
    public bool TryGetValue(K key, [MaybeNullWhen(false)] out V value) => throw new NotImplementedException();
}

API Usage

    struct MutableTraits : INodeTraits<KeyValuePair<int, string>>
    {
        public static int Compare(KeyValuePair<int, string> left, KeyValuePair<int, string> right)
            => left.Key - right.Key;
        public static KeyValuePair<int, string> Merge(KeyValuePair<int, string> left, KeyValuePair<int, string> right)
            => new(left.Key, left.Value);
        public static bool IsPersistent => false;
        public static Node<KeyValuePair<int, string>> Clone(Node<KeyValuePair<int, string>> x) => x;
    }

    struct ImmutableTraits : INodeTraits<KeyValuePair<int, string>>
    {
        public static int Compare(KeyValuePair<int, string> left, KeyValuePair<int, string> right)
            => left.Key - right.Key;
        public static KeyValuePair<int, string> Merge(KeyValuePair<int, string> left, KeyValuePair<int, string> right)
            => new(left.Key, left.Value);
        public static bool IsPersistent => true;

        // The implementation can also choose to clone the value contained in the node.
        public static Node<KeyValuePair<int, string>> Clone(Node<KeyValuePair<int, string>> x) => new(x);
    }

    MergeableDictionary<int, string, MutableTraits> d1;
    MergeableDictionary<int, string, ImmutableTraits> d2;

Alternative Designs

I suspected that the abstract statics in interfaces can be used to enable devirtualization and inlining to enable more efficient code generation, as shown by "Find" benchmarks. "Less-verbose" designs, i.e., using comparers or delegates as in traditional System.Collections.{Generic,Immutable} would probably degrade performance.

Iterator<TValue> should also support direct Find(TValue value) and Seek(int index), but would need to also be parametrized by additional TNodeTraits : struct, INodeTraits<> parameter, which would make declarations more verbose. The features are easy to add.

Main design concern is whether to enable arbitrary augmentations of tree nodes with monoidal tags. (E.g., it would allow users to use JoinTree as a toolkit to build, say, an interval tree; see Cormen et.al. cited below.) Size and rank are two examples of such tags and are currently maintained by the tree code. Alternative design for node would be Node<TValue, TTag> where TTag : struct, ITagTraits<TTag>. Such design would place more burden on the end-user implementing since TTag would have to include and maintain both metadata needed by the tree and the user's custom augmentation. [2] Further applications of such design can be found in section 5 of https://arxiv.org/abs/1612.05665

Compare and Merge methods could be isolated in own IValueTraits<TValue> interface (facilitating reuse), but this would require additional generic parameter at points of use of the tree / dictionary.

(I believe "existential types (language proposal)" would alleviate concerns about "cumbersome use" if the two above paragraphs were implemented.)

Additional node-based methods such as Insert(ref Node<TValue>) and Delete(Node<TValue>) could be considered for allocation-free moving of nodes between different containers, esp. when combined with use of iterators.

Node<> could be extended with a "transient tag", thus increasing its size, but allowing the tree to eschew cloning of the nodes when the node's "transient tag" matches the tree's "transient tag". (This is how persistent collections in Clojure work.)

A general issue that I would like to discuss: binary search trees are a well-studied abstraction and perhaps the standard library should just provide a customizable "tree toolkit" that users can build their own collections upon, instead of "hiding" trees behind pre-baked sorted collections. The proposals cited above witness that the need is there and a standard "collection" will never be able to support all use-cases. On the other hand, when an unsupported use-case does occur, it is sad that users has to roll their own trees from scratch (time-consuming and error-prone), or use a suboptimal solution.

[2] Efficient augmentation of balanced tree is possible when conditions from the following theorem (Cormen et al. "Introduction to algorithms", 2nd ed.) hold: Let F be a field that augments a [red-black] tree T of n nodes, and suppose that the contents of F for a node x can be computed using only the information in nodes x, x.L, and x.R, including x.L.F and x.R.F. Then, we can maintain the values of F in all nodes of T during insertion and deletion without asymptotically affecting the O(lg n) performance of these operations.

Risks

Objectively, no as no existing classes or interfaces need modification. Otherwise, having multiple sorted set/dictionary implementations could be confusing. The proposed mergeable dictionary class could also function as a fully-fledged sorted concurrent dictionary.

The text was updated successfully, but these errors were encountered:

ghost · 2023-01-08T10:14:52Z

Tagging subscribers to this area: @dotnet/area-system-collections
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and motivation

For a long time I've been annoyed at the fact that a dictionary is an ISet<KeyValuePair<K, V>> yet we cannot use set operations over them, except... by manual foreach. At the same time, I often wanted persistent (immutable) dictionaries. So I've created a prototype implementation that supports both from the same code-base; description and benchmarks are available here: https://github.com/zvrba/Pfm

The code that implements the APIs that I propose to merge is here: https://github.com/zvrba/Pfm/tree/master/Pfm.Collections/JoinTree

Features:

Provides set operations (equality, union, difference, intersection)
Provides iterator for backward/forward navigation
Provides element lookup by index (n'th element in sorted order, 0 being the smallest)
Traits-based ("static polymorphism" pattern widely used in C++) which enables devirtualization and inlining [1]. Tree mechanics (AVL or weight-balanced) and mutability/persistence are configurable at compile-time. Traits also define how equivalent values are merged for intersection and difference. See example below.

The implementation is based on the paper: Guy Blelloch, Daniel Ferizovic, and Yihan Sun. 2022. Joinable Parallel Balanced Binary Trees. ACM Trans.
Parallel Comput. 9, 2, Article 7 (April 2022), 41 pages.
https://doi.org/10.1145/3512769

All in all, a set of building blocks for various kinds of sorted sets and dictionaries.
I was able to find the following related proposals (there are perhaps more) that are directly solvable by JoinTree:

[API Proposal]: Adding ToSortedSet and ToSortedDictionary extension methods to the Enumerable class #79780 : which concrete collection to create?
[API Proposal]: Add GetViewBetween method to SortedDictionary #77645 : easily added to JoinTree and wrapper classes. Also addressed by Iterator<>
Add unbounded SortedSet<T>.TreeSubSet API #55510 : easily added to JoinTree
[API Proposal]: Add overloaded methods for GetOrAdd, adding out parameters indicating if the key was added to the dictionary #79934 : already supported by JoinTree
[Proposal] Use B+-trees instead of AVL trees for Immutable Collections #14477 : JoinTree improves performance in the immutable case, without concerns of copying larger nodes of B+-trees

[1] See benchmarks at the linked page. SortedSet (RB-tree) and ImmutableSortedSet (AVL tree) have surprisingly slow lookup performance, which I suspect is due to using delegates or comparers for comparisons. Joinable tree is slightly slower than SortedSet for insertions and deletions, but the implementation can probably improved by replacing recursions with iteration. The immutable versions are just faster than ImmutableSortedSet.

API Proposal

The main user-facing class would be MergeableDictionary, which is a wrapper around AVL joinable tree.
Otherwise, everything in https://github.com/zvrba/Pfm/tree/master/Pfm.Collections/JoinTree should be in the API.

To benefit from performance improvements, a similar wrapper class should be provided for sets, but I'm lacking a good name.

using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics.CodeAnalysis;

namespace Pfm.Collections.JoinTree;

/// <summary>
/// Common implementation of mutable or immutable dictionaries.  Mutability (<see cref="IsPersistent"/>) is determined
/// by how <typeparamref name="TNodeTraits"/> implement
/// <see cref="INodeTraits{TValue}.Clone(Node{TValue})"/>.  In any case, the implementation gives an illusion of
/// a mutable data structure.  To preserve previous versions, use <see cref="Copy(bool)"/> method which default
/// implementation is extremely efficient for immutable dictionaries (single allocation).
/// </summary>
/// <typeparam name="K">Key type.</typeparam>
/// <typeparam name="V">Value type.</typeparam>
/// <typeparam name="TNodeTraits">
/// Node traits determine how key/value pairs are compared, merged, cloned and whether the tree is persistent.
/// </typeparam>
public class MergeableDictionary<K, V, TNodeTraits> :
    ICloneable,
    ISet<KeyValuePair<K, V>>,
    IReadOnlyList<KeyValuePair<K, V>>,
    IDictionary<K, V>
    where TNodeTraits : struct, INodeTraits<KeyValuePair<K, V>>
{
    private JoinTree<KeyValuePair<K, V>, TNodeTraits, AvlTree<KeyValuePair<K, V>, TNodeTraits>> tree;

    public MergeableDictionary() => tree = default;
    private MergeableDictionary(JoinTree<KeyValuePair<K, V>, TNodeTraits, AvlTree<KeyValuePair<K, V>, TNodeTraits>> other)
        => tree = other;

    /// <summary>
    /// If true, the collection is thread-safe without external locking.  This instance will nevertheless
    /// give an illusion of a mutable data structure by replacing the current version with the new.  To
    /// preserve previous versions, use <see cref="Copy(bool)"/>.
    /// </summary>
    public bool IsPersistent => TNodeTraits.IsPersistent;

    /// <summary>
    /// Number of elements in this dictionary.
    /// </summary>
    public int Count => tree.Count;

    /// <summary>
    /// Always mutable.
    /// </summary>
    public bool IsReadOnly => false;

    /// <summary>
    /// Creates a deep copy of elements stored in <c>this</c>.
    /// </summary>
    /// <param name="force">
    /// If true, and the dictionary is persistent, it will force creation of a deep copy instead of 
    /// reusing the same storage.  This may be desirable if <typeparamref name="TNodeTraits"/> also
    /// clones the stored value.
    /// </param>
    /// <returns>
    /// A new instance containing the same elements as <c>this</c>.
    /// </returns>
    public MergeableDictionary<K, V, TNodeTraits> Copy(bool force = false) =>
        IsPersistent && !force ? new(tree) : new(tree.Copy(force));

    /// <summary>
    /// Always makes a deep copy of the dictionary, regardless of persistence.
    /// </summary>
    /// <returns>
    /// A new, deeply-copied instance.  Whether the values are cloned is determined by the implementation
    /// of <typeparamref name="TNodeTraits"/>.
    /// </returns>
    public object Clone() => Copy(true);

    /// <summary>
    /// Gets an iterator for two-way traversal of the dictionary in key order.
    /// </summary>
    /// <returns>
    /// A new iterator instance.
    /// </returns>
    public Iterator<KeyValuePair<K, V>> GetIterator() => tree.GetIterator();

    /// <summary>
    /// Returns the n'th element in the sorted order.
    /// </summary>
    /// <param name="index">
    /// Element index to fetch; smallest is at index 0.
    /// </param>
    /// <returns>
    /// The found key-value pair.
    /// </returns>
    /// <exception cref="IndexOutOfRangeException">When index is out of range <c>[0, Count)</c>.</exception>
    public KeyValuePair<K, V> Nth(int index) => tree.Nth(index);

    // Additional set operations: the algorithms are more efficient than those taking IEnumerable.
    // NB! They are destructive to "other" argument if the tree is not persistent.

    public void UnionWith(MergeableDictionary<K, V, TNodeTraits> other) =>
        tree = JoinTree<KeyValuePair<K, V>, TNodeTraits, AvlTree<KeyValuePair<K, V>, TNodeTraits>>.SetUnion(tree, other.tree);

    public void IntersectWith(MergeableDictionary<K, V, TNodeTraits> other) =>
        tree = JoinTree<KeyValuePair<K, V>, TNodeTraits, AvlTree<KeyValuePair<K, V>, TNodeTraits>>.SetIntersection(tree, other.tree);

    public void ExceptWith(MergeableDictionary<K, V, TNodeTraits> other) =>
        tree = JoinTree<KeyValuePair<K, V>, TNodeTraits, AvlTree<KeyValuePair<K, V>, TNodeTraits>>.SetDifference(tree, other.tree);



    /// <summary>
    /// Explicit implementation, would clash with ordinary indexer for <c>int</c> keys.
    /// </summary>
    /// <seealso cref="Nth(int)"/>
    KeyValuePair<K, V> IReadOnlyList<KeyValuePair<K, V>>.this[int index] => tree.Nth(index);

    bool ISet<KeyValuePair<K, V>>.Add(KeyValuePair<K, V> item) => throw new NotImplementedException();
    void ISet<KeyValuePair<K, V>>.ExceptWith(IEnumerable<KeyValuePair<K, V>> other) => throw new NotImplementedException();
    void ISet<KeyValuePair<K, V>>.IntersectWith(IEnumerable<KeyValuePair<K, V>> other) => throw new NotImplementedException();
    bool ISet<KeyValuePair<K, V>>.IsProperSubsetOf(IEnumerable<KeyValuePair<K, V>> other) => throw new NotImplementedException();
    bool ISet<KeyValuePair<K, V>>.IsProperSupersetOf(IEnumerable<KeyValuePair<K, V>> other) => throw new NotImplementedException();
    bool ISet<KeyValuePair<K, V>>.IsSubsetOf(IEnumerable<KeyValuePair<K, V>> other) => throw new NotImplementedException();
    bool ISet<KeyValuePair<K, V>>.IsSupersetOf(IEnumerable<KeyValuePair<K, V>> other) => throw new NotImplementedException();
    bool ISet<KeyValuePair<K, V>>.Overlaps(IEnumerable<KeyValuePair<K, V>> other) => throw new NotImplementedException();
    bool ISet<KeyValuePair<K, V>>.SetEquals(IEnumerable<KeyValuePair<K, V>> other) => throw new NotImplementedException();
    void ISet<KeyValuePair<K, V>>.SymmetricExceptWith(IEnumerable<KeyValuePair<K, V>> other) => throw new NotImplementedException();
    void ISet<KeyValuePair<K, V>>.UnionWith(IEnumerable<KeyValuePair<K, V>> other) => throw new NotImplementedException();
    
    void ICollection<KeyValuePair<K, V>>.Add(KeyValuePair<K, V> item) => throw new NotImplementedException();
    void ICollection<KeyValuePair<K, V>>.Clear() => throw new NotImplementedException();
    bool ICollection<KeyValuePair<K, V>>.Contains(KeyValuePair<K, V> item) => throw new NotImplementedException();
    void ICollection<KeyValuePair<K, V>>.CopyTo(KeyValuePair<K, V>[] array, int arrayIndex) => throw new NotImplementedException();
    bool ICollection<KeyValuePair<K, V>>.Remove(KeyValuePair<K, V> item) => throw new NotImplementedException();
    
    IEnumerator<KeyValuePair<K, V>> IEnumerable<KeyValuePair<K, V>>.GetEnumerator() => throw new NotImplementedException();
    IEnumerator IEnumerable.GetEnumerator() => throw new NotImplementedException();

    public ICollection<K> Keys => throw new NotImplementedException();
    public ICollection<V> Values => throw new NotImplementedException();
    public V this[K key] { get => throw new NotImplementedException(); set => throw new NotImplementedException(); }

    public void Add(K key, V value) => throw new NotImplementedException();
    public bool ContainsKey(K key) => throw new NotImplementedException();
    public bool Remove(K key) => throw new NotImplementedException();
    public bool TryGetValue(K key, [MaybeNullWhen(false)] out V value) => throw new NotImplementedException();
}

API Usage

    struct MutableTraits : INodeTraits<KeyValuePair<int, string>>
    {
        public static int Compare(KeyValuePair<int, string> left, KeyValuePair<int, string> right)
            => left.Key - right.Key;
        public static KeyValuePair<int, string> Merge(KeyValuePair<int, string> left, KeyValuePair<int, string> right)
            => new(left.Key, left.Value);
        public static bool IsPersistent => false;
        public static Node<KeyValuePair<int, string>> Clone(Node<KeyValuePair<int, string>> x) => x;
    }

    struct ImmutableTraits : INodeTraits<KeyValuePair<int, string>>
    {
        public static int Compare(KeyValuePair<int, string> left, KeyValuePair<int, string> right)
            => left.Key - right.Key;
        public static KeyValuePair<int, string> Merge(KeyValuePair<int, string> left, KeyValuePair<int, string> right)
            => new(left.Key, left.Value);
        public static bool IsPersistent => true;
        public static Node<KeyValuePair<int, string>> Clone(Node<KeyValuePair<int, string>> x) => new(x);
    }

    MergeableDictionary<int, string, MutableTraits> d1;
    MergeableDictionary<int, string, ImmutableTraits> d2;

Alternative Designs

I suspected that the abstract statics in interfaces can be used to enable devirtualization and inlining to enable more efficient code generation, as shown by "Find" benchmarks. "Less-verbose" designs, i.e., using comparers or delegates as in traditional System.Collections.{Generic,Immutable} would probably degrade performance.

Iterator<TValue> should also support direct Find(TValue value) and Seek(int index), but would need to also be parametrized by additional TNodeTraits : struct, INodeTraits<> parameter, which would make declarations more verbose. The features are easy to add.

Main design concern is whether to enable arbitrary augmentations of tree nodes with monoidal tags. (E.g., it would allow users to use JoinTree as a toolkit to build, say, an interval tree.) Size and rank are two examples of such tags and are currently maintained by the tree code. Alternative design for node would be Node<TValue, TTag> where TTag : struct, ITagTraits<TTag>. Such design would place more burden on the end-user implementing since TTag would have to include and maintain both metadata needed by the tree and the user's custom augmentation.

Additional node-based methods such as Insert(ref Node<TValue>) and Delete(Node<TValue>) could be considered for allocation-free moving of nodes between different containers, esp. when combined with use of iterators.

Risks

Objectively, no as no existing classes or interfaces need modification. Otherwise, having multiple sorted set/dictionary implementations could be confusing. The proposed mergeable dictionary class could also function as a fully-fledged sorted concurrent dictionary.

Author:	zvrba
Assignees:	-
Labels:	`api-suggestion`, `area-System.Collections`
Milestone:	-

zvrba added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Jan 8, 2023

dotnet-issue-labeler bot added the area-System.Collections label Jan 8, 2023

ghost added the untriaged New issue has not been triaged by the area owner label Jan 8, 2023

layomia removed the untriaged New issue has not been triaged by the area owner label Jan 9, 2023

layomia added this to the Future milestone Jan 9, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[API Proposal]: Mergeable dictionary (mutable and persistent) #80339

[API Proposal]: Mergeable dictionary (mutable and persistent) #80339

zvrba commented Jan 8, 2023 •

edited

Loading

ghost commented Jan 8, 2023

Background and motivation

API Proposal

API Usage

Alternative Designs

Risks

[API Proposal]: Mergeable dictionary (mutable and persistent) #80339

[API Proposal]: Mergeable dictionary (mutable and persistent) #80339

Comments

zvrba commented Jan 8, 2023 • edited Loading

Background and motivation

API Proposal

API Usage

Alternative Designs

Risks

ghost commented Jan 8, 2023

Background and motivation

API Proposal

API Usage

Alternative Designs

Risks

zvrba commented Jan 8, 2023 •

edited

Loading