-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[API Proposal]: Mergeable dictionary (mutable and persistent) #80339
Comments
Tagging subscribers to this area: @dotnet/area-system-collections Issue DetailsBackground and motivationFor a long time I've been annoyed at the fact that a dictionary is an The code that implements the APIs that I propose to merge is here: https://github.com/zvrba/Pfm/tree/master/Pfm.Collections/JoinTree Features:
The implementation is based on the paper: Guy Blelloch, Daniel Ferizovic, and Yihan Sun. 2022. Joinable Parallel Balanced Binary Trees. ACM Trans. All in all, a set of building blocks for various kinds of sorted sets and dictionaries.
[1] See benchmarks at the linked page. API ProposalThe main user-facing class would be To benefit from performance improvements, a similar wrapper class should be provided for sets, but I'm lacking a good name. using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics.CodeAnalysis;
namespace Pfm.Collections.JoinTree;
/// <summary>
/// Common implementation of mutable or immutable dictionaries. Mutability (<see cref="IsPersistent"/>) is determined
/// by how <typeparamref name="TNodeTraits"/> implement
/// <see cref="INodeTraits{TValue}.Clone(Node{TValue})"/>. In any case, the implementation gives an illusion of
/// a mutable data structure. To preserve previous versions, use <see cref="Copy(bool)"/> method which default
/// implementation is extremely efficient for immutable dictionaries (single allocation).
/// </summary>
/// <typeparam name="K">Key type.</typeparam>
/// <typeparam name="V">Value type.</typeparam>
/// <typeparam name="TNodeTraits">
/// Node traits determine how key/value pairs are compared, merged, cloned and whether the tree is persistent.
/// </typeparam>
public class MergeableDictionary<K, V, TNodeTraits> :
ICloneable,
ISet<KeyValuePair<K, V>>,
IReadOnlyList<KeyValuePair<K, V>>,
IDictionary<K, V>
where TNodeTraits : struct, INodeTraits<KeyValuePair<K, V>>
{
private JoinTree<KeyValuePair<K, V>, TNodeTraits, AvlTree<KeyValuePair<K, V>, TNodeTraits>> tree;
public MergeableDictionary() => tree = default;
private MergeableDictionary(JoinTree<KeyValuePair<K, V>, TNodeTraits, AvlTree<KeyValuePair<K, V>, TNodeTraits>> other)
=> tree = other;
/// <summary>
/// If true, the collection is thread-safe without external locking. This instance will nevertheless
/// give an illusion of a mutable data structure by replacing the current version with the new. To
/// preserve previous versions, use <see cref="Copy(bool)"/>.
/// </summary>
public bool IsPersistent => TNodeTraits.IsPersistent;
/// <summary>
/// Number of elements in this dictionary.
/// </summary>
public int Count => tree.Count;
/// <summary>
/// Always mutable.
/// </summary>
public bool IsReadOnly => false;
/// <summary>
/// Creates a deep copy of elements stored in <c>this</c>.
/// </summary>
/// <param name="force">
/// If true, and the dictionary is persistent, it will force creation of a deep copy instead of
/// reusing the same storage. This may be desirable if <typeparamref name="TNodeTraits"/> also
/// clones the stored value.
/// </param>
/// <returns>
/// A new instance containing the same elements as <c>this</c>.
/// </returns>
public MergeableDictionary<K, V, TNodeTraits> Copy(bool force = false) =>
IsPersistent && !force ? new(tree) : new(tree.Copy(force));
/// <summary>
/// Always makes a deep copy of the dictionary, regardless of persistence.
/// </summary>
/// <returns>
/// A new, deeply-copied instance. Whether the values are cloned is determined by the implementation
/// of <typeparamref name="TNodeTraits"/>.
/// </returns>
public object Clone() => Copy(true);
/// <summary>
/// Gets an iterator for two-way traversal of the dictionary in key order.
/// </summary>
/// <returns>
/// A new iterator instance.
/// </returns>
public Iterator<KeyValuePair<K, V>> GetIterator() => tree.GetIterator();
/// <summary>
/// Returns the n'th element in the sorted order.
/// </summary>
/// <param name="index">
/// Element index to fetch; smallest is at index 0.
/// </param>
/// <returns>
/// The found key-value pair.
/// </returns>
/// <exception cref="IndexOutOfRangeException">When index is out of range <c>[0, Count)</c>.</exception>
public KeyValuePair<K, V> Nth(int index) => tree.Nth(index);
// Additional set operations: the algorithms are more efficient than those taking IEnumerable.
// NB! They are destructive to "other" argument if the tree is not persistent.
public void UnionWith(MergeableDictionary<K, V, TNodeTraits> other) =>
tree = JoinTree<KeyValuePair<K, V>, TNodeTraits, AvlTree<KeyValuePair<K, V>, TNodeTraits>>.SetUnion(tree, other.tree);
public void IntersectWith(MergeableDictionary<K, V, TNodeTraits> other) =>
tree = JoinTree<KeyValuePair<K, V>, TNodeTraits, AvlTree<KeyValuePair<K, V>, TNodeTraits>>.SetIntersection(tree, other.tree);
public void ExceptWith(MergeableDictionary<K, V, TNodeTraits> other) =>
tree = JoinTree<KeyValuePair<K, V>, TNodeTraits, AvlTree<KeyValuePair<K, V>, TNodeTraits>>.SetDifference(tree, other.tree);
/// <summary>
/// Explicit implementation, would clash with ordinary indexer for <c>int</c> keys.
/// </summary>
/// <seealso cref="Nth(int)"/>
KeyValuePair<K, V> IReadOnlyList<KeyValuePair<K, V>>.this[int index] => tree.Nth(index);
bool ISet<KeyValuePair<K, V>>.Add(KeyValuePair<K, V> item) => throw new NotImplementedException();
void ISet<KeyValuePair<K, V>>.ExceptWith(IEnumerable<KeyValuePair<K, V>> other) => throw new NotImplementedException();
void ISet<KeyValuePair<K, V>>.IntersectWith(IEnumerable<KeyValuePair<K, V>> other) => throw new NotImplementedException();
bool ISet<KeyValuePair<K, V>>.IsProperSubsetOf(IEnumerable<KeyValuePair<K, V>> other) => throw new NotImplementedException();
bool ISet<KeyValuePair<K, V>>.IsProperSupersetOf(IEnumerable<KeyValuePair<K, V>> other) => throw new NotImplementedException();
bool ISet<KeyValuePair<K, V>>.IsSubsetOf(IEnumerable<KeyValuePair<K, V>> other) => throw new NotImplementedException();
bool ISet<KeyValuePair<K, V>>.IsSupersetOf(IEnumerable<KeyValuePair<K, V>> other) => throw new NotImplementedException();
bool ISet<KeyValuePair<K, V>>.Overlaps(IEnumerable<KeyValuePair<K, V>> other) => throw new NotImplementedException();
bool ISet<KeyValuePair<K, V>>.SetEquals(IEnumerable<KeyValuePair<K, V>> other) => throw new NotImplementedException();
void ISet<KeyValuePair<K, V>>.SymmetricExceptWith(IEnumerable<KeyValuePair<K, V>> other) => throw new NotImplementedException();
void ISet<KeyValuePair<K, V>>.UnionWith(IEnumerable<KeyValuePair<K, V>> other) => throw new NotImplementedException();
void ICollection<KeyValuePair<K, V>>.Add(KeyValuePair<K, V> item) => throw new NotImplementedException();
void ICollection<KeyValuePair<K, V>>.Clear() => throw new NotImplementedException();
bool ICollection<KeyValuePair<K, V>>.Contains(KeyValuePair<K, V> item) => throw new NotImplementedException();
void ICollection<KeyValuePair<K, V>>.CopyTo(KeyValuePair<K, V>[] array, int arrayIndex) => throw new NotImplementedException();
bool ICollection<KeyValuePair<K, V>>.Remove(KeyValuePair<K, V> item) => throw new NotImplementedException();
IEnumerator<KeyValuePair<K, V>> IEnumerable<KeyValuePair<K, V>>.GetEnumerator() => throw new NotImplementedException();
IEnumerator IEnumerable.GetEnumerator() => throw new NotImplementedException();
public ICollection<K> Keys => throw new NotImplementedException();
public ICollection<V> Values => throw new NotImplementedException();
public V this[K key] { get => throw new NotImplementedException(); set => throw new NotImplementedException(); }
public void Add(K key, V value) => throw new NotImplementedException();
public bool ContainsKey(K key) => throw new NotImplementedException();
public bool Remove(K key) => throw new NotImplementedException();
public bool TryGetValue(K key, [MaybeNullWhen(false)] out V value) => throw new NotImplementedException();
} API Usage struct MutableTraits : INodeTraits<KeyValuePair<int, string>>
{
public static int Compare(KeyValuePair<int, string> left, KeyValuePair<int, string> right)
=> left.Key - right.Key;
public static KeyValuePair<int, string> Merge(KeyValuePair<int, string> left, KeyValuePair<int, string> right)
=> new(left.Key, left.Value);
public static bool IsPersistent => false;
public static Node<KeyValuePair<int, string>> Clone(Node<KeyValuePair<int, string>> x) => x;
}
struct ImmutableTraits : INodeTraits<KeyValuePair<int, string>>
{
public static int Compare(KeyValuePair<int, string> left, KeyValuePair<int, string> right)
=> left.Key - right.Key;
public static KeyValuePair<int, string> Merge(KeyValuePair<int, string> left, KeyValuePair<int, string> right)
=> new(left.Key, left.Value);
public static bool IsPersistent => true;
public static Node<KeyValuePair<int, string>> Clone(Node<KeyValuePair<int, string>> x) => new(x);
}
MergeableDictionary<int, string, MutableTraits> d1;
MergeableDictionary<int, string, ImmutableTraits> d2; Alternative DesignsI suspected that the abstract statics in interfaces can be used to enable devirtualization and inlining to enable more efficient code generation, as shown by "Find" benchmarks. "Less-verbose" designs, i.e., using comparers or delegates as in traditional
Main design concern is whether to enable arbitrary augmentations of tree nodes with monoidal tags. (E.g., it would allow users to use Additional node-based methods such as RisksObjectively, no as no existing classes or interfaces need modification. Otherwise, having multiple sorted set/dictionary implementations could be confusing. The proposed mergeable dictionary class could also function as a fully-fledged sorted concurrent dictionary.
|
Background and motivation
For a long time I've been annoyed at the fact that a dictionary is an
ISet<KeyValuePair<K, V>>
yet we cannot use set operations over them, except... by manualforeach
. At the same time, I often wanted persistent (immutable) dictionaries. So I've created a prototype implementation that supports both from the same code-base; description and benchmarks are available here: https://github.com/zvrba/PfmThe code that implements the APIs that I propose to merge is here: https://github.com/zvrba/Pfm/tree/master/Pfm.Collections/JoinTree
Features:
The implementation is based on the paper: Guy Blelloch, Daniel Ferizovic, and Yihan Sun. 2022. Joinable Parallel Balanced Binary Trees. ACM Trans.
Parallel Comput. 9, 2, Article 7 (April 2022), 41 pages.
https://doi.org/10.1145/3512769
All in all, a set of building blocks for various kinds of sorted sets and dictionaries.
I was able to find the following related proposals (there are perhaps more) that are directly solvable by
JoinTree
:JoinTree
and wrapper classes. Also addressed byIterator<>
JoinTree
JoinTree
JoinTree
improves performance in the immutable case, without concerns of copying larger nodes of B+-trees[1] See benchmarks at the linked page.
SortedSet
(RB-tree) andImmutableSortedSet
(AVL tree) have surprisingly slow lookup performance, which I suspect is due to using delegates or comparers for comparisons. Joinable tree is slightly slower thanSortedSet
for insertions and deletions, but the implementation can probably improved by replacing recursions with iteration. The immutable versions are just faster thanImmutableSortedSet
.API Proposal
The main user-facing class would be
MergeableDictionary
, which is a wrapper around AVL joinable tree.Otherwise, everything in https://github.com/zvrba/Pfm/tree/master/Pfm.Collections/JoinTree should be in the API.
To benefit from performance improvements, a similar wrapper class should be provided for sets, but I'm lacking a good name.
API Usage
Alternative Designs
I suspected that the abstract statics in interfaces can be used to enable devirtualization and inlining to enable more efficient code generation, as shown by "Find" benchmarks. "Less-verbose" designs, i.e., using comparers or delegates as in traditional
System.Collections.{Generic,Immutable}
would probably degrade performance.Iterator<TValue>
should also support directFind(TValue value)
andSeek(int index)
, but would need to also be parametrized by additionalTNodeTraits : struct, INodeTraits<>
parameter, which would make declarations more verbose. The features are easy to add.Main design concern is whether to enable arbitrary augmentations of tree nodes with monoidal tags. (E.g., it would allow users to use
JoinTree
as a toolkit to build, say, an interval tree; see Cormen et.al. cited below.) Size and rank are two examples of such tags and are currently maintained by the tree code. Alternative design for node would beNode<TValue, TTag> where TTag : struct, ITagTraits<TTag>
. Such design would place more burden on the end-user implementing sinceTTag
would have to include and maintain both metadata needed by the tree and the user's custom augmentation. [2] Further applications of such design can be found in section 5 of https://arxiv.org/abs/1612.05665Compare
andMerge
methods could be isolated in ownIValueTraits<TValue>
interface (facilitating reuse), but this would require additional generic parameter at points of use of the tree / dictionary.(I believe "existential types (language proposal)" would alleviate concerns about "cumbersome use" if the two above paragraphs were implemented.)
Additional node-based methods such as
Insert(ref Node<TValue>)
andDelete(Node<TValue>)
could be considered for allocation-free moving of nodes between different containers, esp. when combined with use of iterators.Node<>
could be extended with a "transient tag", thus increasing its size, but allowing the tree to eschew cloning of the nodes when the node's "transient tag" matches the tree's "transient tag". (This is how persistent collections in Clojure work.)A general issue that I would like to discuss: binary search trees are a well-studied abstraction and perhaps the standard library should just provide a customizable "tree toolkit" that users can build their own collections upon, instead of "hiding" trees behind pre-baked sorted collections. The proposals cited above witness that the need is there and a standard "collection" will never be able to support all use-cases. On the other hand, when an unsupported use-case does occur, it is sad that users has to roll their own trees from scratch (time-consuming and error-prone), or use a suboptimal solution.
[2] Efficient augmentation of balanced tree is possible when conditions from the following theorem (Cormen et al. "Introduction to algorithms", 2nd ed.) hold: Let
F
be a field that augments a [red-black] treeT
ofn
nodes, and suppose that the contents ofF
for a nodex
can be computed using only the information in nodesx
,x.L
, andx.R
, includingx.L.F
andx.R.F
. Then, we can maintain the values ofF
in all nodes ofT
during insertion and deletion without asymptotically affecting theO(lg n)
performance of these operations.Risks
Objectively, no as no existing classes or interfaces need modification. Otherwise, having multiple sorted set/dictionary implementations could be confusing. The proposed mergeable dictionary class could also function as a fully-fledged sorted concurrent dictionary.
The text was updated successfully, but these errors were encountered: