Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[API Proposal]: Add GetViewBetween method to SortedDictionary #77645

Open
Jcparkyn opened this issue Oct 30, 2022 · 10 comments
Open

[API Proposal]: Add GetViewBetween method to SortedDictionary #77645

Jcparkyn opened this issue Oct 30, 2022 · 10 comments
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Collections
Milestone

Comments

@Jcparkyn
Copy link

Jcparkyn commented Oct 30, 2022

Background and motivation

This is an intentional duplicate of #26754, which was closed by the author without a resolution.

SortedDictionary doesn't have a efficient way to enumerate through a subset of the elements. All the ways to do this currently require iterating over all elements in the dictionary, which is inefficient if the desired range is much smaller than the whole collection. For users that need this option, the only current solution is to just use a SortedSet<KeyValuePair<TKey, TValue>> directly, instead of using SortedDictionary.

SortedDictionary uses a SortedSet<T> internally, which exposes a GetViewBetween method for this purpose. This proposal would add a GetViewBetween method to SortedDictionary, which would internally use _set.GetViewBetween.

The conversion from SortedSet to SortedDictionary can be done efficiently, provided that we create a new private constructor like so:

private SortedDictionary(TreeSet<KeyValuePair<TKey, TValue>> set)
{
    _set = set;
}

API Proposal

namespace System.Collections.Generic;

public class SortedDictionary<TKey, TValue> : IDictionary<TKey, TValue>, IDictionary, IReadOnlyDictionary<TKey, TValue> where TKey : notnull
{
    public SortedDictionary<TKey, TValue> GetViewBetween(TKey lowerValue, TKey upperValue);
}

Example implementation:

public SortedDictionary<TKey, TValue> GetViewBetween(TKey lowerValue, TKey upperValue)
{
    var setView = _set.GetViewBetween(new(lowerValue, default!), new(upperValue, default!));
    // Note: this conversion (`TreeSubSet` to `TreeSet`) is potentially problematic. There should be no need for a deep copy here,
    // but the existing public constructors don't make it easy to avoid.
    var treeSet = new TreeSet<KeyValuePair<TKey, TValue>>(setView, _set.Comparer);
    return new SortedDictionary<TKey, TValue>(treeSet);
}

API Usage

var dic = new SortedDictionary<int, string>()
{
   { 1, "a" },
   { 2, "b" },
   { 3, "c" },
};
var view = dic.GetViewBetween(1, 2);

foreach (var (key, value) in view)
{
   // Do something
}

var largestValueInRange = view.Values.Max();

// Note: Reverse would currently use the inefficient Linq version, but this could be optimised in future to enumerate the tree backwards.
var backwardKeys = view.Keys.Reverse().ToList();

Some practical use-cases for this could be e.g.:

  • Finding the first item greater than x (dic.GetViewBetween(x, int.MaxValue).First())
  • Finding all string keys beginning with a certain letter (dic.GetViewBetween("a", "b"))
  • Extracting/summing/etc a sub-range of a sparse array (dic.GetViewBetween(1000, 2000).Values.Sum())
    None of these use-cases can currently be achieved without iterating over all of the items (at least until the upper bound), and as far as I'm aware, none of them can be achieved with other standard collection types (except SortedSet<KeyValuePair<Tkey, TValue>>).

Alternative Designs

  1. Instead of exposing a SortedDictionary, we could just expose an IEnumberable like so:
public IEnumerable<KeyValuePair<TKey, TValue>> GetViewBetween(TKey lowerValue, TKey upperValue);

The main advantages of this approach would be:

  • It can be implemented more easily, without needing to convert from SortedSet to TreeSet.
  • A simpler API with fewer moving parts.
  • It may be desirable to prevent mutation on the view.

However, this has a number of disadvantages:

  • It forces users to handle both keys and values. This is not much of a performance concern unless the key/value types are large structs, but is a bit less elegant than the Keys and Values properties that already exist on SortedDictionary.
  • It is not possible for the user to reverse this efficiently. Admittedly this is also the case with SortedDictionary, but in theory that could be added separately in the future, copying the existing implementation for SortedSet.Reverse.
  • It would lose the ability to modify the view or use random access. I don't see much utility here, but since they have already been done for SortedSet it would seem reasonable to keep them here as well.
  • SortedDictionary already has a struct enumerator, returning an IEnumberable would require boxing (I think?). However, the overhead of wrapping the SortedSet.TreeSubSet in a SortedDictionary might negate this.
  1. Instead of exposing a SortedDictionary, we could use the return the SortedSet directly like so (as per the original proposal from SortedDictionary should have GetViewBetween method #26754:
public SortedSet<KeyValuePair<Tkey, TValue>> GetViewBetween(TKey lowerValue, TKey upperValue);

This has a few issues, as mentioned in #26754 (comment), and doesn't seem worthwhile to consider.

  1. It might be beneficial to allow the lower or upper bound to be omitted. SortedSet.GetViewBetween does not provide this directly, even though there is a constructor for TreeSubSet that supports it. Edit: See Add unbounded SortedSet<T>.TreeSubSet API #55510.

Risks

This would not require any breaking changes that I'm aware of, aside from adding a couple of private/internal constructors to make the change possible. One potential consideration here is binary serialization, but I'm not aware of any specific breaking changes here.

As mentioned above, it may be difficult to add a non-copying conversion from TreeSubSet to TreeSet. SortedDictionary needs to store a TreeSet internally for backwards-compatibility with binary serialization.

Exposing a mutable view of the dictionary would require some work to verify that it is not possible to corrupt the dictionary via the view, or vice versa. This also adds some design questions. Quoting from @terrajobst in #26754 (comment):

There is also some behavior questions, which we should answer by how SortedSet works today. Is the behavior sound?

  • What happens if someone adds an item to the view? SortedSet allows this and adds to the parent. Subsequent enumeration will return it.
  • What happens if someone adds an entry that is outside the bounds of the view? SortedSet throws.
  • What happens to the view if someone removes from the original collection? SortedSet reflects that in both view as well as parent.
@Jcparkyn Jcparkyn added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Oct 30, 2022
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Oct 30, 2022
@ghost
Copy link

ghost commented Oct 30, 2022

Tagging subscribers to this area: @dotnet/area-system-collections
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and motivation

This is an intentional duplicate of #26754, which was closed by the author without a resolution.

SortedDictionary doesn't have a efficient way to enumerate through a subset of the elements. All the ways to do this currently require iterating over all elements in the dictionary, which is inefficient if the desired range is much smaller than the whole collection. For users that need this option, the only current solution is to just use a SortedSet<KeyValuePair<TKey, TValue>> directly, instead of using SortedDictionary.

SortedDictionary uses a SortedSet<T> internally, which exposes a GetViewBetween method for this purpose. This proposal would add a GetViewBetween method to SortedDictionary, which would internally use _set.GetViewBetween.

The conversion from SortedSet to SortedDictionary can be done efficiently, provided that we create a new private constructor like so:

private SortedDictionary(TreeSet<KeyValuePair<TKey, TValue>> set)
{
    _set = set;
}

API Proposal

namespace System.Collections.Generic;

public class SortedDictionary<TKey, TValue> : IDictionary<TKey, TValue>, IDictionary, IReadOnlyDictionary<TKey, TValue> where TKey : notnull
{
    public SortedDictionary<TKey, TValue> GetViewBetween(TKey lowerValue, TKey upperValue);
}

Example implementation:

public SortedDictionary<TKey, TValue> GetViewBetween(TKey lowerValue, TKey upperValue)
{
    var setView = _set.GetViewBetween(new(lowerValue, default!), new(upperValue, default!));
    // Note: this conversion (`TreeSubSet` to `TreeSet`) is potentially problematic. There should be no need for a deep copy here,
    // but the existing public constructors don't make it easy to avoid.
    var treeSet = new TreeSet<KeyValuePair<TKey, TValue>>(setView, _set.Comparer);
    return new SortedDictionary<TKey, TValue>(treeSet);
}

API Usage

var dic = new SortedDictionary<int, string>()
{
   { 1, "a" },
   { 2, "b" },
   { 3, "c" },
};
var view = dic.GetViewBetween(1, 2);

foreach (var (key, value) in view)
{
   // Do something
}

var largestValueInRange = view.Values.Max();

// Note: Reverse would currently use the inefficient Linq version, but this could be optimised in future to enumerate the tree backwards.
var backwardKeys = view.Keys.Reverse().ToList();

Alternative Designs

  1. Instead of exposing a SortedDictionary, we could just expose an IEnumberable like so:
public IEnumerable<KeyValuePair<TKey, TValue>> GetViewBetween(TKey lowerValue, TKey upperValue);

The main advantages of this approach would be:

  • It can be implemented more easily, without needing to convert from SortedSet to TreeSet.
  • A simpler API with fewer moving parts.
  • It may be desirable to prevent mutation on the view.

However, this has a number of disadvantages:

  • It forces users to handle both keys and values. This is not much of a performance concern unless the key/value types are large structs, but is a bit less elegant than the Keys and Values properties that already exist on SortedDictionary.
  • It is not possible for the user to reverse this efficiently. Admittedly this is also the case with SortedDictionary, but in theory that could be added separately in the future, copying the existing implementation for SortedSet.Reverse.
  • It would lose the ability to modify the view or use random access. I don't see much utility here, but since they have already been done for SortedSet it would seem reasonable to keep them here as well.
  • SortedDictionary already has a struct enumerator, returning an IEnumberable would require boxing (I think?). However, the overhead of wrapping the SortedSet.TreeSubSet in a SortedDictionary might negate this.
  1. Instead of exposing a SortedDictionary, we could use the return the SortedSet directly like so (as per the original proposal from SortedDictionary should have GetViewBetween method #26754:
public SortedSet<KeyValuePair<Tkey, TValue>> GetViewBetween(TKey lowerValue, TKey upperValue);

This has a few issues, as mentioned in #26754 (comment), and doesn't seem worthwhile to consider.

  1. It might be beneficial to allow the lower or upper bound to be omitted. SortedSet.GetViewBetween does not provide this directly, even though there is a constructor for TreeSubSet that supports it.

Risks

This would not require any breaking changes that I'm aware of, aside from adding a couple of private/internal constructors to make the change possible. One potential consideration here is binary serialization, but I'm not aware of any specific breaking changes here.

As mentioned above, it may be difficult to add a non-copying conversion from TreeSubSet to TreeSet. SortedDictionary needs to store a TreeSet internally for backwards-compatibility with binary serialization.

Exposing a mutable view of the dictionary would require some work to verify that it is not possible to corrupt the dictionary via the view, or vice versa. This also adds some design questions. Quoting from @terrajobst in #26754 (comment):

There is also some behavior questions, which we should answer by how SortedSet works today. Is the behavior sound?

  • What happens if someone adds an item to the view? SortedSet allows this and adds to the parent. Subsequent enumeration will return it.
  • What happens if someone adds an entry that is outside the bounds of the view? SortedSet throws.
  • What happnens to the view if someone removes from the original collection? SortedSet reflects that in both view as well as parent.
Author: Jcparkyn
Assignees: -
Labels:

api-suggestion, area-System.Collections

Milestone: -

@Jcparkyn
Copy link
Author

Can someone who knows more than me about binary serialization let me know whether it would be possible to do the following to get around the TreeSubSet => TreeSet conversion issue:

  • Replace the _set field with a less-derived type (probably just SortedSet), and
  • Keep the TreeSet class around for back-compat with collections that have already been serialized.

Concerns here are:

  • Is this actually backwards-compatible?
  • TreeSet throws on duplicates, but other SortedSets don't.
  • Calls to methods on _set can no longer be de-virtualised.

@Jcparkyn
Copy link
Author

Another related question is whether SortedList should get the same treatment. The benefits here would be much lower, as it is already possible for users to implement this efficiently by doing a binary search to find the start/end indices.

@danmoseley
Copy link
Member

In .NET 8 it may be possible to simply break binary formatting for this type in some cases: https://github.com/dotnet/designs/blob/main/accepted/2020/better-obsoletion/binaryformatter-obsoletion.md

@layomia layomia removed the untriaged New issue has not been triaged by the area owner label Dec 20, 2022
@layomia layomia added this to the Future milestone Dec 20, 2022
@smarts
Copy link

smarts commented May 17, 2023

@Jcparkyn I love this proposal! What do you think about expanding it? The underlying SortedSet<T> provides several useful members that are analogous to the following:

  • KeyValuePair<TKey, TValue>? Min { get; }
  • KeyValuePair<TKey, TValue>? Max { get; }
  • IEnumerable<KeyValuePair<TKey, TValue>> Reverse()

These help solve 2 of the examples you gave:

// use the following line instead of: var largestValueInRange = view.Values.Max();
var largestValueInRange = view.Max?.Value;

// use the following line instead of var backwardKeys = view.Keys.Reverse().ToList();
var backwardKeys = view.Reverse().Select(x => x.Keys).ToList();

Min and Max are more about convenience. Without Max you could still use view.Reverse().FirstOrDefault() to access the max key or value efficiently, but it's less convenient because the caller has to remember to check the key for null because FirstOrDefault() will never return null for IEnumerable<KeyValuePair<TKey, TValue>> (because KeyValuePair<, > is not a reference type).

@smarts
Copy link

smarts commented May 17, 2023

You could even add these members to KeyCollection and ValueCollection, given that they are backed by the SortedDictionary<, > itself.

@pawchen
Copy link

pawchen commented Aug 3, 2023

@smarts +1 for adding Min/Max to KeyCollection, created a proposal for that but got redirected here.

@Jcparkyn
Copy link
Author

Jcparkyn commented Aug 7, 2023

I definitely agree with others that Min/Max should be added, regardless of whether this proposal ever makes it through (but both would be nice).

@smarts
Copy link

smarts commented Aug 7, 2023

@pawchen & @Jcparkyn I highly recommend giving a 👍🏼 to the issue/proposal at the top because I believe that's mostly what Microsoft looks at.

@sebastienros
Copy link
Member

Updated API shape based on other comments:

namespace System.Collections.Generic;

public class SortedDictionary<TKey, TValue> : IDictionary<TKey, TValue>, IDictionary, IReadOnlyDictionary<TKey, TValue> where TKey : notnull
{
+    public KeyValuePair<TKey, TValue>? Min { get; }
+    public KeyValuePair<TKey, TValue>? Max { get; }
+    public IEnumerable<KeyValuePair<TKey, TValue>> Reverse()
+    public SortedDictionary<TKey, TValue> GetViewBetween(TKey? lowerValue, TValue? upperValue)
}

GetViewBetween(TKey, TValue) intentionally returns SortedDictionary<TKey, TValue> instead of IEnumerable<KeyValuePair<TKey, TValue>> so we can chain with other of these new members. SortedSet<T> does the same.

sebastienros added a commit to sebastienros/runtime that referenced this issue Oct 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Collections
Projects
None yet
Development

No branches or pull requests

6 participants