Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce equality comparers for OpenXmlElement #1476

Merged
merged 18 commits into from
Jul 9, 2023
Merged

Introduce equality comparers for OpenXmlElement #1476

merged 18 commits into from
Jul 9, 2023

Conversation

PhDuck
Copy link
Contributor

@PhDuck PhDuck commented Jun 29, 2023

Problem:
The current way to determine equality for two OpenXmlElements seems to
be x.OuterXml.Equals(y.OuterXml) (based on StackOverflow top answer).

This performs horribly due to allocation of strings and XmlWriter.

Solution:
This PR introduces OpenXmlElementEqualityComparer which supports equality determination.
Furthermore, the definition of equality is defined by OpenXmlElementEqualityOptions which allows for controlling how equality should be defined.

Benchmark results comparing to x.OuterXml.Equals(y.OuterXml):

BenchmarkDotNet=v0.13.2, OS=Windows 10 (10.0.19045.3086)
AMD Ryzen 7 5800X, 1 CPU, 16 logical and 8 physical cores
.NET SDK=7.0.304
  [Host] : .NET 7.0.7 (7.0.723.27404), X64 RyuJIT AVX2

Job=InProcess  Toolchain=InProcessEmitToolchain

|                       Method |              Mean |           Error |          StdDev |     Gen0 |     Gen1 |     Gen2 | Allocated |
|----------------------------- |------------------:|----------------:|----------------:|---------:|---------:|---------:|----------:|
|         OuterXmlLargeElement | 22,469,563.462 ns | 156,783.6460 ns | 130,921.4452 ns | 437.5000 | 281.2500 | 218.7500 | 7370176 B |
|           EqualsLargeElement |  9,235,127.232 ns | 118,532.7404 ns | 105,076.1870 ns |        - |        - |        - |      10 B |
| EqualsLargeElementSkipPrefix |  4,165,810.498 ns |  78,261.1757 ns |  76,862.9390 ns |        - |        - |        - |       5 B |
|             UnparsedOuterXml |          3.296 ns |       0.0140 ns |       0.0131 ns |        - |        - |        - |         - |
|               UnparsedEquals |          4.721 ns |       0.0149 ns |       0.0125 ns |        - |        - |        - |         - |
|                SmallOuterXml |      3,616.694 ns |      52.6133 ns |      43.9345 ns |   1.7662 |   0.0877 |        - |   29584 B |
|                  SmallEquals |        705.613 ns |       3.4249 ns |       3.2037 ns |   0.0010 |        - |        - |      24 B |
|        SmallEqualsSkipPrefix |        376.926 ns |       3.3287 ns |       2.9508 ns |   0.0014 |        - |        - |      24 B |

Mads Gram added 2 commits June 29, 2023 14:43
Problem:
The current way to determine equality for two OpenXmlElements seems to
be x.OuterXml.Equals(y.OuterXml) (based on StackOverflow top answer).

This performs horribly due to allocation of strings and XmlWriter.

Solution:
Make OpenXmlElement support quality via the IEquatable interface.

Notes:
For unparsed OpenXmlElement the order of the given XML element matters, this insn't the case for parsed ones. This both happens for child elements order and attributes.
The order of ExtendedAttributes order always matters.
@PhDuck
Copy link
Contributor Author

PhDuck commented Jun 29, 2023

@twsouthwick Your eyes on this PR would be much appreciated.

There is a few things that I would like some pointer with:

  1. Ordering:
    1.1. For children it seems order is not defined, so order is non-stable.
    1.2. For attributes it seems their order is defined, so order is stable.
    1.3 For extended attributes order is not defined, so ordering is non-stable.
    1.4 For unparsed none of the above counts, since it depends fully on the inputs textual order.
  2. Elements of type OpenXmlLeafTextElement seems to need special behaviour for comparing their text value. Other places these values seem to be placed as an attribute. I don't quite know why this is.
  3. Attributes seems to always be of type OpenXmlComparableSimpleValue, so they support Equals. Can someone please confirm this.
  4. Prefixes are quite expensive to lookup, so can we skip it and just compare Namespace.Uri, if so we can save ~40% further,

@twsouthwick
Copy link
Member

@PhDuck Great questions. I'll answer them first, then a few thoughts at the end:

  1. Ordering:
    1.1. For children it seems order is not defined, so order is non-stable.

    This is actually not true, generally. Order of children are defined by their schema. However, for comparing, you should just iterate through the children and use the order that is there

    1.2. For attributes it seems their order is defined, so order is stable.

    @tomjebo is this the case? We may want to have this as an option

    1.3 For extended attributes order is not defined, so ordering is non-stable.

    We may want to include/exclude these

    1.4 For unparsed none of the above counts, since it depends fully on the inputs textual order.

  2. Elements of type OpenXmlLeafTextElement seems to need special behaviour for comparing their text value. Other places these values seem to be placed as an attribute. I don't quite know why this is.

  3. Attributes seems to always be of type OpenXmlComparableSimpleValue, so they support Equals. Can someone please confirm this.

    Attributes are always OpenXmlSimpleType which has IEquatable I believe

  4. Prefixes are quite expensive to lookup, so can we skip it and just compare Namespace.Uri, if so we can save ~40% further,

    Maybe have it as an option

With the points you're making, I think we may not want to implement equality directly on the element at this point, but rather provide helpers to create equality comparers that compare things people care about (you've pointed out questions that could be things people opt in/out of)

I'm thinking something like this:

public static class OpenXmlElementComparers
{
	public static IEqualityComparer<OpenXmlElement> Default { get; }
	public static IEqualityComparer<OpenXmlElement> Create(OpenXmlElementEqualityOptions options);
}

public sealed class OpenXmlElementEqualityOptions
{
	// Include options
}

@tomjebo
Copy link
Collaborator

tomjebo commented Jun 29, 2023

  1. 1.2. For attributes it seems their order is defined, so order is stable.

    @tomjebo is this the case? We may want to have this as an option

According to the XML specification order of attributes is not significant and therefore it shouldn't be for us as well. If we see a problem with this, definitely we want to investigate.

@twsouthwick
Copy link
Member

twsouthwick commented Jun 30, 2023

@PhDuck we've added GH action tests this week so if you merge main you'll be able to run tests

@PhDuck PhDuck changed the title OpenXmlElement implements IEquatable Introduce equality comparers for OpenXmlElement Jul 3, 2023
Copy link
Member

@twsouthwick twsouthwick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. Some of the changes won't work well with the multitargeting (I commented on a few things) - I'll kick the build off so we can see the results

@twsouthwick
Copy link
Member

twsouthwick commented Jul 3, 2023

Can you merge from main so the tests will run?

Edit: nm they're running

@github-actions
Copy link

github-actions bot commented Jul 3, 2023

Test Results

     108 files       108 suites   1h 19m 31s ⏱️
  2 007 tests   2 004 ✔️   3 💤 0
47 934 runs  47 880 ✔️ 54 💤 0

Results for commit 267e789.

♻️ This comment has been updated with latest results.

@PhDuck PhDuck marked this pull request as ready for review July 3, 2023 21:40
@twsouthwick
Copy link
Member

@PhDuck I pushed a change that allows using the built in hashcode shim - it's in a different namespace

@twsouthwick twsouthwick merged commit 31289b6 into dotnet:main Jul 9, 2023
@twsouthwick
Copy link
Member

Thanks @PhDuck! This is cool feature that will be available in v3.0 when it's released

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants