Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performant way to compare OpenXmlElement #1454

Closed
PhDuck opened this issue Jun 26, 2023 · 1 comment
Closed

Performant way to compare OpenXmlElement #1454

PhDuck opened this issue Jun 26, 2023 · 1 comment
Assignees
Milestone

Comments

@PhDuck
Copy link
Contributor

PhDuck commented Jun 26, 2023

When comparing OpenXmlElement objects, there isn't an obvious performant way to do it.
The current top solution from StackOverflow does:

return font.OuterXml.Equals(existingFont.OuterXml);

Performance wise this performs very poorly, due to two reasons:

  1. Lots of temporary allocations:
    1.1. StringWriter
    1.2. XmlDomTextWriter (which allocates a 6000 char buffer)
    1.3 Final string
  2. String comparison instead of comparing primitive types.

Is your feature request related to a problem? Please describe.
Due to the amount of allocations, we are seeing tons of garbage collection happening when creating Excel sheets in our service.
When creating the sheets we have attach styles for the sheet, but to avoid re-adding styles, we compare them to previously added styles.

Describe the solution you'd like
Would be nice if OpenXmlElement implemented IEquatable.
However, due to potential wishes for controlling ordering behaviour I could imagine someone wanting a more complex API, but I'm okay with any of them.

Describe alternatives you've considered
Tried to implement a faster comparison based on writing to a stream and comparing the underlying bytes instead.

using (var ms0 = new MemoryStream(256))
using (var ms1 = new MemoryStream(256))
using (XmlWriter writer = XmlWriter.Create(ms0, xwSettings))
using (XmlWriter writer2 = XmlWriter.Create(ms0, xwSettings))
{
    font.WriteTo(writer);
    font.WriteTo(writer2);

    writer.Flush();
    writer2.Flush();

    if (ms0.Position == ms1.Position)
    {
        byte[] buf1 = ms0.GetBuffer();
        byte[] buf2 = ms1.GetBuffer();

        buf1.AsSpan(0, (int)ms0.Position).SequenceEqual(ms1.GetBuffer().AsSpan(0, (int)ms1.Position));
    }
}

I have also tried to do the above, but reusing the XmlWriter and MemoryStream, which reduces the allocations and time significantly. However StackOverflow warns of reuse.
However a slightly naive attempt to recurse the children and attributes show there is more to gain. It could be even faster if the GetAttributes didn't allocate a new list.

Additional context
Benchmarks:

BenchmarkDotNet=v0.13.5, OS=Windows 10 (10.0.19045.3086/22H2/2022Update)
AMD Ryzen 7 5800X, 1 CPU, 16 logical and 8 physical cores
.NET SDK=7.0.304
  [Host] : .NET 6.0.18 (6.0.1823.26907), X64 RyuJIT AVX2

Server=True  

|          Method | N |       Mean |     Error |   StdDev |   Gen0 |   Gen1 | Allocated |
|---------------- |-- |-----------:|----------:|---------:|-------:|-------:|----------:|
|              MS | 1 | 4,455.6 ns |  57.62 ns | 51.08 ns | 0.9537 | 0.0381 |   16024 B |
|           Reuse | 1 | 2,732.4 ns |  12.43 ns | 11.02 ns | 0.0076 |      - |     144 B |
|        OuterXML | 1 | 5,784.4 ns | 100.72 ns | 98.92 ns | 1.7548 | 0.0839 |   29424 B |
| RecursiveManual | 1 |   974.1 ns |   3.62 ns |  3.21 ns | 0.0648 |      - |    1104 B |
@twsouthwick
Copy link
Member

Fixed by #1476

@twsouthwick twsouthwick added this to the v3.0 milestone Jul 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants