-
-
Notifications
You must be signed in to change notification settings - Fork 851
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expose and conserve the color palette for indexed png images. #2485
Conversation
@SixLabors/core This one next please 🙏 |
for (int i = 0; i < alpha.Length; i++) | ||
{ | ||
ref Color color = ref colorTable[i]; | ||
color = color.WithAlpha(alpha[i] / 255F); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If perf is important here, then
color = color.WithAlpha(alpha[i] / 255F); | |
color = color.WithAlpha(alpha[i] * (1F / 255F)); |
will produce a mul instead of div.
Even if perf isn't critical here I think that I'd change the code, as
- it's almost as readable
- it's not only about the perf of this piece of code, but rather for the whole system, as the cpu-ports can be utilized better when there's no division clogging it
Note: it's Roslyn to blame here for not being able to emit 1 / 255F
automatically -- though the JIT could do that optimization too, but it has more stringent timing constraints...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would not be a valid optimization for either compiler to make, as the results are not the same (due to 1/255f not being precisely representable in float). See https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEUCuA7AHwAEAmARgFgAoIgBgAIiyA6AGQEs8BHAbmqIDMjEvQDC9AN7V6M4SKlVZS+gDNoMAIZgAFvQAUnDPQBu9TvQCieHAFsYUDcAA2MZgCUNeAOYw9tNPQkAKxBAJShktLK0ewq+qYA9IEhcQCEALwm9ABU+mQJwUEq4VHRZYxkAJx6ACQARBLGAL4gkonJRU30GW05eQUpoU11oXyKZU2l9JNUTUA===
For that reason, unless it's a hot path, this optimization loses precision for no measurable benefit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as the results are not the same
Yep, but the error is in the region of 1e-8 or less (demo)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that value range, yes, but the magnitude of the error is relative to the magnitude of the value -- that's how floating point works. Whether the precision loss is acceptable in any given case is a question of the trade-off between performance (~12 cycles for divss
vs ~4 for mulss
) and precision.
I'm not familiar enough with the code to know whether that trade-off is worthwhile here (though it looks questionable). Mostly, I was commenting on your suggestion that Roslyn and RyuJIT were somehow missing an optimization one of them could have made automatically, which is incorrect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which is incorrect
You are right (and I have to apologize here by Roslyn).
I had a look what GCC does (before writing the first comment on this) and it emitted the mul
-- but I had -Ofast
compiler switch set which disregards strict standards compliance (as I re-read just now). With -O{2,3}
it's a div
.
Thanks for the heads up!
@@ -875,7 +875,7 @@ private void WriteGammaChunk(Stream stream) | |||
// 4-byte unsigned integer of gamma * 100,000. | |||
uint gammaValue = (uint)(this.gamma * 100_000F); | |||
|
|||
BinaryPrimitives.WriteUInt32BigEndian(this.chunkDataBuffer.Span.Slice(0, 4), gammaValue); | |||
BinaryPrimitives.WriteUInt32BigEndian(this.chunkDataBuffer.Span[..4], gammaValue); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: the Slice
here was IIRC intentional, as it produces better code than the slice. There are issues in Roslyn and runtime to care about that, but for .NET 8 these aren't resolved (when I'm not mistaken).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're just taking the hit here. One day Roslyn will catch up, but we need greater consistency to ease the (massive) maintenance burden. I'm hoping come v4/.NET8 with the SIMD migration improvements I have planned we won't notice codegen issues.
@@ -1175,7 +1177,7 @@ private void WriteChunk(Stream stream, PngChunkType type, Span<byte> data, int o | |||
|
|||
stream.Write(buffer); | |||
|
|||
uint crc = Crc32.Calculate(buffer.Slice(4)); // Write the type buffer | |||
uint crc = Crc32.Calculate(buffer[4..]); // Write the type buffer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
@antonfirsov The changes here are based upon the changes made to the gif format API. If you have a chance, I'd love to get your feedback please. |
@SixLabors/core Can I get a review here please. |
@JimBobSquarePants will do it by EOW if that's good enough. |
Thanks! Appreciate it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few suggestions, but mostly LGTM.
ReadOnlySpan<Rgb24> rgbTable = MemoryMarshal.Cast<byte, Rgb24>(palette); | ||
for (int i = 0; i < colorTable.Length; i++) | ||
{ | ||
colorTable[i] = new Color(rgbTable[i]); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I see several loops like this one in codecs now. We can maybe define a public static Color.FromPixel<TPixel>(Span<TPixel>, Span<Color>)
analogous to the bulk ToPixel<TPixel>(...)
method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without knowing the component precision of the TPixel
we don't know which boxed pixel to assign to. We could do it in v4 though if we added additional information to PixelTypeInfo
#2534 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeah this indeed goes well beyond this PR, nevermind then.
Prerequisites
Description
This builds upon #2455 and should be merged after.
Changes the following in
PngMetadata
:ReadonlyMemory<Color> ColorTable
property.TransparentRgb24
,TransparentRgb48
,TransparentL8
and,TransparentL16
into a new nullableTransParentColor Color?
property.HasTransparency
since we now know that from the palette or transparent color.On index Png encode, as with Gif if a palette is found in the metadata the quantizer will use that instead.