Add Base64url encoding/decoding #102364

Merged on Jun 14, 2024 (40 commits)

Conversation

buyaa-n (Contributor) commented May 17, 2024

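  • Base64Url encoding doesn't append padding, so the encoded byte count differs from Base64 encoding:

    | Source byte count | Base64 encoded byte count | Base64Url encoded byte count |
    |-------------------|---------------------------|------------------------------|
    | 1                 | 4                         | 2                            |
    | 2                 | 4                         | 3                            |
    | 3                 | 4                         | 4                            |
    | 4                 | 8                         | 6                            |
    | 5                 | 8                         | 7                            |
    | 6                 | 8                         | 8                            |

  • Base64Url decoding ignores whitespace and padding, so the decodable byte count differs from Base64 decoding. The exact decoding result also depends on isFinalBlock and on whether padding is involved:

    | Source byte count | Base64 decoded byte count | Base64Url decoded byte count |
    |-------------------|---------------------------|------------------------------|
    | 1                 | max 0                     | max 0                        |
    | 2                 | max 0                     | max 1                        |
    | 3                 | max 0                     | max 2                        |
    | 4                 | max 3                     | max 3                        |
    | 5                 | max 3                     | max 3                        |
    | 6                 | max 3                     | max 4                        |
    | 7                 | max 3                     | max 5                        |
    | 8                 | max 6                     | max 6                        |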
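
For illustration only (this is not code from the PR): the two tables above can be reproduced with plain integer arithmetic. The class and method names below are hypothetical, and the formulas are a sketch derived from the table values, not necessarily how `GetEncodedLength` and `GetMaxDecodedLength` are implemented.

```csharp
// Hypothetical helper, not part of the PR: reproduces the two tables above.
public static class Base64LengthSketch
{
    // Padded Base64 always emits whole 4-character blocks.
    public static int Base64EncodedLength(int byteCount) => (byteCount + 2) / 3 * 4;

    // Base64Url omits '=' padding, so a 1- or 2-byte tail costs only 2 or 3 characters.
    public static int Base64UrlEncodedLength(int byteCount) => (byteCount * 4 + 2) / 3;

    // Padded Base64 can only decode complete 4-character blocks.
    public static int Base64MaxDecodedLength(int encodedLength) => encodedLength / 4 * 3;

    // Unpadded input may end in a 2- or 3-character block worth 1 or 2 bytes.
    // A single leftover character carries only 6 bits and yields no byte.
    public static int Base64UrlMaxDecodedLength(int encodedLength)
    {
        int remainder = encodedLength % 4;
        return encodedLength / 4 * 3 + (remainder > 0 ? remainder - 1 : 0);
    }
}
```

For example, `Base64UrlEncodedLength(5)` gives 7 where `Base64EncodedLength(5)` gives 8, matching the fifth row of the encoding table.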

Approved API shape:

namespace System.Buffers.Text;

public static class Base64Url
{
    public static int GetMaxDecodedLength(int base64Length);
    public static int GetEncodedLength(int bytesLength);

    public static OperationStatus EncodeToUtf8(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesConsumed, out int bytesWritten, bool isFinalBlock = true);
    public static int EncodeToUtf8(ReadOnlySpan<byte> source, Span<byte> destination);
    public static bool TryEncodeToUtf8(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten);
    public static byte[] EncodeToUtf8(ReadOnlySpan<byte> source);

    public static OperationStatus EncodeToChars(ReadOnlySpan<byte> source, Span<char> destination, out int bytesConsumed, out int charsWritten, bool isFinalBlock = true);
    public static int EncodeToChars(ReadOnlySpan<byte> source, Span<char> destination);
    public static bool TryEncodeToChars(ReadOnlySpan<byte> source, Span<char> destination, out int charsWritten);
    public static char[] EncodeToChars(ReadOnlySpan<byte> source);
    public static string EncodeToString(ReadOnlySpan<byte> source);

    public static bool TryEncodeToUtf8InPlace(Span<byte> buffer, int dataLength, out int bytesWritten);

    public static OperationStatus DecodeFromUtf8(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesConsumed, out int bytesWritten, bool isFinalBlock = true);
    public static int DecodeFromUtf8(ReadOnlySpan<byte> source, Span<byte> destination);
    public static bool TryDecodeFromUtf8(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten);
    public static byte[] DecodeFromUtf8(ReadOnlySpan<byte> source);
    public static OperationStatus DecodeFromChars(ReadOnlySpan<char> source, Span<byte> destination, out int charsConsumed, out int bytesWritten, bool isFinalBlock = true);
    public static int DecodeFromChars(ReadOnlySpan<char> source, Span<byte> destination);
    public static bool TryDecodeFromChars(ReadOnlySpan<char> source, Span<byte> destination, out int bytesWritten);
    public static byte[] DecodeFromChars(ReadOnlySpan<char> source);

    public static int DecodeFromUtf8InPlace(Span<byte> buffer);

    public static bool IsValid(ReadOnlySpan<char> base64UrlText);
    public static bool IsValid(ReadOnlySpan<char> base64UrlText, out int decodedLength);
    public static bool IsValid(ReadOnlySpan<byte> utf8Base64UrlText);
    public static bool IsValid(ReadOnlySpan<byte> utf8Base64UrlText, out int decodedLength);
}
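
A minimal usage sketch of the API shape above, assuming a runtime where `System.Buffers.Text.Base64Url` is available (it shipped after this PR merged). The wrapper class and its method names are hypothetical and exist only for illustration. Only members listed in the approved shape are called.

```csharp
using System;
using System.Buffers.Text;

// Hypothetical wrapper for illustration. Only the Base64Url members are real API.
public static class Base64UrlUsageSketch
{
    // One-shot encode to a string, with no '=' padding in the result.
    public static string Encode(ReadOnlySpan<byte> data) => Base64Url.EncodeToString(data);

    // One-shot decode from text.
    public static byte[] Decode(ReadOnlySpan<char> text) => Base64Url.DecodeFromChars(text);

    // Span-based encode into a caller-sized buffer. Intended for small inputs only,
    // because the buffer lives on the stack.
    public static string EncodeViaSpan(ReadOnlySpan<byte> data)
    {
        Span<char> buffer = stackalloc char[Base64Url.GetEncodedLength(data.Length)];
        if (!Base64Url.TryEncodeToChars(data, buffer, out int charsWritten))
        {
            throw new InvalidOperationException("Destination buffer was too small.");
        }
        return buffer.Slice(0, charsWritten).ToString();
    }
}
```

The bytes `0xFB 0xFF` encode to `+/8=` with standard Base64; the Base64Url form of the same input is `-_8`, which swaps in the URL-safe alphabet and drops the padding.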

This stays a draft PR until the list below is completed. For now it is mainly for checking perf, running tests on different CI legs, and getting early feedback.

  • Handle '%' in decoding as a valid padding character for Base64Url
  • Doc needs improvement
  • A few APIs missing unit tests
  • Fix failures on ARM CI legs

Fixes #1658

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

gfoidl (Member) left a comment

Just skimmed over it...
I saw your comment, but for checking perf some of my comments may help (a bit).

buyaa-n (Contributor Author) commented May 17, 2024

/benchmark

buyaa-n (Contributor Author) commented May 17, 2024

> Just skimmed over it... I saw your comment, but for checking perf some of my comments may help (a bit).

Thank you. Perf-related feedback is certainly appreciated. I updated the description; any feedback is welcome.

pr-benchmarks bot commented May 17, 2024

Crank Pull Request Bot

/benchmark <benchmark[,...]> <profile[,...]> <component,[...]> <arguments>

Benchmarks:

  • micro: .NET Performance micro benchmarks (set filter by adding --variable filter=...; by default filter=*LinqBenchmarks*)
  • plaintext: TechEmpower Plaintext Scenario - ASP.NET Platform implementation
  • json: TechEmpower JSON Scenario - ASP.NET Platform implementation
  • fortunes: TechEmpower Fortunes Scenario - ASP.NET Platform implementation
  • fortunes_ef: TechEmpower Fortunes Scenario with EF Core - ASP.NET Platform implementation
  • httpclient: HttpClient Benchmark (change HTTP version by adding e.g. --variable httpVersion=3.0; change response size by adding e.g. --variable responseSize=256; default: HTTP/1.1 GET 8K)

Profiles:

  • aspnet-citrine-lin: Intel/Linux 28 Cores
  • aspnet-perf-lin: Intel/Linux 12 Cores
  • aspnet-citrine-amd: Amd/Linux 48 Cores

Components:

  • runtime
  • libs

Arguments: any additional arguments to pass through to crank, e.g. --variable name=value

This comment was marked as duplicate.

padding++;
}

if (TBase64Decoder.IsValidPadding(Unsafe.Subtract(ref ptrToLastElement, 1)))
Member

What guarantees this is in bounds?

Contributor Author

This is only called with a buffer of size 4:

const int BlockSize = 4;
Span<byte> buffer = stackalloc byte[BlockSize];

int paddingCount = GetPaddingCount<TBase64Decoder>(ref buffer[^1]);


if (src == srcEnd)
goto DoneExit;
}

- end = srcMax - 64;
+ end = srcMax - 32;
Member

Why did this change from 64 to 32?

Contributor Author

It was 32 originally, but was unintentionally changed to 64 in this PR.

The method comment also mentions that it requires a 32-byte read:

private static unsafe void Avx2Encode<TBase64Encoder, T>(ref byte* srcBytes, ref T* destBytes, byte* srcEnd, int sourceLength, int destLength, byte* srcStart, T* destStart)
    where TBase64Encoder : IBase64Encoder<T>
    where T : unmanaged
{
    // If we have AVX2 support, pick off 24 bytes at a time for as long as we can.
    // But because we read 32 bytes at a time, ensure we have enough room to do a
    // full 32-byte read without segfaulting.
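
To make the bound in that method comment concrete, here is a simplified sketch (not the PR's code) that counts how many vectorized steps stay in bounds when each step loads 32 bytes but consumes only 24. The class, constants, and method are hypothetical, and the real `Avx2Encode` loop may position its loads differently.

```csharp
// Hypothetical model of the loop bound, not the actual Avx2Encode implementation.
public static class VectorReadBoundSketch
{
    public const int BytesLoadedPerStep = 32;   // width of one 256-bit load
    public const int BytesConsumedPerStep = 24; // source bytes actually encoded per step

    // A step starting at 'offset' is safe only if offset + 32 <= sourceLength,
    // which is the same condition as offset <= sourceLength - 32.
    public static int CountSafeVectorSteps(int sourceLength)
    {
        int lastSafeStart = sourceLength - BytesLoadedPerStep;
        int steps = 0;
        for (int offset = 0; offset <= lastSafeStart; offset += BytesConsumedPerStep)
        {
            steps++;
        }
        return steps;
    }
}
```

Under this model a 56-byte source allows two vector steps (at offsets 0 and 24), while a 55-byte source allows only one. A bound of `length - 64` would also stay in bounds, but it would presumably leave more of the input to the slower non-vectorized path.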

stephentoub (Member) left a comment

Thanks!

{
- const int BlockSize = 4;
+ int BlockSize = Math.Min(source.Length - (int)sourceIndex, 4);
Member

Would anything bad happen if this were left as a const 4?
Making it non-const would also slightly penalize the non-URL path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Base64Url when the source is not multiple of 4 we need to adjust the buffer size accordingly, the BlockSize value used to fill the whitespace with padding (row 653) so that it could decoded correctly (in case remaining bytes were decodable or valid)
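
The idea described here can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: the helper name is hypothetical, and it leans on the existing `System.Buffers.Text.Base64.DecodeFromUtf8` after mapping the URL-safe characters back to the standard alphabet.

```csharp
using System;
using System.Buffers;
using System.Buffers.Text;

// Hypothetical helper: decodes a trailing Base64Url block of 2 to 4 encoded bytes.
public static class PartialBlockSketch
{
    public static byte[] DecodeFinalBlock(ReadOnlySpan<byte> remaining)
    {
        const int FullBlock = 4;
        if (remaining.Length < 2 || remaining.Length > FullBlock)
        {
            throw new ArgumentException("Expected 2 to 4 encoded bytes.", nameof(remaining));
        }

        // Start from a block that is all padding, then overwrite the leading slots
        // with the real input, so a short tail becomes a well-formed padded block.
        Span<byte> block = stackalloc byte[FullBlock];
        block.Fill((byte)'=');
        for (int i = 0; i < remaining.Length; i++)
        {
            byte b = remaining[i];
            // Map the URL-safe alphabet back to the standard one.
            block[i] = b == (byte)'-' ? (byte)'+' : b == (byte)'_' ? (byte)'/' : b;
        }

        Span<byte> decoded = stackalloc byte[3];
        OperationStatus status = Base64.DecodeFromUtf8(block, decoded, out _, out int written);
        if (status != OperationStatus.Done)
        {
            throw new FormatException("Invalid final block.");
        }
        return decoded.Slice(0, written).ToArray();
    }
}
```

With this sketch a three-character tail such as `-_8` is decoded as if it were `+/8=`, which is why the number of real input bytes in the last block has to be known at run time.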

MihaZupan (Member)

@MihuBot fuzz Base64

lewing (Member) commented Jun 19, 2024

Also showing regressions in dotnet/perf-autofiling-issues#36512 and improvements in dotnet/perf-autofiling-issues#36643.

Successfully merging this pull request may close these issues.

Support Base 64 URL
8 participants