
Blazorserver Byte Array Interop Support #32259

Closed
wants to merge 19 commits into from

Conversation

TanayParikh
Contributor

@TanayParikh TanayParikh commented Apr 29, 2021

This is a first draft of the Blazor Server byte array interop support; I would appreciate any feedback.

  • TODO: ByteArrayJsonConverter tests

Part of: #21877

@TanayParikh TanayParikh requested review from a team, javiercn and SteveSandersonMS April 29, 2021 02:42
@javiercn javiercn added area-blazor Includes: Blazor, Razor Components feature-blazor-server labels Apr 29, 2021
@TanayParikh TanayParikh marked this pull request as ready for review April 29, 2021 16:28
@TanayParikh
Contributor Author

Note: the CI will continue to fail due to the WASM changes required. I'm planning on handling WASM with another PR targeting this branch. In the meantime, this branch/PR will serve as the Blazor Server implementation.

@@ -362,7 +362,7 @@ public async Task BeginInvokeDotNetFromJS(string callId, string assemblyName, st

// EndInvokeJSFromDotNet is used in a fire-and-forget context, so it's responsible for its own
// error handling.
public async Task EndInvokeJSFromDotNet(long asyncCall, bool succeeded, string arguments)
public async Task EndInvokeJSFromDotNet(long callId, bool succeeded, string resultOrError, byte[][]? byteArrays)
Member

Ah yes, these are better names. Thanks for cleaning this up!

public static string? Invoke(JSRuntime jsRuntime, in DotNetInvocationInfo invocationInfo, string argsJson)
/// <param name="byteArrays">Byte array data extracted from the arguments for direct transfer.</param>
/// <returns>A tuple containing the JSON representation of the return value, or null, along with the extracted byte arrays, or null.</returns>
public static (string?, byte[][]?) Invoke(JSRuntime jsRuntime, in DotNetInvocationInfo invocationInfo, string argsJson, byte[][]? byteArrays)
Member

Might it be preferable to define a readonly struct for this return type rather than the tuple? Just in case, in the future, we want to add even more things.

Member

Agree, we don't use tuples in public APIs.

@SteveSandersonMS what was the reason for this to be public?

Member

what was the reason for this to be public?

Layering reasons. JS interop is an independent layer from any of the hosting platforms, and any hosting platform that needs to deliver an inbound synchronous JS interop call would do so by calling this. Currently that's only WebAssembly.

Contributor Author

var result = JsonSerializer.Deserialize(ref jsonReader, resultType, JsonSerializerOptions);
ByteArraysToDeserialize = null;
Member

This pattern with ByteArraysToDeserialize might be the first place where JSRuntime stops being thread-safe. That might be absolutely fine with our current use cases, but it would be good to:

  1. Check that our current use cases don't do concurrent invocations
  2. Add a check here that would help us detect quickly in the future if we change things and break the assumption. For example, just before line 229, you could check whether ByteArraysToDeserialize is non-null and, if so, throw (see the sketch below). I guess there could be some similar flag to verify access around the logic in SerializeArgs, if not an actual lock.
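
For illustration, here is a minimal sketch of the kind of fail-fast guard being suggested above. The member and type names are placeholders rather than the actual JSRuntime internals, and the converter that would consume the captured arrays is omitted:

using System;
using System.Text.Json;

internal sealed class ByteArrayCaptureGuard
{
    private byte[][]? _byteArraysToDeserialize;

    // Refuse to start if a previous capture is still in flight, so accidental
    // concurrent or nested (de)serialization fails fast instead of corrupting state.
    public object? Deserialize(byte[] utf8Json, Type resultType, byte[][]? byteArrays, JsonSerializerOptions options)
    {
        if (_byteArraysToDeserialize is not null)
        {
            throw new InvalidOperationException(
                "A JS interop (de)serialization is already in progress; nested or concurrent calls are not supported.");
        }

        _byteArraysToDeserialize = byteArrays;
        try
        {
            // A byte-array converter registered on 'options' would read
            // _byteArraysToDeserialize while this runs.
            return JsonSerializer.Deserialize(utf8Json, resultType, options);
        }
        finally
        {
            // Always release the captured arrays so the next call starts clean.
            _byteArraysToDeserialize = null;
        }
    }
}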

Member

Actually it's more than just thread-safety - it's also about nested invocations. I'm not sure whether it's possible in our current use cases, but it's imaginable that someone might have a JSON converter that itself performs JS interop.

If nested invocation is a scenario (not saying it is), then something like StackObjectPool<T> from the Components project might help with minimising the number of lists that get allocated, while retaining support for nested usage.

Member

I had the same thoughts. I'm not concerned about thread safety since all these operations must happen inside the sync context. Nested invocations could be a thing; however, I would suggest we detect that case and throw an exception as "unsupported", since it is possible but very unlikely.

Member

I'm not concerned about thread safety since all these operations must happen inside the sync context

That may be true in our current uses but isn't part of the contract as far as the JS interop layer is concerned. JS interop is technically a separate thing and doesn't have a concept of a sync context.

We can define JSRuntime as being not thread-safe though, and yes it would be great to detect invalid usage and throw as this would simplify detecting any future issues.

Contributor Author

I believe 9995e90 & 8aeef7d should resolve these concerns.

Regarding thread-safety/nested invocation: these added checks should be able to detect when we're trying to interrupt a serialization already in progress (and guard against it via semaphores, which must be acquired before a (de)serialization request can complete).

Please let me know if there are still additional concerns.

@SteveSandersonMS
Member

This looks really promising. Thanks for tidying up the existing code too!

/// <summary>
/// Contains the combined byte array(s) being serialized.
/// </summary>
internal readonly List<byte[]> ByteArraysToSerialize = new();
Member

We need to rethink this a bit.

The problem here is that if I serialize 10K byte arrays this list will grow up to 10K buckets and won't shrink when you call Clear afterwards.

It might be better to implement a custom type for this, and use an array pool internally to track the byte arrays.

Member

@SteveSandersonMS SteveSandersonMS May 5, 2021

The ArrayBuilder<T> class in the Shared sources might be a convenient fit. It grows by renting increasingly large buffers from an ArrayPool and when you call Clear it returns the current buffer to the pool. So the pooling concerns would be handled implicitly.
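
As a rough illustration of the pooling behaviour described above (this is a sketch of the idea, not the actual ArrayBuilder<T> from the Shared sources):

using System;
using System.Buffers;

// Rents progressively larger buffers from ArrayPool<T> and returns the current
// buffer on Clear, so capacity is not pinned to the largest payload ever seen.
internal sealed class PooledArrayBuilder<T>
{
    private T[] _items = Array.Empty<T>();
    public int Count { get; private set; }

    public void Append(T item)
    {
        if (Count == _items.Length)
        {
            var larger = ArrayPool<T>.Shared.Rent(Math.Max(4, _items.Length * 2));
            Array.Copy(_items, larger, Count);
            if (_items.Length > 0)
            {
                ArrayPool<T>.Shared.Return(_items, clearArray: true);
            }
            _items = larger;
        }
        _items[Count++] = item;
    }

    public T[] ToArray()
    {
        var result = new T[Count];
        Array.Copy(_items, result, Count);
        return result;
    }

    public void Clear()
    {
        if (_items.Length > 0)
        {
            ArrayPool<T>.Shared.Return(_items, clearArray: true);
            _items = Array.Empty<T>();
        }
        Count = 0;
    }
}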

Contributor Author

Thanks, cdba736 should handle this 😄

Comment on lines 172 to 179
protected internal (string, byte[][]?) SerializeArgs(object? args)
{
ByteArraysToSerialize.Clear();
var serializedArgs = JsonSerializer.Serialize(args, JsonSerializerOptions);
var byteArrays = ByteArraysToSerialize.ToArray();

return (serializedArgs, byteArrays);
}
Member

It might be more work and lead to a bit of duplication; however, it would be great if we could push this method down to the implementation classes. The reason is that this avoids adding extra public API, which might make it harder to change things in the future.

Member

@SteveSandersonMS SteveSandersonMS May 5, 2021

I would flip this around and, if possible, push more of the serialization into the underlying JS interop layer. The intent from the beginning was to make the serialization a non-negotiable part of the underlying layer that particular platform implementations can't affect. That's why we pass argsJson (pre-serialized) to methods like BeginInvokeJS. It means that when we, or someone else, implements support for JS interop on a new platform, they only have to think about the physical transport of the basic string data type (or, in the future, string plus a collection of byte arrays) and don't reinvent any of the serialization semantics or risk any differences across platforms. For example, platforms can't change the set of JSON formatters, as it's intended to evolve over time.

There is a gap in the way this is defined today, though. Unintentional as far as I know. Our EndInvokeDotNet method receives DotNetInvocationResult which in turn has a Result which is not pre-serialized. If we were to follow the pattern consistently, DotNetInvocationResult.Result should be something like DotNetInvocationResult.ResultJson so that each platform implementation isn't involved in the serialization.

Since we're in the middle of making breaking changes to the API contract here, we could take the opportunity to fix this by changing DotNetInvocationResult to have both ResultJson and ResultByteArrays which are both opaque to the per-platform implementations (they just have to transport it).
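
For concreteness, the shape being described might look roughly like this (hypothetical sketch; the real DotNetInvocationResult has additional members and may differ):

// Both halves of the result are pre-serialized/opaque, so per-platform
// implementations only have to transport them.
public readonly struct DotNetInvocationResult
{
    public DotNetInvocationResult(string? resultJson, byte[][]? resultByteArrays, bool success)
    {
        ResultJson = resultJson;
        ResultByteArrays = resultByteArrays;
        Success = success;
    }

    public string? ResultJson { get; }
    public byte[][]? ResultByteArrays { get; }
    public bool Success { get; }
}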

Contributor Author

I've refactored this to be centered more on the JsonConverter. Please let me know whether this is still a concern.

Member

@javiercn javiercn left a comment

Overall looks great!

Here are some additional thoughts:

  • We need to figure out what we do with "nested" calls.
    • My suggestion would be that we prevent against those and we throw for the time being.
    • It is impossible for customers today to customize JSON serialization in a way that performs JS interop calls and is thread-safe, since they can't add a converter to the JSON options.
  • We can't just use a list to hold on to the byte arrays in the JSRuntime instance since that can grow and won't shrink.
    • We need some type of "scope" that begins right before the call to serialize arguments and "finishes" right after the arguments have been serialized.
    • I think we'll be better served if we avoid using byte[][] in the signatures and we replace it with an opaque struct we know how to work with but that doesn't offer a public API.
  • Ideally, it would be nice if we can push this down to our implementations and have it not be part of the interop contract.
    • Have a "separate" manager for handling the lifetime of the byte arrays
    • Create the manager within the circuit host.
    • Create the converter and have it receive "the manager".
    • Add the JsonConverter to the JsonSerializationOptions that the JS runtime uses (it's available for derived classes).
    • On each call site (within circuithost) "create a new scope" when you are about to process a JS interop call.
    • Dispose of the scope after serialization/deserialization has completed.

@@ -111,7 +111,7 @@ public class InvokeArgs
public string? ArgsJson { get; set; }
}

protected override void BeginInvokeJS(long asyncHandle, string identifier, string? argsJson, JSCallResultType resultType, long targetInstanceId)
protected override void BeginInvokeJS(long asyncHandle, string identifier, string? argsJson, byte[][]? byteArrays, JSCallResultType resultType, long targetInstanceId)
Member

Everywhere that we have a pair like argsJson and byteArrays, it would be great to rename the latter to something like argsByteArrays to clarify that it's part of the same logical concept (the args).

Member

Though since we're doing breaking changes, we might even want to define something like:

public readonly struct InvocationArgs
{
    public readonly string JsonData { get; }
    public readonly byte[][] BinaryData { get; }
}

Contributor Author

else if (type == typeof(byte[][]))
{
var length = reader.ReadArrayHeader();
var result = new byte[length][];
Member

FYI this is bad: length comes from user-provided input, so you cannot pre-allocate the array.

Member

Good call.

Contributor Author

Thanks, updated to utilize a list for the result.
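
Pieced together from the snippets quoted elsewhere in this thread, the list-based read presumably looks roughly like the following sketch (not necessarily the exact PR code):

using System.Buffers;
using System.Collections.Generic;
using MessagePack;

internal static class ByteArrayMessageReader
{
    // Grow a list as elements actually arrive, instead of pre-allocating from
    // the untrusted array-length prefix in the incoming message.
    public static byte[][] ReadByteArrays(ref MessagePackReader reader)
    {
        var length = reader.ReadArrayHeader();
        var result = new List<byte[]>();
        for (var i = 0; i < length; i++)
        {
            result.Add(reader.ReadBytes().GetValueOrDefault().ToArray());
        }
        return result.ToArray();
    }
}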

@TanayParikh
Contributor Author

TanayParikh commented May 24, 2021

@javiercn

  • We need to figure out what we do with "nested" calls.

    • My suggestion would be that we prevent against those and we throw for the time being.
    • It is impossible for customers today to customize JSON serialization in a way that performs JS interop calls and is thread-safe, since they can't add a converter to the JSON options.

#32259 (comment)

  • We can't just use a list to hold on to the byte arrays in the JSRuntime instance since that can grow and won't shrink.

    • We need some type of "scope" that begins right before the call to serialize arguments and "finishes" right after the arguments have been serialized.
    • I think we'll be better served if we avoid using byte[][] in the signatures and we replace it with an opaque struct we know how to work with but that doesn't offer a public API.
  • Ideally, it would be nice if we can push this down to our implementations and have it not be part of the interop contract.

    • Have a "separate" manager for handling the lifetime of the byte arrays
    • Create the manager within the circuit host.
    • Create the converter and have it receive "the manager".
    • Add the JsonConverter to the JsonSerializationOptions that the JS runtime uses (it's available for derived classes).
    • On each call site (within circuithost) "create a new scope" when you are about to process a JS interop call.
    • Dispose of the scope after serialization/deserialization has completed.

I've refactored this quite a bit and would appreciate a second look. I'm a bit hesitant to take this approach due to the (significant) added complexity we may be introducing. The changes I've made (guards against nested invocations, semaphores to avoid thread-safety issues, and the ArrayBuilder to be conscious of large transfers) should all hopefully mitigate these concerns. Please let me know if you want to discuss this offline.

{
result.Add(reader.ReadBytes().GetValueOrDefault().ToArray());
}
return result.ToArray();
Member

I notice this logic is doing all the allocations for the byte arrays up front. That might be the only possible way to do it, but I'm asking to be sure.

Reason

There's no maximum number of entries in this byte[][], other than the constraint imposed by the SignalR max message size. So as it stands, if the SignalR max message size was (say) 32KB, and if MessagePack could represent an empty array in (say) 4 bytes, then this logic could have the server perform 8000-ish heap allocations on each incoming message, whether or not there even is any JS invokable endpoint that accepts binary data.

I suppose that's not much different from a JSON payload that contains 8000 very short distinct strings, so maybe it's nothing to worry about.

Possible alternative

If this logic returned a List<ReadOnlySequence<byte>> (or similar) instead, would that avoid the per-entry allocations? I guess it depends on how MessagePack's ReadBytes is implemented. But if we were able to do this and pass it through to the JSON deserializer, then the JSON deserializer would only need to convert the ReadOnlySequence<byte> into a byte[] in the case where it matches an actual byte[] property declared on a .NET type. So a mischief-maker wouldn't be able to force any more allocations than the target data structure declares (except if the target data structure contains something like a List).

I know it really might not make any difference, given that people will sometimes declare List-type structures on their target types. But if it's no more difficult to delay the allocations instead of doing them up front, it's worth considering.

Member

Update: the suggestion below on a different transport mechanism would make this concern obsolete.

@@ -3,7 +3,7 @@
export module DotNet {
(window as any).DotNet = DotNet; // Ensure reachable from anywhere

export type JsonReviver = ((key: any, value: any) => any);
export type JsonReviver = ((key: any, value: any, byteArrays: Uint8Array[] | null) => any);
Member

If possible it would be preferable not to expose byteArrays to externally-registered revivers, since it's not intended to be part of the public API surface.

To do this it will probably be necessary to tweak the parseJsonWithRevivers logic so that it calls our own built-in reviver first in a hardcoded way with 3 params, then uses the reduce logic with 2 params for all externally-registered ones.

return json ? JSON.parse(json, (key, initialValue) => {
// Invoke each reviver in order, passing the output from the previous reviver,
// so that each one gets a chance to transform the value
return jsonRevivers.reduce(
(latestValue, reviver) => reviver(key, latestValue),
(latestValue, reviver) => reviver(key, latestValue, byteArrays === undefined ? null : byteArrays),
Member

I don't think it's necessary to convert undefined to null here since our logic can treat undefined the same as null.

Member

Normally I wouldn't bother about such fine-tuning but JSON revivers are called for every single object nested in the whole graph on every JS interop call, so they run the risk of being a bottleneck.

Comment on lines +243 to +249
ArgsJson: string | null;
ByteArrays: Uint8Array[] | null;

constructor(argsJson: string | null, byteArrays: Uint8Array[] | null) {
this.ArgsJson = argsJson;
this.ByteArrays = byteArrays;
}
Member

Suggested change

ArgsJson: string | null;
ByteArrays: Uint8Array[] | null;
constructor(argsJson: string | null, byteArrays: Uint8Array[] | null) {
this.ArgsJson = argsJson;
this.ByteArrays = byteArrays;
}

argsJson: string | null;
byteArrays: Uint8Array[] | null;
constructor(argsJson: string | null, byteArrays: Uint8Array[] | null) {
this.argsJson = argsJson;
this.byteArrays = byteArrays;
}

@@ -244,9 +260,10 @@ export module DotNet {
* @param methodIdentifier The identifier of the method to invoke. The method must have a [JSInvokable] attribute specifying this identifier.
* @param dotNetObjectId If given, the call will be to an instance method on the specified DotNetObject. Pass null or undefined to call static methods.
* @param argsJson JSON representation of arguments to pass to the method.
* @returns JSON representation of the result of the invocation.
* @param byteArrays Byte array data extracted from the arguments for direct transfer.
* @returns SerializedArgs containing the string JSON args along with the extracted byte arrays representation of the result of the invocation.
Member

Would it be OK to tweak the naming here? Since this is a return value, calling it "args" is quite confusing. It would be great to declare another type like SerializedResult, even if structurally it's the same as SerializedArgs.

Member

Same with invokeJSFromDotNet below.

Comment on lines -305 to +329
return result === null || result === undefined
const returnedByteArrays : Uint8Array[] = [];
var serializedJson = (result === null || result === undefined)
? null
: JSON.stringify(result, argReplacer);
: JSON.stringify(result, (key, value) => argReplacer(key, value, returnedByteArrays)); // TODO; confirm this works for blazor wasm
return new SerializedArgs(serializedJson, returnedByteArrays);
Member

Since this is single-threaded and doesn't allow developers to plug in custom serializers, there's an opportunity to make it a little bit more lightweight by not instantiating returnedByteArrays up front nor wrapping argReplacer in a further layer that runs for every nested object. Instead, the existing argReplacer function could take care of instantiating and populating some global returnedByteArrays object on every call. Example:

const serializedJson = (result === null || result === undefined)
  ? null
  : JSON.stringify(result, argReplacer);
const capturedByteArrays = globalCapturedByteArrays;
globalCapturedByteArrays = null;
return new SerializedArgs(serializedJson, capturedByteArrays);

... with:

// Pseudocode
function argReplacer() {
    // ... existing logic ...
    if (it is a byte array) {
        globalCapturedByteArrays = globalCapturedByteArrays || [];
        globalCapturedByteArrays[id] = newValue;
    }
}

The reason for suggesting this is trying to find ways to make the new functionality as close to zero cost as possible for existing apps that aren't using it.

Member

There's a couple of other places where this pattern would avoid the need for an extra layer of JSON revivers too.

{
if (parameterTypes.Length == 0)
{
return Array.Empty<object>();
}

var utf8JsonBytes = Encoding.UTF8.GetBytes(arguments);
var utf8JsonBytes = Encoding.UTF8.GetBytes(serializedArgs.ArgsJson ?? string.Empty);
Member

Is there some additional scenario where this might be null now but wouldn't have been before? Or could we avoid this by changing the type declaration of SerializedArgs to say that ArgsJson is non-null?

if (value is null)
{
_byteArraysToDeserialize = null;
ReadSemaphore.Release();
Member

This is likely a signal we are doing something wrong here.

@SteveSandersonMS
Member

SteveSandersonMS commented May 24, 2021

@TanayParikh Thanks very much for moving this forwards and continuing to refine it! Mostly this looks like really good stuff, but one aspect of the implementation makes me think we could refine it further. I suppose there are two aspects:

  • Trying to capture the byte[] instances in a context-specific way using unusual techniques like relying on non-nested invocations, the Semaphore, etc. This is kind of complex and subtle, and would at the very least need to get more complex still to handle nested invocations (which actually are a thing, since you can put a [JsonConverter] on any type and then put inside it custom logic which could in turn do JS interop or other serialization).
  • Passing the byte[][] parameters through all the various layers has a very noisy impact all over the API surface, and means this will always be a thing we have to allow for when doing any future maintenance on JS interop.

I think there could be an alternative way to capture and pass the byte[] instances that avoids both of these problems, plus is simpler overall. That is:

  • For the .NET-to-JS direction:
    • The JSRuntime base class could set up a single ByteArrayJsonConverter that, whenever it sees a byte[], immediately gives it an incrementing ID and passes the byte[] to JS without tracking it anywhere in .NET memory, and uses the ID in the serialized JSON (e.g., { __dotNetByteArray: 123 }).
    • For example, JSRuntime could have a new method like protected virtual void SupplyByteArray(long id, byte[] data), whose default implementation just throws NotSupportedException
    • Each JSRuntime subclass can take care of getting the array over to JS and calling some new function inside Microsoft.JSInterop.ts like DotNet.jsCallDispatcher.supplyByteArray(id: number, data: Uint8Array). The implementation of that function tracks the id/data pair in some Map<number, Uint8Array>.
    • Of course, when Microsoft.JSInterop.ts is later deserializing JSON, it can react to { __dotNetByteArray: 123 } by removing the corresponding instance from the map and supplying it.
    • This naturally cleans up after itself as long as .NET never throws during JSON serialization. If it does, then the JS side might never remove some of these Map entries, but I don't think that's a big problem because (1) people shouldn't have exceptions during JSON serialization, and if they do, then normally the user's session will be terminated anyway, (2) JS memory isn't as security-critical as .NET memory is for Blazor Server, and (3) if this ends up being a concern in the long run, we could set up some timer-based thing on the JS side to clean up any entries that haven't been consumed after some timeout.
  • For the JS-to-.NET direction, the exact same technique could be used, with a couple of minor variations.
    • Microsoft.JSInterop.ts would respond to seeing a Uint8Array by immediately calling some new function DotNetCallDispatcher.supplyByteArray(index: number, data: Uint8Array). Each runtime platform would plug in some way of transporting the data and calling DotNetDispatcher.SupplyByteArray on the .NET side. Note that in this direction, the IDs can be call-specific indices (i.e., restarting at 0 for each call). This is safe because JS-side serialization is always single-threaded, synchronous, and not user-extensible, and is always synchronously followed by the actual invocation of .NET code, so we know there's always a continuous series of synchronous supplyByteArray calls that synchronously precede the beginInvokeDotNet or whatever it's called.
    • DotNetDispatcher.SupplyByteArray would track the incoming data in some JSRuntime-instance-specific List<byte[]>. For security, it would need to impose a rule that you can't supply too much data in total before the actual JS interop call that consumes the data. I'd recommend we have a rule that says the total byte length of all the SupplyByteArray calls cannot exceed the configured SignalR "max message size", just so there isn't an extra thing to configure.
    • Of course, the JSRuntime would also have a converter that sees incoming { __jsByteArray: 3 }-type values and looks up the corresponding entry in the List<byte[]>. Finally, before dispatching the call into user code, it would empty out the List<byte[]> because we know it's now finished. It could also empty out that list each time it sees an incoming index == 0, in case any errors caused previous processes to be left incomplete.
    • The .NET side automatically keeps itself tidy and resilient to any leaks due to failure, because it never holds more bytes than "max signalR message size" and always empties the storage at the start of each new call. It doesn't have to worry about overlapping calls because that just isn't possible in this direction (the serialization and sending of messages is synchronous on the JS side). If it receives too much data, or out-of-order data, it can immediately just throw and let the circuit be torn down.

Sorry this was such a huge dump of ideas - just trying to be specific to avoid wasting any of your time. If any of it seems wrong, needs clarification, or you can think of any improvements, let me know! The main benefits of this kind of approach are that we avoid all the difficulties about threading/overlaps, plus we don't need to make any changes to the majority of the layers in the API (e.g., all the BeginInvoke/EndInvoke/etc method signatures wouldn't have to be changed at all). Does this make sense?
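
To make the .NET-to-JS half of this concrete, here is a rough sketch of the converter idea described above. The SupplyByteArray callback and the __dotNetByteArray marker follow the naming in the comment; this is illustrative only, not the shipped implementation:

using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// On write, each byte[] is handed to the runtime immediately (which forwards it
// to JS out of band) and only a small ID marker is left in the JSON payload.
// Not thread-safe, matching the single-threaded assumption discussed here.
internal sealed class ByteArrayJsonConverter : JsonConverter<byte[]>
{
    private readonly Action<long, byte[]> _supplyByteArray; // e.g. JSRuntime.SupplyByteArray
    private long _nextId;

    public ByteArrayJsonConverter(Action<long, byte[]> supplyByteArray)
        => _supplyByteArray = supplyByteArray;

    public override void Write(Utf8JsonWriter writer, byte[] value, JsonSerializerOptions options)
    {
        var id = _nextId++;
        _supplyByteArray(id, value);

        writer.WriteStartObject();
        writer.WriteNumber("__dotNetByteArray", id);
        writer.WriteEndObject();
    }

    public override byte[] Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
        => throw new NotSupportedException("This sketch covers the .NET-to-JS direction only.");
}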

@SteveSandersonMS
Member

In case it helps, the approach I'm suggesting here is basically the same thing as I was doing in the streaming-from-dotnet-to-JS prototype. Having these use almost identical techniques also helps with keeping this area more comprehensible in the long run.

@javiercn
Member

  • This naturally cleans up after itself as long as .NET never throws during JSON serialization. If it does, then the JS side might never remove some of these Map entries, but I don't think that's a big problem because (1) people shouldn't have exceptions during JSON serialization, and if they do, then normally the user's session will be terminated anyway, (2) JS memory isn't as security-critical as .NET memory is for Blazor Server, and (3) if this ends up being a concern in the long run, we could set up some timer-based thing on the JS side to clean up any entries that haven't been consumed after some timeout.

We can just put this in a try..catch and send a signal to clear state when an exception happens.

@javiercn
Member

  • DotNetDispatcher.SupplyByteArray would track the incoming data in some JSRuntime-instance-specific List<byte[]>. For security, it would need to impose a rule that you can't supply too much data in total before the actual JS interop call that consumes the data. I'd recommend we have a rule that says the total byte length of all the SupplyByteArray calls cannot exceed the configured SignalR "max message size", just so there isn't an extra thing to configure.

Agree

@javiercn
Member

  • Of course, the JSRuntime would also have a converter that sees incoming { __jsByteArray: 3 }-type values and looks up the corresponding entry in the List<byte[]>. Finally, before dispatching the call into user code, it would empty out the List<byte[]> because we know it's now finished. It could also empty out that list each time it sees an incoming index == 0, in case any errors caused previous processes to be left incomplete.

We might want to do this using ArrayPool<byte[]>.Shared, or by setting the list to null once we dispatch the call, to avoid holding on to potentially large lists/arrays.

@SteveSandersonMS
Member

We might want to do this using ArrayPool<byte[]>.Shared, or by setting the list to null once we dispatch the call, to avoid holding on to potentially large lists/arrays.

Agreed. By storing it in an ArrayBuilder<byte[]> (instead of List<byte[]>) we'd get the pooling and cleanup automatically.

@TanayParikh
Contributor Author

Thanks for the reviews, they were really helpful. I was going through making the changes per @SteveSandersonMS's suggestions and I just wanted to confirm two things.

1. Security

Steve mentions the security related concerns of being provided "too much data", and recommends "we have a rule that says the total byte length of all the SupplyByteArray calls cannot exceed the configured SignalR "max message size", just so there isn't an extra thing to configure."

This protects against large amounts of data, but could large numbers of (empty) byte arrays still pose a DoS vector?

We can represent an empty byte array via:

{ "__byte": 65536 },

Assuming uncompressed transfer, this would be about 8 bytes (probably a bit larger; I'm not too familiar with MessagePack internals). So a single 32 KB request (as that's our imposed rule above) could contain 4096 empty byte arrays (and consequently 4097 individual messages).

Granted, an attacker doing this vs. sending 4097 separate requests wouldn't be that different (and the latter is already possible).

Additionally, could we run into rate-limiting concerns on the server side for something like this?

2. Whether or not out-of-order delivery of byte arrays could pose an issue (or if it's even possible).

The issue breaks down into two parts: out-of-order delivery of the byte arrays, and the byte arrays not having arrived by the time we're deserializing.

  • For .NET-to-JS:
    • Out of order delivery
      • Not an issue
      • Byte arrays are stored in JS under a unique identifier, so they're O(1) accessible via the map when it comes to reviving the serialized JSON.
    • Not all byte arrays arrived by the time we're deserializing.
      • Not an issue.
      • JS side is handling requests sequentially.
  • For JS-to-.NET:
    • Out of order delivery
      • Not an issue
      • During JS serialization, when we're extracting and dispatching, this would all be single-threaded (and we'd wait on the completion of each?)
      • We care about in-order delivery here if we're storing the data in an ArrayBuilder<byte[]>, as we're not preserving IDs (i.e. elements 2 & 3 could theoretically swap places with out-of-order delivery, which would be a critical error).
        • We could utilize a Dictionary<long, byte[]> however we'd lose the ArrayPool/memory optimizations provided by the ArrayBuilder<byte[]>
    • Not all byte arrays arrived by the time we're deserializing.
      • Not an issue
      • Shouldn't be possible as JS would've sent the requests sequentially (and waited for completion before continuing?)

Are these assumptions correct?

@javiercn
Member

Security

I don't think we need to worry about someone sending "empty" byte arrays. We can cap the maximum number of byte arrays by always adding at least X to the size when we receive a byte array, so at most someone could send 32768 empty arrays (for X = 1). We can probably reduce this number as follows:

  • 32768 / 4 = 8192 since [] encoded will be represented as 4 chars when base64 encoded.
  • Probably we can reduce this further to 2048 or 4096. One way to figure out is to determine how many arrays can be sent with the default config and set a maximum number of arrays based on that. So MaxMessageSize / 4 ~= Max acceptable arrays.

Second, we can optimize against empty arrays by using Array.Empty<byte>() which will reuse a shared instance.

Out-of-Order delivery

Not an issue since we use SignalR to guarantee the order of the operations.
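
A rough sketch of the kind of accounting being discussed above; the limits and names are assumptions for illustration, not settled API:

using System;
using System.Collections.Generic;

// Buffers byte arrays supplied ahead of a JS-to-.NET call, capping both the total
// byte length and the number of arrays so that many empty arrays can't be used as
// a cheap DoS vector.
internal sealed class IncomingByteArrayBuffer
{
    private readonly long _maxTotalBytes;  // e.g. the SignalR max message size
    private readonly int _maxArrayCount;   // e.g. maxMessageSize / 4, per the estimate above
    private readonly List<byte[]> _arrays = new();
    private long _totalBytes;

    public IncomingByteArrayBuffer(long maxTotalBytes, int maxArrayCount)
    {
        _maxTotalBytes = maxTotalBytes;
        _maxArrayCount = maxArrayCount;
    }

    public void Supply(byte[] data)
    {
        _totalBytes += Math.Max(1, data.Length); // count empty arrays as at least 1 byte
        if (_totalBytes > _maxTotalBytes || _arrays.Count >= _maxArrayCount)
        {
            throw new InvalidOperationException("Too much byte array data supplied before the JS interop call.");
        }
        _arrays.Add(data.Length == 0 ? Array.Empty<byte>() : data); // reuse the shared empty instance
    }

    // Hand the buffered arrays to the deserializer and reset for the next call.
    public byte[][] DrainForCall()
    {
        var result = _arrays.ToArray();
        _arrays.Clear();
        _totalBytes = 0;
        return result;
    }
}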

@SteveSandersonMS
Member

This protects against large amounts of data, but could large numbers of (empty) byte arrays still pose a DoS vector?

These are great things to consider! Thanks for working through these details.

Second, we can optimize against empty arrays by using Array.Empty<byte>() which will reuse a shared instance.

Good point.

Also it might be that even nonempty arrays can be tracked without instantiating anything new on a per-array basis. I think MessagePack's API already gives us a ReadOnlySequence<byte> which might just be a reference to a range of bytes in the original incoming message, so tracking it doesn't cost any new allocations anyway. We would only have to convert this into an actual byte[] (which would definitely allocate, at least for non-empty ones) at the point of JSON deserialization when we're populating a byte[] on the .NET model object.

@javiercn
Member

Also it might be that even nonempty arrays can be tracked without instantiating anything new on a per-array basis. I think MessagePack's API already gives us a ReadOnlySequence<byte> which might just be a reference to a range of bytes in the original incoming message, so tracking it doesn't cost any new allocations anyway. We would only have to convert this into an actual byte[] (which would definitely allocate, at least for non-empty ones) at the point of JSON deserialization when we're populating a byte[] on the .NET model object.

We would need to understand the lifetime/ownership of the ReadOnlySequence we are given, but if it is "ours", then yes, we could avoid other allocations. Just want to make sure we don't think the buffer is ours when it might not be, since in that case it would outlive the message lifetime.

@TanayParikh TanayParikh deleted the taparik/blazorserverByteArrayInterop branch May 25, 2021 22:40
@github-actions github-actions bot locked and limited conversation to collaborators Dec 8, 2023