Blazorserver Byte Array Interop Support #32259

TanayParikh · 2021-04-29T02:31:56Z

This is a first draft of the Blazor Server byte array interop support. I would appreciate any feedback.

TODO: ByteArrayJsonConverter tests

Part of: #21877

src/JSInterop/Microsoft.JSInterop.JS/src/src/Microsoft.JSInterop.ts

TanayParikh · 2021-04-29T16:30:23Z

Note: the CI will continue to fail because of the required WASM changes. I'm planning on handling WASM with another PR targeting this branch. In the meantime, this branch/PR will serve as the Blazor Server implementation.

SteveSandersonMS · 2021-04-30T09:20:43Z

src/Components/Server/src/Circuits/CircuitHost.cs

@@ -362,7 +362,7 @@ public async Task BeginInvokeDotNetFromJS(string callId, string assemblyName, st

        // EndInvokeJSFromDotNet is used in a fire-and-forget context, so it's responsible for its own
        // error handling.
-        public async Task EndInvokeJSFromDotNet(long asyncCall, bool succeeded, string arguments)
+        public async Task EndInvokeJSFromDotNet(long callId, bool succeeded, string resultOrError, byte[][]? byteArrays)


Ah yes, these are better names. Thanks for cleaning this up!

src/JSInterop/Microsoft.JSInterop/src/Infrastructure/ByteArrayJsonConverter.cs

SteveSandersonMS · 2021-04-30T09:25:57Z

src/JSInterop/Microsoft.JSInterop/src/Infrastructure/DotNetDispatcher.cs

-        public static string? Invoke(JSRuntime jsRuntime, in DotNetInvocationInfo invocationInfo, string argsJson)
+        /// <param name="byteArrays">Byte array data extracted from the arguments for direct transfer.</param>
+        /// <returns>A tuple containing the JSON representation of the return value, or null, along with the extracted byte arrays, or null.</returns>
+        public static (string?, byte[][]?) Invoke(JSRuntime jsRuntime, in DotNetInvocationInfo invocationInfo, string argsJson, byte[][]? byteArrays)


Might it be preferable to define a readonly struct for this return type rather than the tuple? Just in case, in the future, we want to add even more things.

Agree, we don't use tuples in public APIs.

@SteveSandersonMS what was the reason for this to be public?

what was the reason for this to be public?

Layering reasons. JS interop is an independent layer from any of the hosting platforms, and any hosting platform that needs to deliver an inbound synchronous JS interop call would do so by calling this. Currently that's only WebAssembly.

Thanks, updated.
3056f46#diff-d507946c1cb83ccdd4ef4e3dc431556cf387d61cb9c6e9fc6cc7c744f3868c10R10

SteveSandersonMS · 2021-04-30T09:31:05Z

src/JSInterop/Microsoft.JSInterop/src/JSRuntime.cs

                    var result = JsonSerializer.Deserialize(ref jsonReader, resultType, JsonSerializerOptions);
+                    ByteArraysToDeserialize = null;


This pattern with ByteArraysToDeserialize might be the first place where JSRuntime stops being thread-safe. That might be absolutely fine with our current use cases, but it would be good to:

- Check that our current use cases don't do concurrent invocations.
- Add a check here that would help us detect quickly in the future if we change things and break the assumption. For example, just before line 229, you could check if ByteArraysToDeserialize was non-null and if so, throw. I guess there could be some similar flag to verify access around the logic in SerializeArgs, if not an actual lock.

Actually it's more than just thread-safety - it's also about nested invocations. I'm not sure whether it's possible in our current use cases, but it's imaginable that someone might have a JSON converter that itself performs JS interop.

If nested invocations are a scenario (not saying they are), then something like StackObjectPool<T> from the Components project might help with minimising the number of lists that get allocated, while retaining support for nested usage.

I had the same thoughts. I'm not concerned about thread safety since all these operations must happen inside the sync context. Nested invocations could be a thing; however, I would suggest we detect that case and throw an exception as "unsupported", since it is possible but very unlikely.

I'm not concerned about thread safety since all these operations must happen inside the sync context

That may be true in our current uses but isn't part of the contract as far as the JS interop layer is concerned. JS interop is technically a separate thing and doesn't have a concept of a sync context.

We can define JSRuntime as being not thread-safe though, and yes it would be great to detect invalid usage and throw as this would simplify detecting any future issues.
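
For illustration, the detect-and-throw check discussed above could be sketched roughly as follows. This is TypeScript rather than the C# used in JSRuntime, and `GuardedByteArrayCapture` and its members are hypothetical names that do not appear in the PR.

```typescript
// Hypothetical sketch: a capture buffer that refuses nested or overlapping use.
class GuardedByteArrayCapture {
  private active: Uint8Array[] | null = null;

  // Runs `serialize` with a fresh buffer; throws if a capture is already in progress.
  run<T>(
    serialize: (capture: (data: Uint8Array) => number) => T
  ): { value: T; byteArrays: Uint8Array[] } {
    if (this.active !== null) {
      throw new Error("Nested or concurrent serialization is not supported.");
    }
    const buffer: Uint8Array[] = [];
    this.active = buffer;
    try {
      // `capture` stores the array and returns its index in the buffer.
      const value = serialize((data) => buffer.push(data) - 1);
      return { value, byteArrays: buffer };
    } finally {
      this.active = null; // Always reset, even if serialization throws.
    }
  }
}
```

The `finally` block matters here: without it, a single failed serialization would leave the guard set and every later call would be rejected.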

I believe 9995e90 & 8aeef7d should resolve these concerns.

Regarding thread-safety/nested invocation: These added checks should be able to detect when we're trying to interrupt a serialization already in progress (and guard against it via semaphores, which must be acquired before a (de)serialization request can be completed).

Please let me know if there are still additional concerns.

SteveSandersonMS · 2021-04-30T09:32:36Z

This looks really promising. Thanks for tidying up the existing code too!

javiercn · 2021-05-04T14:01:43Z

src/JSInterop/Microsoft.JSInterop/src/JSRuntime.cs

+        /// <summary>
+        /// Contains the combined byte array(s) being serialized.
+        /// </summary>
+        internal readonly List<byte[]> ByteArraysToSerialize = new();


We need to rethink this a bit.

The problem here is that if I serialize 10K byte arrays this list will grow up to 10K buckets and won't shrink when you call Clear afterwards.

It might be better to implement a custom type for this, and use an array pool internally to track the byte arrays.

The ArrayBuilder<T> class in the Shared sources might be a convenient fit. It grows by renting increasingly large buffers from an ArrayPool and when you call Clear it returns the current buffer to the pool. So the pooling concerns would be handled implicitly.

Thanks, cdba736 should handle this 😄

javiercn · 2021-05-04T14:04:08Z

src/JSInterop/Microsoft.JSInterop/src/JSRuntime.cs

+        protected internal (string, byte[][]?) SerializeArgs(object? args)
+        {
+            ByteArraysToSerialize.Clear();
+            var serializedArgs = JsonSerializer.Serialize(args, JsonSerializerOptions);
+            var byteArrays = ByteArraysToSerialize.ToArray();
+
+            return (serializedArgs, byteArrays);
+        }


It might be more work and lead to a bit of duplication; however, it would be great if we could push this method down to the implementation classes. The reason is that this avoids adding extra public API, which might make it harder to change things in the future.

I would flip this around and, if possible, push more of the serialization into the underlying JS interop layer. The intent from the beginning was to make the serialization a non-negotiable part of the underlying layer that particular platform implementations can't affect. That's why we pass argsJson (pre-serialized) to methods like BeginInvokeJS. It means that when we, or someone else, implements support for JS interop on a new platform, they only have to think about the physical transport of the basic string data type (or, in the future, string plus a collection of byte arrays) and don't reinvent any of the serialization semantics and risk any differences across platforms. For example, platforms can't change the set of JSON formatters, as it's intended to evolve over time.

There is a gap in the way this is defined today, though. Unintentional as far as I know. Our EndInvokeDotNet method receives DotNetInvocationResult which in turn has a Result which is not pre-serialized. If we were to follow the pattern consistently, DotNetInvocationResult.Result should be something like DotNetInvocationResult.ResultJson so that each platform implementation isn't involved in the serialization.

Since we're in the middle of making breaking changes to the API contract here, we could take the opportunity to fix this by changing DotNetInvocationResult to have both ResultJson and ResultByteArrays which are both opaque to the per-platform implementations (they just have to transport it).

I've refactored this to be more centered around the JSONConverter. Please let me know whether this is still a concern.

javiercn

Overall looks great!

Here are some additional thoughts:

We need to figure out what we do with "nested" calls.
- My suggestion would be that we prevent against those and we throw for the time being.
- It is impossible for customers today to customize JSON serialization in a way that they can perform JS interop calls and that it is thread safe, since they can't add a converter to the JSON options.
We can't just use a list to hold on to the byte arrays in the JSRuntime instance since that can grow and won't shrink.
- We need some type of "scope" that begins right before the call to serialize arguments and "finishes" right after the arguments have been serialized.
- I think we'll be better served if we avoid using byte[][] in the signatures and we replace it with an opaque struct we know how to work with but that doesn't offer a public API.
Ideally, it would be nice if we can push this down to our implementations and have it not be part of the interop contract.
- Have a "separate" manager for handling the lifetime of the byte arrays
- Create the manager within the circuit host.
- Create the converter and have it receive "the manager".
- Add the JsonConverter to the JsonSerializationOptions that the JS runtime uses (it's available for derived classes).
- On each call site (within circuithost) "create a new scope" when you are about to process a JS interop call.
- Dispose of the scope after serialization/deserialization has completed.

SteveSandersonMS · 2021-05-05T14:52:16Z

src/JSInterop/Microsoft.JSInterop/test/JSInProcessRuntimeTest.cs

@@ -111,7 +111,7 @@ public class InvokeArgs
                public string? ArgsJson { get; set; }
            }

-            protected override void BeginInvokeJS(long asyncHandle, string identifier, string? argsJson, JSCallResultType resultType, long targetInstanceId)
+            protected override void BeginInvokeJS(long asyncHandle, string identifier, string? argsJson, byte[][]? byteArrays, JSCallResultType resultType, long targetInstanceId)


Everywhere that we have a pair like argsJson and byteArrays, it would be great to rename the latter to something like argsByteArrays to clarify that it's part of the same logical concept (the args).

Though since we're doing breaking changes, we might even want to define something like:

    public readonly struct InvocationArgs
    {
        public readonly string JsonData { get; }
        public readonly byte[][] BinaryData { get; }
    }

BrennanConroy · 2021-05-14T15:59:36Z

src/Components/Server/src/BlazorPack/BlazorPackHubProtocolWorker.cs

+                else if (type == typeof(byte[][]))
+                {
+                    var length = reader.ReadArrayHeader();
+                    var result = new byte[length][];


FYI this is bad: length comes from user-provided input, so you cannot pre-allocate the array.

Thanks, updated to utilize a list for the result.
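
As a sketch of the safer pattern, the result can grow only as elements are actually read, so a forged header cannot force a large allocation. The `ByteArrayReader` interface below is a hypothetical stand-in and not the MessagePack reader API; the real code is C#.

```typescript
// Hypothetical minimal reader; this is NOT the MessagePack API.
interface ByteArrayReader {
  readArrayHeader(): number;      // Claimed element count (attacker-controlled).
  readBytes(): Uint8Array | null; // Next element, or null when the payload is exhausted.
}

function readByteArrays(reader: ByteArrayReader): Uint8Array[] {
  const claimedLength = reader.readArrayHeader();
  const result: Uint8Array[] = []; // No pre-allocation based on claimedLength.
  for (let i = 0; i < claimedLength; i++) {
    const bytes = reader.readBytes();
    if (bytes === null) {
      // The header promised more elements than the payload contains.
      throw new Error("Payload ended before the declared array length was reached.");
    }
    result.push(bytes);
  }
  return result;
}

// Helper for trying the function out with an in-memory payload.
function makeFakeReader(claimed: number, items: Uint8Array[]): ByteArrayReader {
  const queue = items.slice();
  return {
    readArrayHeader: () => claimed,
    readBytes: () => {
      const item = queue.shift();
      return item === undefined ? null : item;
    },
  };
}
```

With this shape, a header claiming a billion entries costs nothing until the payload actually delivers them.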

TanayParikh · 2021-05-24T02:33:30Z

@javiercn

We need to figure out what we do with "nested" calls.

My suggestion would be that we prevent against those and we throw for the time being.

It is impossible for customers today to customize JSON serialization in a way that they can perform JS interop calls and that it is thread safe, since they can't add a converter to the JSON options.

#32259 (comment)

We can't just use a list to hold on to the byte arrays in the JSRuntime instance since that can grow and won't shrink.

We need some type of "scope" that begins right before the call to serialize arguments and "finishes" right after the arguments have been serialized.

I think we'll be better served if we avoid using byte[][] in the signatures and we replace it with an opaque struct we know how to work with but that doesn't offer a public API.

Blazorserver Byte Array Interop Support #32259 (comment)
b0b5a6b
- However, from your comment I believe your concern is the arguments during the actual interop, which this commit doesn't cover. If you could please confirm, I can update accordingly 😄

Ideally, it would be nice if we can push this down to our implementations and have it not be part of the interop contract.

Have a "separate" manager for handling the lifetime of the byte arrays

Create the manager within the circuit host.

Create the converter and have it receive "the manager".

Add the JsonConverter to the JsonSerializationOptions that the JS runtime uses (it's available for derived classes).

On each call site (within circuithost) "create a new scope" when you are about to process a JS interop call.

Dispose of the scope after serialization/deserialization has completed.

I've refactored this up quite a bit and would appreciate a second look. I'm a bit hesitant to take this approach due to the (significant) added complexity we may be introducing. The changes I've made to add guards for nested invocations, semaphores to avoid thread safety issues and utilizing the ArrayBuilder to be conscious of large transfers should all hopefully mitigate these concerns. Please let me know if you want to discuss this offline.

SteveSandersonMS · 2021-05-24T10:15:17Z

src/Components/Server/src/BlazorPack/BlazorPackHubProtocolWorker.cs

+                    {
+                        result.Add(reader.ReadBytes().GetValueOrDefault().ToArray());
+                    }
+                    return result.ToArray();


I notice this logic is doing all the allocations for the byte arrays up front. That might be the only possible way to do it, but I'm asking to be sure.

Reason

There's no maximum number of entries in this byte[][], other than the constraint imposed by the SignalR max message size. So as it stands, if the SignalR max message size was (say) 32KB, and if MessagePack could represent an empty array in (say) 4 bytes, then this logic could have the server perform 8000-ish heap allocations on each incoming message, whether or not there even is any JS invokable endpoint that accepts binary data.

I suppose that's not much different from a JSON payload that contains 8000 very short distinct strings, so maybe it's nothing to worry about.

Possible alternative

If this logic returned a List<ReadOnlySequence<byte>> (or similar) instead, would that avoid the per-entry allocations? I guess it depends on how MessagePack's ReadBytes is implemented. But if we were able to do this and pass it through to the JSON deserializer, then the JSON deserializer would only need to convert the ReadOnlySequence<byte> into a byte[] in the case where it matches an actual byte[] property declared on a .NET type. So a mischief-maker wouldn't be able to force any more allocations than the target data structure declares (except if the target data structure contains something like a List).

I know it really might not make any difference, given that people will sometimes declare List-type structures on their target types. But if it's no more difficult to delay the allocations instead of doing them up front, it's worth considering.

Update: the suggestion below on a different transport mechanism would make this concern obsolete.

SteveSandersonMS · 2021-05-24T10:21:22Z

src/JSInterop/Microsoft.JSInterop.JS/src/src/Microsoft.JSInterop.ts

@@ -3,7 +3,7 @@
 export module DotNet {
  (window as any).DotNet = DotNet; // Ensure reachable from anywhere

-  export type JsonReviver = ((key: any, value: any) => any);
+  export type JsonReviver = ((key: any, value: any, byteArrays: Uint8Array[] | null) => any);


If possible it would be preferable not to expose byteArrays to externally-registered revivers, since it's not intended to be part of the public API surface.

To do this it will probably be necessary to tweak the parseJsonWithRevivers logic so that it calls our own built-in reviver first in a hardcoded way with 3 params, then uses the reduce logic with 2 params for all externally-registered ones.
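
One possible shape for that tweak is sketched below. It assumes a built-in reviver named `reviveByteArray` and a `__byte` marker key; both are illustrative assumptions, and only `parseJsonWithRevivers` and the two-parameter reviver type come from the existing file.

```typescript
// Public reviver type stays at two parameters; byteArrays is never exposed.
type PublicJsonReviver = (key: any, value: any) => any;

const externalRevivers: PublicJsonReviver[] = [];
const byteArrayRefKey = "__byte"; // Assumed marker key, for illustration only.

// Built-in reviver: the only one that receives byteArrays.
function reviveByteArray(key: any, value: any, byteArrays: Uint8Array[] | null | undefined): any {
  if (byteArrays && value && typeof value === "object" && byteArrayRefKey in value) {
    return byteArrays[value[byteArrayRefKey]];
  }
  return value;
}

function parseJsonWithRevivers(json: string | null, byteArrays?: Uint8Array[] | null): any {
  return json ? JSON.parse(json, (key, initialValue) => {
    // Hardcoded first step with 3 params, then the usual reduce with 2 params.
    const builtIn = reviveByteArray(key, initialValue, byteArrays);
    return externalRevivers.reduce((latest, reviver) => reviver(key, latest), builtIn);
  }) : null;
}
```

Because the built-in reviver checks `byteArrays` for truthiness, `undefined` and `null` are handled the same way and no conversion is needed per call.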

SteveSandersonMS · 2021-05-24T10:22:46Z

src/JSInterop/Microsoft.JSInterop.JS/src/src/Microsoft.JSInterop.ts

    return json ? JSON.parse(json, (key, initialValue) => {
      // Invoke each reviver in order, passing the output from the previous reviver,
      // so that each one gets a chance to transform the value
      return jsonRevivers.reduce(
-        (latestValue, reviver) => reviver(key, latestValue),
+        (latestValue, reviver) => reviver(key, latestValue, byteArrays === undefined ? null : byteArrays),


I don't think it's necessary to convert undefined to null here since our logic can treat undefined the same as null.

Normally I wouldn't bother about such fine-tuning but JSON revivers are called for every single object nested in the whole graph on every JS interop call, so they run the risk of being a bottleneck.

SteveSandersonMS · 2021-05-24T10:27:01Z

src/JSInterop/Microsoft.JSInterop.JS/src/src/Microsoft.JSInterop.ts

+    ArgsJson: string | null;
+    ByteArrays: Uint8Array[] | null;
+
+    constructor(argsJson: string | null, byteArrays: Uint8Array[] | null) {
+      this.ArgsJson = argsJson;
+      this.ByteArrays = byteArrays;
+    }


Suggested change

-    ArgsJson: string | null;
-    ByteArrays: Uint8Array[] | null;
-
-    constructor(argsJson: string | null, byteArrays: Uint8Array[] | null) {
-      this.ArgsJson = argsJson;
-      this.ByteArrays = byteArrays;
-    }
+    argsJson: string | null;
+    byteArrays: Uint8Array[] | null;
+
+    constructor(argsJson: string | null, byteArrays: Uint8Array[] | null) {
+      this.argsJson = argsJson;
+      this.byteArrays = byteArrays;
+    }

SteveSandersonMS · 2021-05-24T10:31:59Z

src/JSInterop/Microsoft.JSInterop.JS/src/src/Microsoft.JSInterop.ts

@@ -244,9 +260,10 @@ export module DotNet {
     * @param methodIdentifier The identifier of the method to invoke. The method must have a [JSInvokable] attribute specifying this identifier.
     * @param dotNetObjectId If given, the call will be to an instance method on the specified DotNetObject. Pass null or undefined to call static methods.
     * @param argsJson JSON representation of arguments to pass to the method.
-     * @returns JSON representation of the result of the invocation.
+     * @param byteArrays Byte array data extracted from the arguments for direct transfer.
+     * @returns SerializedArgs containing the string JSON args along with the extracted byte arrays representation of the result of the invocation.


Would it be OK to tweak the naming here? Since this is a return value, calling it "args" is quite confusing. It would be great to declare another type like SerializedResult, even if structurally it's the same as SerializedArgs.

Same with invokeJSFromDotNet below.

SteveSandersonMS · 2021-05-24T10:40:25Z

src/JSInterop/Microsoft.JSInterop.JS/src/src/Microsoft.JSInterop.ts

-      return result === null || result === undefined
+      const returnedByteArrays : Uint8Array[] = [];
+      var serializedJson = (result === null || result === undefined)
        ? null
-        : JSON.stringify(result, argReplacer);
+        : JSON.stringify(result, (key, value) => argReplacer(key, value, returnedByteArrays)); // TODO; confirm this works for blazor wasm
+      return new SerializedArgs(serializedJson, returnedByteArrays);


Since this is single-threaded and doesn't allow developers to plug in custom serializers, there's an opportunity to make it a little bit more lightweight by not instantiating returnedByteArrays up front nor wrapping argReplacer in a further layer that runs for every nested object. Instead, the existing argReplacer function could take care of instantiating and populating some global returnedByteArrays object on every call. Example:

    const serializedJson = (result === null || result === undefined)
      ? null
      : JSON.stringify(result, argReplacer);
    const capturedByteArrays = globalCapturedByteArrays;
    globalCapturedByteArrays = null;
    return new SerializedArgs(serializedJson, capturedByteArrays);

... with:

    // Pseudocode
    function argReplacer() {
      // ... existing logic ...
      if (it is a byte array) {
        globalCapturedByteArrays = globalCapturedByteArrays || [];
        globalCapturedByteArrays[id] = newValue;
      }
    }

The reason for suggesting this is trying to find ways to make the new functionality as close to zero cost as possible for existing apps that aren't using it.

There are a couple of other places where this pattern would avoid the need for an extra layer of JSON revivers too.
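
A runnable variant of the lazily populated global capture described above might look like this. The names `capturedByteArraysSlot`, `captureReplacer` and `serializeWithCapture`, and the `__byte` marker, are assumptions for the sake of the sketch, not the PR's code.

```typescript
// Stays null unless a byte array is actually encountered, so calls
// that carry no binary data allocate nothing extra.
let capturedByteArraysSlot: Uint8Array[] | null = null;

function captureReplacer(_key: string, value: any): any {
  if (value instanceof Uint8Array) {
    capturedByteArraysSlot = capturedByteArraysSlot || [];
    const id = capturedByteArraysSlot.length;
    capturedByteArraysSlot[id] = value;
    return { __byte: id }; // Placeholder that replaces the bytes in the JSON.
  }
  return value;
}

function serializeWithCapture(result: unknown): { json: string | null; byteArrays: Uint8Array[] | null } {
  const json = (result === null || result === undefined)
    ? null
    : JSON.stringify(result, captureReplacer);
  const byteArrays = capturedByteArraysSlot;
  capturedByteArraysSlot = null; // Reset so the next call starts clean.
  return { json, byteArrays };
}
```

This relies on serialization being synchronous and single-threaded on the JS side, which is the premise stated in the comment above.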

SteveSandersonMS · 2021-05-24T10:54:22Z

src/JSInterop/Microsoft.JSInterop/src/Infrastructure/DotNetDispatcher.cs

        {
            if (parameterTypes.Length == 0)
            {
                return Array.Empty<object>();
            }

-            var utf8JsonBytes = Encoding.UTF8.GetBytes(arguments);
+            var utf8JsonBytes = Encoding.UTF8.GetBytes(serializedArgs.ArgsJson ?? string.Empty);


Is there some additional scenario where this might be null now but wouldn't have been before? Or could we avoid this by changing the type declaration of SerializedArgs to say that ArgsJson is non-null?

javiercn · 2021-05-24T11:38:09Z

src/JSInterop/Microsoft.JSInterop/src/Infrastructure/ByteArrayJsonConverter.cs

+                if (value is null)
+                {
+                    _byteArraysToDeserialize = null;
+                    ReadSemaphore.Release();


This is likely a signal we are doing something wrong here.

SteveSandersonMS · 2021-05-24T13:09:10Z

@TanayParikh Thanks very much for moving this forwards and continuing to refine it! Mostly this looks like really good stuff, but one aspect of the implementation makes me think we could refine it further. I suppose there are two aspects:

- Trying to capture the byte[] instances in a context-specific way using unusual techniques like relying on non-nested invocations, the Semaphore, etc. This is kind of complex and subtle, and would at the very least need to get more complex still to handle nested invocations (which actually are a thing, since you can put a [JsonConverter] on any type and then put inside it custom logic which could in turn do JS interop or other serialization).
- Passing the byte[][] parameters through all the various layers has a very noisy impact all over the API surface, and means this will always be a thing we have to allow for when doing any future maintenance on JS interop.

I think there could be an alternative way to capture and pass the byte[] instances that avoids both of these problems, plus is simpler overall. That is:

For the .NET-to-JS direction:
- The JSRuntime base class could set up a single ByteArrayJsonConverter that, whenever it sees a byte[], immediately gives it an incrementing ID and passes the byte[] to JS without tracking it anywhere in .NET memory, and uses the ID in the serialized JSON (e.g., { __dotNetByteArray: 123 }).
- For example, JSRuntime could have a new method like protected virtual void SupplyByteArray(long id, byte[] data), whose default implementation just throws NotSupportedException
- Each JSRuntime subclass can take care of getting the array over to JS and calling some new function inside Microsoft.JSInterop.ts like DotNet.jsCallDispatcher.supplyByteArray(id: number, data: Uint8Array). The implementation of that function tracks the id/data pair in some Map<number, Uint8Array>.
- Of course, when Microsoft.JSInterop.ts is later deserializing JSON, it can react to { __dotNetByteArray: 123 } by removing the corresponding instance from the map and supplying it.
- This naturally cleans up after itself as long as .NET never throws during JSON serialization. If it does, then the JS side might never remove some of these Map entries, but I don't think that's a big problem because (1) people shouldn't have exceptions during JSON serialization, and if they do, then normally the user's session will be terminated anyway, and (2) JS memory isn't as security-critical as .NET memory is for Blazor Server, and (3) if this ends up being a concern in the long run, we could set up some timer-based thing on the JS side to clean up any entries that haven't been consumed after some timeout.
For the JS-to-.NET direction, the exact same technique could be used, with a couple of minor variations.
- Microsoft.JSInterop.ts would respond to seeing a Uint8Array by immediately calling some new function DotNetCallDispatcher.supplyByteArray(index: number, data: Uint8Array). Each runtime platform would plug in some way of transporting the data and calling DotNetDispatcher.SupplyByteArray on the .NET side. Note that in this direction, the IDs can be call-specific indices (i.e., restarting at 0 for each call). This is safe because JS-side serialization is always single-threaded, synchronous, and not-user-extensible, and is always synchronously followed by the actual invocation of .NET code, so we know there's always a continuous series of synchronous supplyByteArray calls that synchronously precede the beginInvokeDotNet or whatever it's called.
- DotNetDispatcher.SupplyByteArray would track the incoming data in some JSRuntime-instance-specific List<byte[]>. For security, it would need to impose a rule that you can't supply too much data in total before the actual JS interop call that consumes the data. I'd recommend we have a rule that says the total byte length of all the SupplyByteArray calls cannot exceed the configured SignalR "max message size", just so there isn't an extra thing to configure.
- Of course, the JSRuntime would also have a converter that sees incoming { __jsByteArray: 3 }-type values and looks up the corresponding entry in the List<byte[]>. Finally, before dispatching the call into user code, it would empty out the List<byte[]> because we know it's now finished. It could also empty out that list each time it sees an incoming index == 0, in case any errors caused previous processes to be left incomplete.
- The .NET side automatically keeps itself tidy and resilient to any leaks due to failure, because it never holds more bytes than "max signalR message size" and always empties the storage at the start of each new call. It doesn't have to worry about overlapping calls because that just isn't possible in this direction (the serialization and sending of messages is synchronous on the JS side). If it receives too much data, or out-of-order data, it can immediately just throw and let the circuit be torn down.

Sorry this was such a huge dump of ideas - just trying to be specific to avoid wasting any of your time. If any of it seems wrong, needs clarification, or you can think of any improvements, let me know! The main benefits of this kind of approach are that we avoid all the difficulties about threading/overlaps, plus we don't need to make any changes to the majority of the layers in the API (e.g., all the BeginInvoke/EndInvoke/etc method signatures wouldn't have to be changed at all). Does this make sense?
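
The JS side of the .NET-to-JS direction described above could be sketched along these lines. The `supplyByteArray` name and the `__dotNetByteArray` marker follow the comment; `pendingDotNetByteArrays` and `dotNetByteArrayReviver` are hypothetical names, and the transport that calls `supplyByteArray` is left out.

```typescript
// Byte arrays that .NET has sent ahead of the JSON that references them.
const pendingDotNetByteArrays = new Map<number, Uint8Array>();

// Called by the platform transport when a byte array arrives from .NET.
function supplyByteArray(id: number, data: Uint8Array): void {
  pendingDotNetByteArrays.set(id, data);
}

// Reviver that swaps { __dotNetByteArray: id } for the stored bytes
// and removes the entry so the map cleans up after itself.
function dotNetByteArrayReviver(_key: string, value: any): any {
  if (value && typeof value === "object" && "__dotNetByteArray" in value) {
    const id = value.__dotNetByteArray as number;
    const data = pendingDotNetByteArrays.get(id);
    if (data === undefined) {
      throw new Error(`Byte array ${id} was not supplied before deserialization.`);
    }
    pendingDotNetByteArrays.delete(id);
    return data;
  }
  return value;
}
```

Since each entry is deleted when it is consumed, the map only retains data if serialization on the .NET side failed partway through, which is the leak case the comment discusses.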

SteveSandersonMS · 2021-05-24T13:13:55Z

In case it helps, the approach I'm suggesting here is basically the same thing as I was doing in the streaming-from-dotnet-to-JS prototype. Having these use almost identical techniques also helps with keeping this area more comprehensible in the long run.

javiercn · 2021-05-24T15:12:24Z

This naturally cleans up after itself as long as .NET never throws during JSON serialization. If it does, then the JS side might never remove some of these Map entries, but I don't think that's a big problem because (1) people shouldn't have exceptions during JSON serialization, and if they do, then normally the user's session will be terminated anyway, and (2) JS memory isn't as security-critical as .NET memory is for Blazor Server, and (3) if this ends up being a concern in the long run, we could set up some timer-based thing on the JS side to clean up any entries that haven't been consumed after some timeout.

We can just put this on a try..catch and send a signal to clear state when an exception happens.

javiercn · 2021-05-24T15:13:36Z

DotNetDispatcher.SupplyByteArray would track the incoming data in some JSRuntime-instance-specific List<byte[]>. For security, it would need to impose a rule that you can't supply too much data in total before the actual JS interop call that consumes the data. I'd recommend we have a rule that says the total byte length of all the SupplyByteArray calls cannot exceed the configured SignalR "max message size", just so there isn't an extra thing to configure.

Agree
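
To make the agreed rule concrete, here is a rough sketch of a receiving-side store that enforces a total byte budget, resets on index 0, and rejects out-of-order indices. It is written in TypeScript for illustration only; the real store would live on the .NET side, and `IncomingByteArrayStore` and its members are hypothetical.

```typescript
class IncomingByteArrayStore {
  private entries: Uint8Array[] = [];
  private totalBytes = 0;
  private readonly maxTotalBytes: number;

  constructor(maxTotalBytes: number) {
    this.maxTotalBytes = maxTotalBytes; // e.g. the configured max message size
  }

  supply(index: number, data: Uint8Array): void {
    if (index === 0) {
      this.clear(); // A new call is starting; drop leftovers from a failed one.
    }
    if (index !== this.entries.length) {
      throw new Error(`Out-of-order byte array: expected index ${this.entries.length}, got ${index}.`);
    }
    this.totalBytes += data.byteLength;
    if (this.totalBytes > this.maxTotalBytes) {
      throw new Error("Supplied byte arrays exceed the configured maximum size.");
    }
    this.entries.push(data);
  }

  // Hands the collected arrays to the deserializer and empties the store.
  take(): Uint8Array[] {
    const taken = this.entries;
    this.clear();
    return taken;
  }

  private clear(): void {
    this.entries = [];
    this.totalBytes = 0;
  }
}
```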

javiercn · 2021-05-24T15:16:11Z

Of course, the JSRuntime would also have a converter that sees incoming { __jsByteArray: 3 }-type values and looks up the corresponding entry in the List<byte[]>. Finally, before dispatching the call into user code, it would empty out the List<byte[]> because we know it's now finished. It could also empty out that list each time it sees an incoming index == 0, in case any errors caused previous processes to be left incomplete.

We might want to do this using ArrayPool<byte[]>.Shared or setting the list to null once we dispatch the call to avoid holding on to potentially large lists/arrays.

SteveSandersonMS · 2021-05-24T15:36:47Z

We might want to do this using ArrayPool<byte[]>.Shared or setting the list to null once we dispatch the call to avoid holding on to potentially large lists/arrays.

Agreed. By storing it in an ArrayBuilder<byte[]> (instead of List<byte[]>) we'd get the pooling and cleanup automatically.

TanayParikh · 2021-05-25T00:47:41Z

Thanks for the reviews, they were really helpful. I was going through making the changes per @SteveSandersonMS's suggestions and I just wanted to confirm two things.

1. Security

Steve mentions the security related concerns of being provided "too much data", and recommends "we have a rule that says the total byte length of all the SupplyByteArray calls cannot exceed the configured SignalR "max message size", just so there isn't an extra thing to configure."

This protects against large amounts of data, but large amounts of (empty) byte arrays could still potentially pose a DOS vector?

We can represent an empty byte array via:

{ "__byte": 65536 },

Assuming un-compressed transfer, this would be about 8 bytes (probably a bit larger, not too familiar with messagepack internals). So a single 32kB request (as that's our imposed rule above) could have 4096 empty byte arrays (and consequently 4097 individual messages).

Granted an attacker doing this, vs. sending 4097 random individual requests wouldn't be that different (and that's already possible).

Additionally could we run into rate-limiting concerns on the server side for something like this?

2. Whether or not out-of-order delivery of byte arrays could pose an issue (or if it's even possible).

The issue is broken down into two parts, out of order delivery of the byte arrays and the byte arrays not having arrived by the time we're deserializing.

For .NET-to-JS:
- Out of order delivery
  - Not an issue
  - Storing byte arrays in JS under a unique identifier that should be O(1) accessible via map when it comes to reviving the serialized JSON.
- Not all byte arrays arrived by the time we're deserializing.
  - Not an issue.
  - JS side is handling requests sequentially.
For JS-to-.NET:
- Out of order delivery
  - Not an issue
  - During JS serialization, when we're extracting and dispatching, this would all be single-threaded (and we'd wait on the completion of each?)
  - We care about in-order delivery here if we're storing the data in an ArrayBuilder<byte[]>, as we're not preserving IDs (i.e., elements 2 & 3 could theoretically swap places with out-of-order delivery, which would be a critical error).
    - We could utilize a Dictionary<long, byte[]> however we'd lose the ArrayPool/memory optimizations provided by the ArrayBuilder<byte[]>
- Not all byte arrays arrived by the time we're deserializing.
  - Not an issue
  - Shouldn't be possible as JS would've sent the requests sequentially (and waited for completion before continuing?)

Are these assumptions correct?
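For reference, the .NET-to-JS case above (id-keyed storage plus a JSON reviver) could look roughly like this. It's only a sketch: `pendingByteArrays`, `supplyByteArray` and `reviveByteArrays` are illustrative names, and it assumes the number in the `{ "__byte": n }` placeholder is the array's identifier.

```typescript
// Hypothetical store: byte arrays keyed by the id carried in the placeholder.
const pendingByteArrays = new Map<number, Uint8Array>();

function supplyByteArray(id: number, data: Uint8Array): void {
  pendingByteArrays.set(id, data);
}

// JSON.parse reviver that swaps { "__byte": id } placeholders for the stored data.
function reviveByteArrays(_key: string, value: unknown): unknown {
  if (value !== null && typeof value === "object" && "__byte" in value) {
    const id = (value as { __byte: unknown }).__byte;
    if (typeof id === "number") {
      const data = pendingByteArrays.get(id);
      if (data === undefined) {
        throw new Error(`Byte array ${id} has not arrived`);
      }
      pendingByteArrays.delete(id); // each array is consumed once
      return data;
    }
  }
  return value;
}

// Arrival order differs from id order; lookup by id is unaffected.
supplyByteArray(1, new Uint8Array([9, 9]));
supplyByteArray(0, new Uint8Array([1, 2, 3]));

const revived = JSON.parse(
  '{"a":{"__byte":0},"b":{"__byte":1}}',
  reviveByteArrays
) as { a: Uint8Array; b: Uint8Array };
```

Because lookup is by id, arrival order doesn't matter here. A positional store (like the ArrayBuilder<byte[]> on the .NET side) doesn't have that property, which is why ordering matters more for JS-to-.NET.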

javiercn · 2021-05-25T09:43:19Z

Security

I don't think we need to worry about someone sending "empty" byte arrays. We can cap the maximum number of byte arrays by always adding at least X to the size when we receive a byte array, so at most someone could send 32768 empty arrays (for X = 1). We can probably reduce this number as follows:

32768 / 4 = 8192, since [] encoded will be represented as 4 chars when base64 encoded.
We can probably reduce this further to 2048 or 4096. One way to figure it out is to determine how many arrays can be sent with the default config and set a maximum number of arrays based on that. So MaxMessageSize / 4 ~= max acceptable arrays.

Second, we can optimize against empty arrays by using Array.Empty<byte>() which will reuse a shared instance.
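A rough sketch of the "always add at least X" idea, written in TypeScript purely for illustration (the real check would live on the .NET side). `ByteArrayBudget` and `tryAccept` are made-up names; X = 4 and a 32768-byte limit are the numbers from the estimate above.

```typescript
// Hypothetical per-call budget: every array is charged at least minCharge bytes,
// so empty arrays cannot be sent for free.
class ByteArrayBudget {
  private used = 0;
  private readonly maxBytes: number;
  private readonly minCharge: number;

  constructor(maxBytes: number, minCharge: number) {
    this.maxBytes = maxBytes;
    this.minCharge = minCharge;
  }

  // Returns true if the array fits in the remaining budget, false if it must be rejected.
  tryAccept(data: Uint8Array): boolean {
    const cost = Math.max(data.byteLength, this.minCharge);
    if (this.used + cost > this.maxBytes) {
      return false;
    }
    this.used += cost;
    return true;
  }
}

// Shared empty instance, playing the role of Array.Empty<byte>() in .NET.
const EMPTY = new Uint8Array(0);

const budget = new ByteArrayBudget(32768, 4);
let accepted = 0;
while (budget.tryAccept(EMPTY)) {
  accepted++;
}
console.log(accepted); // number of empty arrays accepted before the budget is exhausted
```

With these numbers the loop stops after 8192 empty arrays, matching the 32768 / 4 estimate; a larger X lowers the cap further.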

Out-of-Order delivery

Not an issue since we use SignalR to guarantee the order of the operations.

SteveSandersonMS · 2021-05-25T10:06:23Z

> This protects against large amounts of data, but a large number of (empty) byte arrays could still pose a DoS vector?

These are great things to consider! Thanks for working through these details.

> Second, we can optimize against empty arrays by using Array.Empty<byte>() which will reuse a shared instance.

Good point.

Also it might be that even nonempty arrays can be tracked without instantiating anything new on a per-array basis. I think MessagePack's API already gives us a ReadOnlySequence<byte> which might just be a reference to a range of bytes in the original incoming message, so tracking it doesn't cost any new allocations anyway. We would only have to convert this into an actual byte[] (which would definitely allocate, at least for non-empty ones) at the point of JSON deserialization when we're populating a byte[] on the .NET model object.

javiercn · 2021-05-25T10:11:56Z

> Also it might be that even nonempty arrays can be tracked without instantiating anything new on a per-array basis. I think MessagePack's API already gives us a ReadOnlySequence<byte> which might just be a reference to a range of bytes in the original incoming message, so tracking it doesn't cost any new allocations anyway. We would only have to convert this into an actual byte[] (which would definitely allocate, at least for non-empty ones) at the point of JSON deserialization when we're populating a byte[] on the .NET model object.

We would need to understand the lifetime/ownership of the ReadOnlySequence we are given, but in case it is "ours" then yes, we could avoid other allocations. Just want to make sure we don't think the buffer is ours when it might not be, since in that case it would outlive the message lifetime.
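The ownership concern can be illustrated with a loose TypeScript analogy. This is not MessagePack's .NET API; `subarray` stands in for a ReadOnlySequence<byte> view over the message, and `slice` stands in for copying into our own byte[].

```typescript
// Stand-in for the transport's receive buffer holding the incoming message.
const messageBuffer = new Uint8Array([1, 2, 3, 4]);

// A view: no allocation for the bytes, but it shares memory with messageBuffer.
const view = messageBuffer.subarray(1, 3);

// A copy: allocates, but is independent of messageBuffer's lifetime.
const copy = messageBuffer.slice(1, 3);

// Assume the transport reuses the buffer once the message has been handled.
messageBuffer.fill(0);

console.log(Array.from(view)); // reflects the overwritten buffer
console.log(Array.from(copy)); // still the original bytes
```

If the buffer isn't ours, holding the view past the message's lifetime reads whatever the transport wrote next, so the allocation can only be skipped when ownership is clear.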

TanayParikh added 7 commits April 28, 2021 12:21

Update BlazorPackHubProtocolWorker to support byte[][]

3ec093b

Cleanup

f698466

Blazor Server Byte Array Interop

73fe206

Update PublicAPI.Shipped.txt

0b9c276

Merge branch 'main' into taparik/blazorserverByteArrayInterop

8ee28b3

Tests

2fd4b22

DotnetDispatcher.Invoke returns byte arrays

d2a2b73

TanayParikh commented Apr 29, 2021

src/JSInterop/Microsoft.JSInterop.JS/src/src/Microsoft.JSInterop.ts (outdated)

TanayParikh requested review from a team, javiercn and SteveSandersonMS April 29, 2021 02:42

Merge branch 'main' into taparik/blazorserverByteArrayInterop

73de509

javiercn added area-blazor Includes: Blazor, Razor Components feature-blazor-server labels Apr 29, 2021

TanayParikh marked this pull request as ready for review April 29, 2021 16:28

TanayParikh mentioned this pull request Apr 29, 2021

Unquarantine CanInvokeDotNetMethods #32283

Merged

SteveSandersonMS reviewed Apr 30, 2021

src/JSInterop/Microsoft.JSInterop/src/Infrastructure/ByteArrayJsonConverter.cs (outdated)

SteveSandersonMS reviewed Apr 30, 2021

src/JSInterop/Microsoft.JSInterop/src/Infrastructure/ByteArrayJsonConverter.cs (outdated)

SteveSandersonMS reviewed Apr 30, 2021

javiercn reviewed May 4, 2021

SteveSandersonMS reviewed May 5, 2021

BrennanConroy reviewed May 14, 2021

TanayParikh added 2 commits May 21, 2021 11:45

Merge branch 'main' into taparik/blazorserverByteArrayInterop

ef647e7

Fix conflict

72ae8c1

TanayParikh added 4 commits May 23, 2021 17:58

Refactored and Guarded Serialization

9995e90

Added byte array json converter semaphores

8aeef7d

Utilize ArrayBuilder

cdba736

Update signatures to leverage SerializedArgs

b0b5a6b

SteveSandersonMS reviewed May 24, 2021

src/JSInterop/Microsoft.JSInterop.JS/src/src/Microsoft.JSInterop.ts

SteveSandersonMS reviewed May 24, 2021

src/JSInterop/Microsoft.JSInterop.JS/src/src/Microsoft.JSInterop.ts

SteveSandersonMS reviewed May 24, 2021

src/JSInterop/Microsoft.JSInterop.JS/src/src/Microsoft.JSInterop.ts

SteveSandersonMS reviewed May 24, 2021

javiercn reviewed May 24, 2021

TanayParikh mentioned this pull request May 25, 2021

Blazor Byte Array Interop Support #33015

Merged

TanayParikh closed this May 25, 2021

TanayParikh deleted the taparik/blazorserverByteArrayInterop branch May 25, 2021 22:40

github-actions bot locked and limited conversation to collaborators Dec 8, 2023

		var result = JsonSerializer.Deserialize(ref jsonReader, resultType, JsonSerializerOptions);
		ByteArraysToDeserialize = null;
