Skip to content
This repository has been archived by the owner on May 17, 2019. It is now read-only.

Add RFC on data serialization/deserialization #8

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

ganemone
Copy link
Contributor

@ganemone ganemone commented Aug 1, 2018

@old-fusion-bot old-fusion-bot bot added the docs label Aug 1, 2018
Copy link

@KevinGrandon KevinGrandon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good so far. Could we add perma-links to a few places where we do this today in open source repos?

SharedState: SharedStateToken,
},
middleware: ({SharedState}) => {
const {serialize, deserialize} = SharedState('some-id');

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to specify a string id here? Wondering if we could generate one instead. Maybe even use a Token from createToken() that could specify the shape of the data?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need a string, especially to make the mocking experience good

Copy link
Contributor

@rtsao rtsao Aug 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be worth adding a strawman example of what the testing experience would look like using tokens instead of string ids to the alternatives section. I agree mocking will probably be easier with strings than working with tokens, but maybe it's possible to have a token-based API that's still pretty good?

I'd feel sad adding string ids as part of this API especially considering how much effort/time was put into making a DI system that didn't rely on them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could have the id be configured via a token, I just feel like it is kinda overkill. The plugin can export a UUID if you really want it to be guaranteed unique. FWIW, we are using string ids today.

Here is an example of what it would look like configuring the string as a token:

const MyPluginSharedStateIDToken = createToken('MyPluginSharedState');
createPlugin({
  deps: {
    SharedState: SharedStateToken,
    SharedStateID: MyPluginSharedStateIDToken
  },
  middleware({SharedState, SharedStateID}) {
     const {serialize, deserialize} = SharedState(SharedStateID);
  }
});

// ... in main.js
app.register(MyPluginSharedStateIDToken, 'some-id');
app.register(SharedStateToken, MockSharedState({
  'some-id': mockdata
});

This works, but honestly I think it is too much boilerplate as the number of apps which need serialization grows. I would prefer a pattern where the plugin exports the serialization id.

// In fusion-plugin-redux-react
export const SerializationID = '@fusion-plugin-redux-react/serialization-id'

// In test code
import {SerializationID} from 'fusion-plugin-redux-react';
app.register(SharedStateToken, MockSharedState({
  [SerializationID]: mockdata
});

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

String IDs just feel off to me when we have tokens. Tokens feel like the right tool to me here to avoid naming collisions. Are we not concerned about possible naming collisions at all?

Here's one example of an API I was playing with, still on the fence, but it's using tokens, thoughts?

// ... in main.js
app.enhance(SharedStateToken, SharedState => SharedState(MyPluginSharedStateToken, shareddata));

// In test code
app.enhance(SharedStateToken, SharedState => SharedState(MyPluginSharedStateToken, mockdata));

// In plugin:
const {serialize, deserialize} = SharedState.extract(MyPluginSharedStateToken);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is pretty safe to use strings for this type of thing, especially since we are doing it today already with no problem. The string could be prefixed with the name of the npm package for example.

'some-id': {
some: 'json'
},
'other-data': {}
Copy link
Contributor

@rtsao rtsao Aug 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would this work with custom serialization? For example the immutable case, would transit.toJSON be used directly, or should somehow the actual ImmutableSharedState serializer be used? What does this look like when multiple serializers are used (w/ aliasing like you have in the example below)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't actually change the serialization, only the deserialization. So whether it is custom or default, this will return whatever the value is in the map. You could put an immutable-js object here directly

return transit.fromJSON(document.getElementById(id).textContent);
}
}
}
Copy link
Contributor

@rtsao rtsao Aug 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this example, wouldn't it be better to just use the standard JSON serializer/deserializer (by using the the serialized immutable-transit strings as values) rather than a custom out-of-band serialization/deserialization?

I can't think of anything that would need for true custom serialization/deserialization aside from CSS-in-JS implementations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand what you mean here. Could you elaborate, maybe with an example?

Copy link
Contributor

@rtsao rtsao Aug 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the actual HTML part of the serialization and standardization can be entirely kept in core and not exposed. Custom deserializers/serializers need not involve HTML at all, just A → string and string → A functions, and the core SSR plugin itself can handle the HTML part.

So you'd have something sort of like this.

const DefaultSharedState = {
  serialize: data => JSON.stringify(data),
  deserialize: serializedData => JSON.parse(serializedData)
};

const ImmutableSharedState = {
  serialize: data => transit.toJSON(data),
  deserialize: serializedData => transit.fromJSON(serializedData)
};

I don't see the need for serialization/deserialization code to be actually concerned with modifying the HTML and/or querying the HTML. The only special case is CSS-in-JS, and in that case, they will want to implement that in a totally custom way rather than using this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is true if we limit what you can do with the serialization / deserialization. For example, this would not be possible if we wanted that type of API:

const WindowSharedState = (id) => {
  return {
    serialize: (ctx, data) => {
      ctx.template.head.push(
        html`<script nonce=${ctx.nonce}'>window.SOME_DATA=${JSON.stringify(data)}</div>`
      );
    },
    deserialize: () => {
      return window.SOME_DATA
    }
  }
}

As long as we are okay with limiting the types of serialization/deserialization to <script type='application/json> then I am okay with this. I'm just not quite sure if this limitation will be a problem for us.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's doubtful that constraint will be an issue in practice. The only non-contrived use case I've been able to come up thus far is CSS-in-JS, but it's unclear to me how mocking state would make sense in that case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok - I can update the RFC to reflect that API

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potentially what might be simpler is all values are always JSON.stringify/JSON.parse, and if you want something else you can just serialize/deserialize into a string, which is valid JSON. Although I suppose this might be more overhead.

This can make testing difficult, especially when running browser tests since there is not
an easy way to mock the deserialization other than each plugin exposing their own deserialization
token.

Copy link
Contributor

@rtsao rtsao Aug 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice idea! I think generally it's good if we move things that do side effects into the DI system (for testing as you mention). Hopefully this would make it easier for more plugins to be totally pure.

That said, for testing, I think long-term, hopefully we can make the puppeteer-style testing story easier so not as much needs to be mocked.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, a separate issue from this is having puppeteer style tests with access to app, but still I think this is useful even when we have support for that.

@lhorie
Copy link
Contributor

lhorie commented Aug 1, 2018

JSON

So I assume Dates are out of scope? Have you considered something like BSON?

@ganemone
Copy link
Contributor Author

ganemone commented Aug 1, 2018

So I assume Dates are out of scope? Have you considered something like BSON?

I would rather have no dependencies needed by default. However, since this is in the DI system, it would be easy for an engineer to replace the serialization mechanism with BSON if they want.

However, I would say just use timestamps rather than date objects to simplify things

return (ctx, next) => {
if (__BROWSER__) {
// load and parse serialized data from dom node
const data = deserialize();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you do with the resulting data in this case?

How do you make it available to other parts of the app? (either redux state or react context)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would still be done via react context

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm referencing the fact that data is assigned to a variable but then nothing is done with it. Can you add examples for redux (I know you have one later down) and React context?


# Motivation

It is a very common pattern in fusion to have some shared state between the server and
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What types of shared data are we expecting to support? Should you expect to serialize your entire redux state tree? Should you use it for storing preferences? Or how agnostic should it be to your data type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By default this will just support JSON serializable data, but since the serializer/deserializer is in the DI system, you can basically add support for any data type you want.

const ImmutableSharedState = (id) => {
return {
serialize: (ctx, data) => {
ctx.template.body.push(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you writing to and reading from HTML directly? I'd rather have the framework abstract this away for me.

I understand you want an escape hatch, but I think the default case should be that the framework sets the return value of this function in the DOM, and then on the browser is responsible for trying to find that element and passing its results as an argument to deserialize.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By default the framework handles writing to html / dom. It is only in the case where you want custom serialization / deserialization when you need to do this manually.

The reason I think we need the escape hatch to include html/dom APIs is it could be dependent on your serialization mechanism. For example, you might want to serialize it into an inline js script rather than a json script tag.


There are many variations of this API we should consider. One potential option would be to have a `ctx.serialize` and `ctx.deserialize` function,
and provide a way to mock those functions in testing via something in `fusion-test-utils`. This could solve the issue with testing, but would
require introducing new APIs to `fusion-test-utils`. While the above proposal does introduce new Tokens, it uses the DI system to solve the
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this solution instead of overloading ctx further

'some-id': {
some: 'json'
},
'other-data': {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this be deserialize(serialize(mockValue)) or just mockValue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just the mockValue


```js
import {MockSharedState, SharedStateToken} from 'fusion-core';
// Deserialize calls will now return the mock values for these ids
Copy link
Contributor

@rtsao rtsao Aug 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a couple aspects of this particular explanation aren't totally clear to me

  • What's the difference between setting the shared state, setting the default shared state, and mocking the shared state, and overriding the shared state? Is this API only used for testing? What happens when it's used normally? I think hopefully we could avoid using "mock" as part of the API if it's not necessarily testing-specific.
  • Does this change which "deserialize" function is called, or merely change the values on which deserialize operates on? Part of this explanation seems to imply that deserialize is mocked, but the name of the actual function makes it seem like only the itself data is being changed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea open for suggestions on the naming here. I actually originally put the MockSharedState (or w/e we call it) into fusion-test-utils because it really should only be used for testing. It is really just a helper for mocking deserialize. Maybe it could mock serialize also, but I think that will be less useful. But that comment I think describes it pretty well:

Deserialize calls will now return the mock values for these ids

It does just that.


This can make testing difficult, especially when running browser tests since there is not
an easy way to mock the deserialization other than each plugin exposing their own deserialization
token.
Copy link
Contributor

@rtsao rtsao Aug 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this solution that bad (i.e. just having a token for each plugin)? I guess it depends on a few factors

  • How frequently you want to mock the "universal" states for multiple separate plugins
  • How much boilerplate/docs is required

However, I think there are some potential advantages of per-plugin state mock/override tokens:

  • Better type safety? I'm not totally sure but it seems like this might be difficult to type
  • No need for string IDs (no typos)
  • Plugins can't have implicit dependencies on other plugin's shared state (using string ids)
import {FooStateToken} from "fusion-plugin-foo";
import {BarStateToken} from "fusion-plugin-bar";
import {BazStateToken} from "fusion-plugin-baz";

// in tests
app.register(FooStateToken, /* something */);
app.register(BarStateToken, /* something */);
app.register(BazStateToken, /* something */);

I suppose this is more verbose, but could be remedied:

import {FooStateToken} from "fusion-plugin-foo";
import {BarStateToken} from "fusion-plugin-bar";
import {BazStateToken} from "fusion-plugin-baz";

function mockState(app, state) {
  app.register(FooStateToken, state.foo);
  app.register(BarStateToken, state.bar);
  app.register(BazStateToken, state.baz);
}

// in tests
mockState(app, {foo, bar, baz});

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think what you are proposing is correct or possible. Maybe you are missing some context from the RFC. The user doesn't provide the data, the plugin provides the data. So the data itself cannot come from the token. Maybe we should have a quick chat IRL to go over the doc

@ganemone
Copy link
Contributor Author

ganemone commented Aug 6, 2018

Chatted with @rtsao offline, going to wait on this RFC until we put together a separate RFC for createUniversalSymbol

@rtsao
Copy link
Contributor

rtsao commented Aug 10, 2018

I was thinking about making an RFC for "universal env vars" but realized this is basically the same problem.

I think we can consider this issue in terms of three degrees of "universal" state:

  1. Build-time values (e.g. static values in code or generated values at build-time like createUniversalSymbol)
  2. Deployment startup-time values (e.g. environment variables)
  3. Request-time values (e.g. arbitrary state)

For deployment-time values (such as env vars) a convenient API might look something like:

import {universalEnv} from "fusion-core";

const cdnURL = universalEnv("CDN_URL"); // build-time enforcement that param is string literal

I think the takeaway here is consumers of the above API probably shouldn't have to interact with context at all because the values are invariant between requests. That said, it seems like at least part of the underlying implementation should be shared because it's conceptually the same thing, just sort of a simplified case where state doesn't vary between requests.

It would be nice to have a unifying "universal transport" abstraction here. Perhaps one way to aligning these is if there was some default value which is context is omitted.

Here's an idea loosely inspired by the React.createContext API:

import {createUniversalState} from "fusion-core";

const {getState, setState} = createUniversalState("some unique id", "a default value");

getState();
// => "a default value"

// Or within a middleware:
setState(ctx, "some new value");

getState(ctx);
// => "some new value"

Some of the implementation details are still a bit fuzzy but I think this would be interesting idea to explore.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants