Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Serialize in-memory attributes into deep DOM clone #208

Merged
merged 24 commits into from
May 31, 2019

Conversation

Robdel12
Copy link
Contributor

@Robdel12 Robdel12 commented May 23, 2019

Small background

Percy Agent is the underlying package that powers most of our SDKs now. It provides a CLI to handle creating the builds (among other features), a node process for processing the DOM, a way to capture assets, and provides the JS we use to actually capture the DOM from the customer's browser.

There is some state within the page that we need to capture but aren't present in the DOM we take from the browser (form element values, CSS Object Model, Canvas elements, etc).

In order for us to properly capture all of the state within the customer's browser, we need to serialize those values into the DOM we are capturing.

The problem

At this time we're modifying the DOM within the customer's browser to capture that state. This can be harmful to their applications state. You can see how this works now here:

private domSnapshot(documentObject: Document): string {
const doctype = this.getDoctype(documentObject)
const dom = this.stabilizeDOM(documentObject)
let domClone = dom.cloneNode(true) as HTMLElement
// Sometimes you'll want to transform the DOM provided into one ready for snapshotting
// For example, if your test suite runs tests in an element inside a page that
// lists all yours tests. You'll want to "hoist" the contents of the testing container to be
// the full page. Using a dom transformation is how you'd acheive that.
if (this.domTransformation) {
domClone = this.domTransformation(domClone)
}
const snapshotString = doctype + domClone.outerHTML
return snapshotString
}
private getDoctype(documentObject: Document): string {
return documentObject.doctype ? this.doctypeToString(documentObject.doctype) : this.defaultDoctype
}
private doctypeToString(doctype: DocumentType): string {
const publicDeclaration = doctype.publicId ? ` PUBLIC "${doctype.publicId}" ` : ''
const systemDeclaration = doctype.systemId ? ` SYSTEM "${doctype.systemId}" ` : ''
return `<!DOCTYPE ${doctype.name}` + publicDeclaration + systemDeclaration + '>'
}
private stabilizeDOM(doc: HTMLDocument): HTMLElement {
let stabilizedDOM = doc
stabilizedDOM = serializeCssOm(stabilizedDOM)
stabilizedDOM = serializeInputElements(stabilizedDOM)
// more calls to come here
return stabilizedDOM.documentElement
}

First, we take the DOM that's passed from the snapshot method and serialize those in-memory values into the customers DOM. Then we take a clone of it after we've modified it.

There are two main problems with the current approach:

  • It modifies the DOM within their app, which can influence their test suite and applications state.
  • For CSSOM specifically, it breaks the way the app handles styles as soon as we serialize the styles into the ownerNode

The fix

We can solve the two issues listed above by cloning the DOM that's passed from the snapshot method as soon as we get it and then serializing the in-memory values into the DOM clone. This would require us to change how the current system works by passing both the original DOM and the clone that we've taken.

Approach

Isolate DOM transforms into a single class

This PR isolates all of the DOM transformations we make to a single DOM class & pulls the two different serialize-*.ts files into that class. This will make it easy to contain, manage, and document each DOM transformation that happens. The steps that we take to capture this information are a little confusing and nuanced. I’d like to make it as clear as possible the type of transforms that are happening.

This will also make testing MUCH easier. It will also make it possible for us to store any future errors we want to log on the class, and then pass that class along as debug output.

Inputs

To capture the values from the inputs we will still need to modify their DOM but it's only adding a Percy data attribute with a guid. This is because we need to be able to locate the various inputs in the original DOM and serialize them into the DOM clone. We cannot rely on their selectors or attributes. It also must be done before the clone is taken so the guid is present and the consistent in both DOM trees.

I believe this is as non-invasive as we can get with this approach.

CSSOM

There are two small changes to get stable CSSOM serialization across snapshots:

  • Serialize the CSSOM from the original document into the DOM clone
  • Create a style tag and append it to the same parentNode of the CSSOM’s ownerNode. We do not want to serialize these styles into the CSSOM’s ownerNode because it will break how the various libraries modify the CSSOM.

We have learned in the past that removing styles that were serialized into the CSSOM’s ownerNode will call deleteRule() in the CSSOM, which removes all styles present in the application (while also breaking the library that generates the styles). We want to avoid adding or changing any styles that the CSSOM owns.

Test coverage

This needs to be thoroughly tested. We should make sure:

  • We only modify the DOM to add the guid data attrs to form elements
  • We properly capture the CSSOM (in many snapshots, since that’s been an issue. Good repo here) (Here's a branch with this version of agent running that fixes the issues: https://github.com/Robdel12/percy-puppeteer-emotion/tree/ml-try-prod)
    • We do not serialize the CSSOM when JS is enabled (since that will be generated when the snapshot is rendered in our env)
  • We properly capture input values
  • CSSOM is not broken by adding more rules after clone
  • snapshotString() always returns a string with a doctype
  • domTransformation works as expected
  • domTransformation errors are captured and do not error the whole suite

Open TODOs

I'm not sure any of these are blocking, but could be convinced they are.

  • Add better typings than any.
  • Append the new style tag to the same parentNode as the CSSOM's style ownerNode

@Robdel12 Robdel12 force-pushed the rd/2813-serialize-to-dom-clone branch from c64ebe5 to 37b82c5 Compare May 28, 2019 19:49
Copy link
Contributor Author

@Robdel12 Robdel12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a few TODOs and helpful comments

src/percy-agent-client/dom.ts Outdated Show resolved Hide resolved
* that only lives in memory (not an asset or in the DOM).
*
*/
private serializeCssOm(documentClone: any) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be nice to do some perf testing here (re #188). I think we're going to have to bite the perf cost here of iterating through each in-memory stylesheet each time.

Maybe in the future we can do some caching. Feels dangerous though, I'd rather capture the CSSOM each time to be 100% sure it's correct.

$style.type = 'text/css'
$style.setAttribute('data-percy-cssom-serialized', 'true')
$style.innerHTML = serializedStyles
// TODO, it'd be better if we appended it right after the ownerNode in the clone
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is something that's nice to have, but not critical for this PR to get through.

expect(domSnapshot).to.contain('type="checkbox" checked')
expect(domSnapshot).to.contain('type="radio" checked')
expect(domSnapshot).to.contain('test textarea value')
const $ = cheerio.load(domSnapshot)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cheerio makes working with the string of the DOM much easier by giving a jQuery API to transverse the string. This was needed because the contain method checks to make sure that exact string matches. So the order of type="radio" checked needs to be exact (which it isn't) to pass.

Cheerio allows us to query for that element in the DOM and check its attributes.

test/percy-agent-client/dom.test.ts Show resolved Hide resolved
@Robdel12 Robdel12 marked this pull request as ready for review May 29, 2019 19:36
* our API.
*
*/
snapshotString(): string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd wondering about being more specific here domSnapshotString or some other name - to avoid possible confusion with Percy's Snapshots. I.e. in the near future we think each Percy Snapshot may have multiple DOM Snapshots.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the name domSnapshotString.

Though, it is on the DOM class:

let dom = new DOM(document);

// this
dom.snapshotString()
// vs
dom.domSnapshotString()

I don't have a strong feel either way

* This will change the customers DOM and have a possible impact on the
* customers application.
*
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May also want to include comments above saying currently we're only using it to add data-percy-element-id's to inputs which should be safe. Also that this mutation can apply to the same original DOM multiple times within one test/snapshot set. This can be dangerous, which is why we suggest strictly minimizing changes made here.

@Robdel12 Robdel12 force-pushed the rd/2813-serialize-to-dom-clone branch from 3fc3ad7 to e9b29aa Compare May 30, 2019 22:23
These methods are already mutating the cloned DOM. There's no need to capture it
in a variable and return it from each stabilize method. It gives the illustion
that it's not causing side-effects, but it really is. Let's make that explicit
@djones djones mentioned this pull request May 31, 2019

it('serializes the input value', () => {
expect($domString('#name').attr('value')).to.equal('Bob Boberson')
expect(dom.clonedDOM.querySelector('#name').attributes.value.value).to.equal('Bob Boberson')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
expect(dom.clonedDOM.querySelector('#name').attributes.value.value).to.equal('Bob Boberson')
expect(dom.clonedDOM.querySelector('#name').getAttribute('value')).to.equal('Bob Boberson')

Copy link
Contributor

@djones djones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🍍 LGTM

@Robdel12 Robdel12 merged commit 56d76bd into master May 31, 2019
@delete-merged-branch delete-merged-branch bot deleted the rd/2813-serialize-to-dom-clone branch May 31, 2019 20:44
djones pushed a commit that referenced this pull request May 31, 2019
## [0.5.1](v0.5.0...v0.5.1) (2019-05-31)

### Bug Fixes

* Serialize in-memory attributes into deep DOM clone ([#208](#208)) ([56d76bd](56d76bd))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants