feat: Preserve formatting #2444

Draft
wants to merge 10 commits into
base: master
from

Conversation

kriskowal
Member

Closes: #926

Description

This change leverages new Babel support for format preservation.

Security Considerations

None

Scaling Considerations

None

Documentation Considerations

None

Testing Considerations

This change includes a test to perform a narrow spot-check of the verbatim output of a module that should be largely preserved.

Compatibility Considerations

None

Upgrade Considerations

None

  • NEWS.md

inputSourceMap: sourceMap,
retainLines: true,
preserveFormat: true,
compact: true,
Member Author

@kriskowal kriskowal Sep 3, 2024

@nicolo-ribaudo Reviewing https://github.com/babel/babel/pull/16708/files#diff-ca2bda59eec9c35846e26c1c6247759c92c26357b80d4dc881fcdaf12df1d7e6R35-R39, it seems like compact: true here should be causing your code to throw, so I suspect this isn’t exercising the new path in my local testing.

Member Author

Found it. Needed to swap out our fork @agoric/babel-generator back to @babel/generator.

@kriskowal kriskowal force-pushed the kriskowal-babel-preserve-format-integration branch from f751a41 to 1fef37d Compare September 3, 2024 23:57
@kriskowal
Member Author

@nicolo-ribaudo,

I’ll have a push soon, but this looks correct, actually:

Pre-transform:

function TokenType() {}
const beforeExpr = 0;

export function createBinop(name, binop) {
  return new TokenType(name, {
    beforeExpr,
    binop,
  });
}

Post-transform:

function TokenType() {}
const beforeExpr = 0;

       function createBinop(name, binop) {
  return new TokenType(name, {
    beforeExpr,
    binop,
  });
}

I’ll note that there’s a weird but explicable offset on the function declaration because we have erased export!
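The offset follows from erasing a keyword in place. As a minimal sketch (not the actual Endo or Babel implementation), the hypothetical helper `blankOut` below overwrites a span with spaces so that every later token keeps its original line and column:

```javascript
// Hypothetical helper: overwrite source[start, end) with spaces.
// Newlines inside the span are kept so later lines and columns do not move.
function blankOut(source, start, end) {
  const blanked = source.slice(start, end).replace(/[^\n]/g, ' ');
  return source.slice(0, start) + blanked + source.slice(end);
}

const input = 'export function createBinop(name, binop) {}';
const start = input.indexOf('export ');
const output = blankOut(input, start, start + 'export '.length);

// The declaration now begins with seven spaces, the width of "export ".
console.log(JSON.stringify(output));
```

Because the output has the same length as the input, `function` stays in the column it occupied before the transform.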

This was the previous (undesirable) effect:

function TokenType() { }
const beforeExpr=  0;

function        createBinop(name, binop) {
  return new TokenType(name, {
    beforeExpr,
    binop});

 }
})()

And the test I’ve proposed is not yet passing because the generator doesn’t add a newline at the end of the file. That presumably means that it needs to catch up with the final line and column of the file to preserve trailing whitespace, or just add a single newline to the end to make sure the output is a valid UNIX text file.

@kriskowal kriskowal force-pushed the kriskowal-babel-preserve-format-integration branch 2 times, most recently from 2c8df48 to ff89345 Compare September 4, 2024 00:03
@kriskowal kriskowal force-pushed the kriskowal-babel-preserve-format-integration branch from ff89345 to e8ebd67 Compare September 4, 2024 00:06
@kriskowal
Member Author

I’m mistaken above about the newlines issue. I’m updating the test fixture and it looks like the test will pass as-is, no changes needed from Babel.

@@ -0,0 +1,10 @@
({ imports: $h͏_imports, liveVar: $h͏_live, onceVar: $h͏_once, importMeta: $h͏____meta, }) => (function () { 'use strict'; $h͏_imports([]);Object.defineProperty(createBinop, 'name', {value: "createBinop"});$h͏_once.createBinop(createBinop); // deliberately offset

I'm actually surprised by the spaces here. My expectation was that this would have been printed as compact as possible to increase the chances that the next node could be printed in the right place.

Oh I see. This code with spaces is generated by you with string concatenation after Babel runs, not by Babel itself.

Member Author

Indeed. I also find the spacing arbitrary, emerging from some unnecessary space in the formatting of the code. I’ve pushed a commit to make that compact as expected, even though it is not generated by Babel.

@@ -0,0 +1,10 @@
({ imports: $h͏_imports, liveVar: $h͏_live, onceVar: $h͏_once, importMeta: $h͏____meta, }) => (function () { 'use strict'; $h͏_imports([]);Object.defineProperty(createBinop, 'name', {value: "createBinop"});$h͏_once.createBinop(createBinop); // deliberately offset
Contributor

Where is the closing paren for the function expression here?

Mmm it should be at the end of the file right? It looks like I forgot to flush some tokens somewhere 🤔

Actually, cloning the PR locally I get this additional line at the end of the file:

})()

Probably @kriskowal accidentally deleted it.

Member Author

Pushed the fix for this. It was present locally when I got the test to pass.

@kriskowal kriskowal force-pushed the kriskowal-babel-preserve-format-integration branch 2 times, most recently from 31db01e to e04e72d Compare September 5, 2024 00:25
@kriskowal kriskowal force-pushed the kriskowal-babel-preserve-format-integration branch 2 times, most recently from 06d68e6 to be5db4d Compare October 26, 2024 03:19
@kriskowal kriskowal force-pushed the kriskowal-babel-preserve-format-integration branch 2 times, most recently from b2a3b1a to fdc64cb Compare November 5, 2024 19:49
@nicolo-ribaudo

From the failure it seems like there is still a bug in Babel. I probably need to enforce semicolons in cases where the printer doesn't put them, so that even if the next token is on the same line it works.

@kriskowal kriskowal force-pushed the kriskowal-babel-preserve-format-integration branch from fdc64cb to 7fe88bb Compare November 5, 2024 20:08
@kriskowal
Member Author

> From the failure it seems like there is still a bug in Babel. I probably need to enforce semicolons in cases where the printer doesn't put them, so that even if the next token is on the same line it works.

I’ve got good results for @endo/module-source and @endo/evasive-transform in their tests, which are not comprehensive. The remaining error is in composition with a sourcemap and rollup, and I need to spend more time isolating it. I’m not yet sure it’s a problem with Babel, but the problem does go away if we use compact instead of preserveFormat for the relevant case.

@nicolo-ribaudo

I found the root cause of the error, it's indeed a Babel bug.

const babel = require("@babel/core");

// input
const code = `const results = {
    answer1,
  answer2,
  answer3,
        answer4,
  answer5,
}`;

// run babel
const out = babel.transformSync(code, {
  configFile: false,
  parserOpts: { tokens: true },
  generatorOpts: { retainLines: true, experimental_preserveFormat: true },
  plugins: [
    // Append a new statement at the end of the program.
    ({ template }) => ({
      visitor: {
        Program(path) {
          path.pushContainer("body", template.statement.ast`hello;`);
        },
      },
    }),
  ],
});

The VariableDeclaration is printed without a trailing semicolon to preserve its original format, but then the new injected hello; is printed on the same line as } (to avoid increasing the line number), causing a syntax error.
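The failure mode can be checked without Babel. The sketch below uses `new Function` purely as a syntax check. `parses` is a hypothetical helper, and the strings approximate the shape of the printer output rather than quoting it:

```javascript
// Hypothetical helper: true if `source` parses as a function body.
// new Function compiles the text without running it.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

// A declaration printed without its trailing semicolon.
const declaration = 'const results = {\n  answer1,\n  answer2,\n}';

// Injected statement on the same line as `}`: no semicolon, no newline.
const sameLine = parses(`${declaration} hello;`);
// An explicit semicolon makes the same-line placement valid.
const withSemicolon = parses(`${declaration}; hello;`);
// A line break also works, via automatic semicolon insertion.
const nextLine = parses(`${declaration}\nhello;`);

console.log({ sameLine, withSemicolon, nextLine });
```

Only the same-line case without a semicolon is rejected, which is why forcing a semicolon lets the printer keep the injected statement on the existing line.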

@nicolo-ribaudo

babel/babel#16958

kriskowal added a commit that referenced this pull request Nov 12, 2024
Closes: #2415 

## Description

This change introduces support for TypeScript through type-erasure,
using ts-blank-space, which converts type annotations to equivalent
blank space. As is consistent with `node --experimental-strip-types`,
this only applies to modules with the `.ts`, `.mts`, or `.cts`
extensions in packages that are not under `node_modules`, to discourage
publishing TypeScript as a source language to npm.

### Security Considerations

The choice of `ts-blank-space` is intended to minimize runtime behavior
difference between TypeScript and JavaScript, such that a reviewer or a
debugger of the generated JavaScript aligns with the expected behavior
and original text, to the extent that is possible. This should compose
well with #2444.

### Scaling Considerations

None.

### Documentation Considerations

Contains README and NEWS.

### Testing Considerations

Contains spot check tests for TypeScript in the endoScript and
endoZipBase64 formats. We stand on much more rigorous testing of the
underlying workspace-language-for-extension feature in Compartment
Mapper #2625.

### Compatibility Considerations

This does not break any prior usage.

### Upgrade Considerations

None.
@kriskowal kriskowal force-pushed the kriskowal-babel-preserve-format-integration branch from 8baf9f2 to 011fb38 Compare December 4, 2024 22:33
@kriskowal
Member Author

We seem to still have some defects in a similar vein. We have a failing test that exists to ensure that error stacks report true line numbers when using one of our older bundle formats.

The source is:

export const message = `You're great!`;
export const makeError = msg => Error(msg);
// Without an evasive transform, the following comment will trip the SES censor
// for dynamic imports. */
/** @type {import('./types.js').EncourageFn} */
export const encourage = nick => `Hey ${nick}!  ${message}`;

The current transform output preserves the line number.

(function getExport(require, exports) {   'use strict';   const module = { exports };     'use strict';Object.defineProperty(exports,'__esModule',{value:true});const message=`You're great!`;
const makeError=(msg)=>Error(msg);
/* Without an evasive transform, the following comment will trip the SES censor*/
/* for dynamic imports. *X/*/
/** @type {IMPORT('./types.js').EncourageFn} */
const encourage=(nick)=>`Hey ${nick}!  ${message}`;exports.encourage=encourage;exports.makeError=makeError;exports.message=message;
  return module.exports;
})
//# sourceURL=/bundled-source/.../encourage.js

The new transformed output injects a lot of unexpected space:

(function getExport(require, exports) {   'use strict';   const module = { exports };     'use strict';

Object.defineProperty(exports,'__esModule' , { value: true });

const                                                                                                                                                                                                                                                                                                                                                                                                                                                                message=`You're great!`;
const                                                                                                                                                                                                                                                                                                                                                                                                                                                                makeError=(msg)=>Error(msg);/* Without an evasive transform, the following comment will trip the SES censor*//* for dynamic imports. *X/*//** @type {IMPORT('./types.js').EncourageFn} */



const                                                                                                                                                                                                                                                                                                                                                                                                                                                                encourage=(nick)=>`Hey ${nick}!  ${message}`;

exports.                                                                                                                                                                                                                                                                                                                                                                                                                                                                encourage=encourage;
exports.                                                                                                                                                                                                                                                                                                                                                                                                                                                                makeError=makeError;
exports.                                                                                                                                                                                                                                                                                                                                                                                                                                                                message=message;
  return module.exports;
})
//# sourceURL=/bundled-source/.../encourage.js

This is the only failing test locally, but it is also the only test that is sensitive to stack trace line numbers, so I should write more tests and also attempt to isolate this further. You will note that this is a Babel transform that occurs after a prior Rollup transform, which is not present in our modern bundle formats.
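A line-number-sensitive check along these lines could look like the following sketch. It assumes Node.js and its built-in `vm` module. `reportedLine` and the file name `encourage.js` are illustrative, not the existing Endo test:

```javascript
const vm = require('node:vm');

// Evaluate `source` as a script named `filename`, call its makeError, and
// return the line number that the stack trace attributes to that file.
function reportedLine(source, filename) {
  const { makeError } = vm.runInNewContext(source, {}, { filename });
  const frame = makeError('boom')
    .stack.split('\n')
    .find(line => line.includes(filename));
  return Number(/:(\d+):\d+\)?$/.exec(frame)[1]);
}

// makeError is defined on line 2 of the original text.
const original = [
  "const message = `You're great!`;",
  'const makeError = msg => Error(msg);',
  '({ makeError });',
].join('\n');

// A transform that injects one extra line shifts every later frame.
const shifted = `\n${original}`;

console.log(reportedLine(original, 'encourage.js'));
console.log(reportedLine(shifted, 'encourage.js'));
```

Comparing the reported line for transformed output against the line in the original source catches any transform that adds or drops lines.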

@nicolo-ribaudo

I'll take a look. I'm also working on getting a large part of our transform tests to go through this printer to have a bigger corpus of test cases.

That's a funny output, I wonder how it's deciding to inject all those spaces there.

@nicolo-ribaudo

I started taking a look at that. Something I noticed is that if I disable the "source map unmap" functionality you have in transformAst, then the output looks ok.

It's not surprising that it has an effect, given that Babel relies on location info to preserve the format, but I don't know yet if it's a bug in Babel or in how you remap locations.

@nicolo-ribaudo

nicolo-ribaudo commented Dec 6, 2024

Ok so what's happening is that this is the input code parsed with Babel:

'use strict';

Object.defineProperty(exports, '__esModule', { value: true });

const message = `You're great!`;
const makeError = msg => Error(msg);
// Without an evasive transform, the following comment will trip the SES censor
// for dynamic imports. */
/** @type {import('./types.js').EncourageFn} */
const encourage = nick => `Hey ${nick}!  ${message}`;

exports.encourage = encourage;
exports.makeError = makeError;
exports.message = message;

And Babel is trying to preserve the formatting of this input code (and not of what the input code for Rollup was, since Babel doesn't know about it). That's why there are all those "extra" newlines.

You need to also re-map the locations in the .tokens array in the AST, since that's what Babel looks at to understand where the original tokens were in the original code. Or if you can, it would be even better to actually parse the original pre-rollup code and give those tokens and that original source code to @babel/generator (while still using the post-rollup AST with remapped locations).

The "fake locations" you generate are weird btw. In the test above, in const message=`You're great!`;:

  • const message = `You're great!`; has start and end loc 1:12
  • message = `You're great!` has start and end loc 1:13

The extra horizontal space is due to a failing check in Babel. I was assuming it would never happen, but it's happening due to the mismatch between the locations on the AST nodes and the locations on the tokens. If the problem persists after you fix the token locations in your logic, I can adjust the check in Babel.
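The mismatch can be illustrated with a small sketch. The node and token objects are simplified stand-ins rather than full Babel structures, and `makeUnmap` and `remapLoc` are hypothetical names, with a fixed line offset standing in for a real source-map lookup:

```javascript
// Hypothetical stand-in for a source-map lookup: maps a generated position
// back to the original by undoing a fixed line offset.
const makeUnmap = lineOffset => ({ line, column }) => ({
  line: line - lineOffset,
  column,
});

// Remap the `loc` of any item that carries one, AST node or token alike.
const remapLoc = (item, unmap) => ({
  ...item,
  loc: { start: unmap(item.loc.start), end: unmap(item.loc.end) },
});

// Simplified stand-ins for one AST node and the token it was parsed from.
const loc = { start: { line: 5, column: 6 }, end: { line: 5, column: 13 } };
const node = { type: 'Identifier', name: 'message', loc };
const token = { value: 'message', loc };

const unmap = makeUnmap(4);

// Remapping only the node leaves the token pointing at the old line.
const nodeOnly = { node: remapLoc(node, unmap), token };
// Remapping both keeps the two views of the source in agreement.
const both = { node: remapLoc(node, unmap), token: remapLoc(token, unmap) };

console.log(nodeOnly.node.loc.start.line, nodeOnly.token.loc.start.line);
console.log(both.node.loc.start.line, both.token.loc.start.line);
```

Running the same mapper over nodes and tokens is the simplest way to keep them consistent. Parsing the original source to obtain its tokens, as suggested above, avoids the remapping step for tokens altogether.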

@kriskowal
Member Author

Thank you @nicolo-ribaudo, this implies a couple good mitigation options I can look into and I believe we’re unblocked.

@michaelfig My intention is to look into making Babel’s new preserveFormat mode work with the older getExport and nestedEvaluate in composition with Rollup by ensuring we return to text between stages. If that exceeds my (short) timebox, my intention is to just disable preserveFormat and live with the current experience. My intention is to (soon) migrate our remaining usage of getExport and nestedEvaluate to the new endoScript format, and maybe even introduce an endoNativeScript format to take advantage of XS Compartment and further improve debug experience for kernel bootstrap scripts.

@kriskowal
Member Author

I’ve spoken to @michaelfig and we resolved that we have reimplemented enough of Rollup at this stage to pretty easily replace the internals of the legacy getExport and nestedEvaluate generators with ones we trust. So, we will not attempt to get Rollup and Babel to play well together. The end result is that this test will pass without modification, since it won’t be downgraded from ESM to CJS by Rollup, but will go through our ModuleSource shim instead, which will benefit from preserveFormat.

@nicolo-ribaudo

Perfect -- if you'll need anything else you know where to find me :)

Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Lots of whitespace differences introduced by endo bundler
3 participants