dependency-extraction-webpack-plugin: Calculate vendor hash from file output rather than Webpack internal state #34969

Merged
@@ -6,6 +6,7 @@

- Use OpenSSL provider supported in Node 17+ when calling `crypto.createHash` ([#40503](https://github.com/WordPress/gutenberg/pull/40503)).
- Add new line at the end of generated `*.asset.php` files ([#40753](https://github.com/WordPress/gutenberg/pull/40753)).
- Calculate version hashes based on output file contents rather than input files and other Webpack internal state ([#34969](https://github.com/WordPress/gutenberg/pull/34969)).

## 3.3.0 (2022-01-27)

15 changes: 13 additions & 2 deletions packages/dependency-extraction-webpack-plugin/lib/index.js
@@ -128,7 +128,7 @@ class DependencyExtractionWebpackPlugin {
name: this.constructor.name,
stage:
compiler.webpack.Compilation
.PROCESS_ASSETS_STAGE_ADDITIONAL,
.PROCESS_ASSETS_STAGE_ANALYSE,
},
() => this.addAssets( compilation, compiler )
);
@@ -198,14 +198,25 @@ class DependencyExtractionWebpackPlugin {
}
}

// Go through the assets and hash the sources. We can't just use
// `entrypointChunk.contentHash` because that's not updated when
// assets are minified. Sigh.
// @todo Use `asset.info.contenthash` if we can make sure it's reliably set.
Member

Does it mean that it could get fixed on the webpack side? Is there an issue open that describes the same use case and could be included as a reference so this todo item can be revisited later?

Contributor Author

No, that's not what it means.

What this comment is referring to is that sometimes Webpack sets .info.contenthash on the asset object, but it only does so if it decides that it needs it (if I recall correctly, it depends on whether the filename template uses [contenthash]). If we could ensure that Webpack sets it every time, we could just use it instead of hashing the file contents ourselves.

Member

Got it. I think the algorithm you proposed is a nice improvement, so let's rephrase the comment so it presents the benefits instead of the limitations of webpack. I can confirm that it's going to solve the issue Jetpack struggles with, as explained by @kraftbj, where hashes change depending on the absolute path of files. We can also link this PR so people can learn the full context.

Contributor Author

I have no objection to changing the comment, but I don't know what to rephrase it to. Care to make a suggestion?

Member

If I understand it correctly, ideally we would just change (line 208):

version: entrypointChunk.hash

to:

version: entrypointChunk.contenthash

and be done? But we can't do that because webpack doesn't always set it? And/or the value doesn't have the properties that we need?

I noticed that webpack has an optimization.realContentHash config option. This suggests that there are multiple valid ways to compute contenthash, and that the default way is in some sense "not real."

Member

If our goal is to use the same hash as the real [contenthash], and we are only forced to calculate it ourselves, the two should match. I.e., if my webpack config uses a [name].[contenthash].js filename template and also uses the extraction plugin, the [contenthash] and version should be the same. But currently they aren't. The differences I see are:

  1. webpack uses the md4 algorithm by default (configurable with the output.hashFunction option), while the plugin uses sha512.
  2. webpack uses only the assets' contents to update the hash, while we hash the filenames, too.

Oddly, even after making these two changes, the hashes are still different for me.

It would be worthwhile to spend some time-boxed effort trying to fix this before merging. Other than that, I think this patch is ready to land.
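For reference, the kind of setup being compared here might look like the sketch below. It is a hypothetical webpack 5 config: the entry path is an assumption, and the plugin line is left commented out so the fragment runs without dependencies.

```javascript
// Hypothetical webpack 5 configuration for the comparison described above:
// the emitted filename embeds [contenthash], so it can be compared with the
// `version` that the extraction plugin writes to the *.asset.php file.
const config = {
	mode: 'production',
	entry: { index: './src/index.js' }, // assumed entry point
	output: {
		filename: '[name].[contenthash].js',
		// The webpack 5 default; the plugin in this PR uses sha512 instead.
		hashFunction: 'md4',
	},
	optimization: {
		// Recompute [contenthash] from the final asset contents after processing.
		// webpack enables this by default in production mode.
		realContentHash: true,
	},
	// plugins: [ new DependencyExtractionWebpackPlugin() ],
};

module.exports = config;
```

With this config, matching the two values would require the plugin to use the same hash function and the same hash input as webpack.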

Member

md4 no longer works with Node 17+.

Replicating the same algorithm as webpack uses internally is rather complex. This is the commit that first added support for contentHash:
webpack/webpack@b929d4c.

Maybe we should try to detect contentHash first, and fall back to the custom handler otherwise. Example contentHash objects:

{
  'css/mini-extract': '9dc6bf4629f53268df7c',
  javascript: 'bc28cb02479bf7449a77'
}
{ javascript: '684ed11ffa88cd017286' }
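The "detect contentHash first, fall back otherwise" idea could be sketched as follows. `computeVersion` is a hypothetical helper, and `chunk.contentHash` is assumed to have the shape of the example objects above; the sha512 fallback mirrors what this patch does.

```javascript
const { createHash } = require( 'crypto' );

// Hypothetical helper sketching the "detect contentHash first" idea.
// `chunk.contentHash` is assumed to map a source type (e.g. 'javascript')
// to that type's hash, as in the example objects above.
function computeVersion( chunk, sources ) {
	const contentHash = chunk.contentHash || {};
	const types = Object.keys( contentHash ).sort();
	if ( types.length > 0 ) {
		// Combine the per-type hashes in a stable, sorted-by-type order.
		return types.map( ( type ) => contentHash[ type ] ).join( '-' );
	}
	// Fallback: hash the emitted sources ourselves.
	const hash = createHash( 'sha512' );
	for ( const source of sources ) {
		hash.update( source );
	}
	return hash.digest( 'hex' ).substring( 0, 32 );
}
```

For a single-type chunk this returns exactly the [contenthash] value; how to combine several types is an arbitrary choice here.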

Member

md4 no longer works with Node 17+.

webpack ships its own WASM module that implements md4. Instead of importing createHash from Node's crypto module, we can use the webpack version:

const webpack = require( 'webpack' );
const hash = webpack.util.createHash( 'md4' );
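A dependency-free alternative, sketched under the assumption that Node's own crypto module is used instead of webpack's bundled md4, would be to try md4 and fall back when OpenSSL rejects it. `createHashWithFallback` is a hypothetical name.

```javascript
const crypto = require( 'crypto' );

// Hypothetical helper: try the preferred algorithm first and fall back when
// Node's OpenSSL build rejects it. On Node 17+ (OpenSSL 3), createHash( 'md4' )
// throws unless the legacy provider is enabled.
function createHashWithFallback( preferred = 'md4', fallback = 'sha512' ) {
	try {
		return { algorithm: preferred, hash: crypto.createHash( preferred ) };
	} catch ( error ) {
		return { algorithm: fallback, hash: crypto.createHash( fallback ) };
	}
}

const { algorithm, hash } = createHashWithFallback();
hash.update( 'console.log( "hello" );' );
console.log( algorithm, hash.digest( 'hex' ) );
```

Note that once the fallback kicks in, the result can no longer match webpack's md4-based [contenthash], which is an argument for webpack.util.createHash when webpack is available anyway.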

Replicating the same algorithm as webpack uses internally is rather complex.

The RealContentHashPlugin source is indeed very complex, and I'm not sure what set of possible use cases it covers, but I've been able to replicate the contenthash quite easily:

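// webpack's bundled createHash, so md4 works on Node 17+ as well.
const { createHash } = require( 'webpack' ).util;
// `asset` is the object returned by compilation.getAsset( filename ).
const hash = createHash( 'md4' );
// asset.source.updateHash( hash );
hash.update( asset.source.buffer() );
const version = hash.digest( 'hex' );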

Here the non-obvious step is to not use asset.source.updateHash, because that hashes a string "RawSource" + source, instead of just source. That's all the difference and when computing a content hash, we only want the source.
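The difference can be illustrated with a small sketch. It uses sha512 from Node's crypto rather than md4 (an assumption, so it runs on Node 17+), and `taggedHash` is only a simplified imitation of the "RawSource" + source behavior described above, not webpack-sources' actual code.

```javascript
const { createHash } = require( 'crypto' );

// Content-only hash: only the emitted bytes go in, so renaming a file
// (same bytes, new name) leaves the result unchanged.
function contentOnlyHash( buffer ) {
	return createHash( 'sha512' ).update( buffer ).digest( 'hex' );
}

// Simplified imitation of the behavior described above, where a type tag
// ("RawSource") is hashed before the source. Not webpack-sources' real code.
function taggedHash( buffer ) {
	return createHash( 'sha512' ).update( 'RawSource' ).update( buffer ).digest( 'hex' );
}

const bytes = Buffer.from( 'console.log( 1 );' );
console.log( contentOnlyHash( bytes ) === taggedHash( bytes ) ); // false
```

Both are deterministic for the same input; they simply hash different byte streams, which is why only the first can line up with a pure content hash.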

Contributor Author

Here the non-obvious step is to not use asset.source.updateHash, because that hashes a string "RawSource" + source, instead of just source. That's all the difference and when computing a content hash, we only want the source.

Looking at it another way, we don't actually care what exactly goes into the hash as long as changes to the output result in changes to the hash, and non-changes to the output do not result in changes to the hash. Whether or not "RawSource" is incorporated in the hash makes no difference from that perspective.

OTOH, using asset.source.updateHash may have a performance advantage if the source is something like ReplaceSource or ConcatSource that does non-trivial work in source() or buffer().

Member

we don't actually care what exactly goes into the hash as long as changes to the output result in changes to the hash, and non-changes to the output do not result in changes to the hash.

I believe that my suggestions make actual, observable improvements to this behavior. I stopped hashing the file name, so renaming the file won't change the hash. In the end, the loading URL is going to look like script.js?ver=hash, where a name change doesn't require a hash change to load the right asset.

Also, incorporating internal structure like RawSource tags changes the hash when webpack internals change, even if the content remains the same. It seems to me that .updateHash is better suited for internal purposes than for calculating contenthash.

As for the performance advantage, I'm afraid it will never materialize. At some point webpack will write the asset to a file and will call source.buffer() to construct the buffer to write. When calculating contenthash, we'll just construct the buffer a bit earlier, at a very late compilation stage where the source is unlikely to change further.
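The version calculation in the diff below can be modeled roughly as follows. This is a simplified sketch: `assets` is a hypothetical Map standing in for compilation assets, and raw bytes are hashed where the patch calls asset.source.updateHash( hash ).

```javascript
const { createHash } = require( 'crypto' );

// Simplified model of the version calculation in this patch: walk the
// entrypoint's files in sorted order, feed each filename and its contents into
// one sha512 hash, and keep the first 32 hex characters.
// `assets` is a hypothetical stand-in for compilation assets (filename -> Buffer).
function entrypointVersion( filenames, assets ) {
	const hash = createHash( 'sha512' );
	for ( const filename of [ ...filenames ].sort() ) {
		hash.update( `${ filename }: ` );
		hash.update( assets.get( filename ) );
	}
	return hash.digest( 'hex' ).substring( 0, 32 );
}

const assets = new Map( [
	[ 'index.js', Buffer.from( 'console.log( "js" );' ) ],
	[ 'index.css', Buffer.from( 'body { color: red; }' ) ],
] );
console.log( entrypointVersion( [ 'index.js', 'index.css' ], assets ) );
```

Sorting makes the result independent of the order in which files are listed. Because each filename is hashed too, renaming a file still changes the version, which is the trade-off discussed above.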

const hash = createHash( 'sha512' );
for ( const filename of entrypoint.getFiles().sort() ) {
const asset = compilation.getAsset( filename );
anomiex marked this conversation as resolved.
hash.update( `${ filename }: ` );
gziolo marked this conversation as resolved.
asset.source.updateHash( hash );
}

const entrypointChunk = isWebpack4
Member

@gziolo, May 5, 2022

We can now move the entrypointChunk constant further down in the code, closer to where it is used. Disregard this one.

Member

Actually, we probably could get files from the entrypointChunk.files rather than from entrypoint.getFiles() to align with the handling in other places:

const entrypointChunk = isWebpack4
	? entrypoint.chunks.find( ( c ) => c.name === entrypointName )
	: entrypoint.getEntrypointChunk();

const entrypointChunkHash = createHash( 'sha512' );
for ( const filename of Array.from( entrypointChunk.files ).sort() ) {
	entrypointChunkHash.update( filename + ':' + compilation.getAsset( filename ).source.source() );
}

Member

@gziolo, May 5, 2022

In @wordpress/scripts there is this logic:

splitChunks: {
	cacheGroups: {
		style: {
			type: 'css/mini-extract',
			test: /[\\/]style(\.module)?\.(sc|sa|c)ss$/,
			chunks: 'all',
			enforce: true,
			name( _, chunks, cacheGroupKey ) {
				const chunkName = chunks[ 0 ].name;
				return `${ dirname(
					chunkName
				) }/${ cacheGroupKey }-${ basename( chunkName ) }`;
			},
		},
		default: false,
	},
},

With the current implementation it processes 3 files:

[ './style-index.css', 'index.css', 'index.js' ]

Array.from( entrypointChunk.files ) would process only 2 files because ./style-index.css goes to its own chunk:

[ 'index.css', 'index.js' ]
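The difference between the two file lists can be modeled with plain objects. This is a simplified sketch, not webpack's own classes: collecting files across all of an entrypoint's chunks is how we understand webpack 5's entrypoint.getFiles() to behave, while the entry chunk alone corresponds to Array.from( entrypointChunk.files ).

```javascript
// Simplified model: an entrypoint is a group of chunks, each with a Set of
// emitted files. Collecting files across all chunks approximates
// entrypoint.getFiles(); the entry chunk alone approximates entrypointChunk.files.
function filesOfEntrypoint( chunks ) {
	const files = new Set();
	for ( const chunk of chunks ) {
		for ( const file of chunk.files ) {
			files.add( file );
		}
	}
	return [ ...files ].sort();
}

// The split-chunks setup above: the style cache group emits its own chunk.
const styleChunk = { name: './style-index', files: new Set( [ './style-index.css' ] ) };
const entryChunk = { name: 'index', files: new Set( [ 'index.css', 'index.js' ] ) };

console.log( filesOfEntrypoint( [ styleChunk, entryChunk ] ) ); // [ './style-index.css', 'index.css', 'index.js' ]
console.log( [ ...entryChunk.files ].sort() ); // [ 'index.css', 'index.js' ]
```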

Contributor Author

Are you still suggesting changing the implementation here as in your second comment, or did your third comment convince you not to?

It seems to me that your last comment is the argument not to make the change you suggested in your second comment. The extra file is part of the asset and so should be covered by the hash.

Member

My last comment provides the reasoning why the change included in #34969 (comment) would be a good improvement. We don't care about ./style-index.css, which goes into its own chunk, so it shouldn't matter when calculating the hash for the JS entry point.

Contributor Author

Would you care if I pointed out that someone using .optimization.runtimeChunk would find that runtime.js or runtime~index.js isn't included in the version hash either?

If you really want this change I'm not going to fight it since our use case has neither of these sorts of extra files. But I do think if we're going to have one hash for the asset, it should cover the whole asset rather than excluding parts of it.

Contributor Author

There's also the possibility that someone is using splitChunks like you do there but on JS code chunks, or with Webpack's automatic vendor splitting.

? entrypoint.chunks.find( ( c ) => c.name === entrypointName )
: entrypoint.getEntrypointChunk();

const assetData = {
// Get a sorted array so we can produce a stable, stringified representation.
dependencies: Array.from( entrypointExternalizedWpDeps ).sort(),
version: entrypointChunk.hash,
version: hash.digest( 'hex' ).substring( 0, 32 ),
};

const assetString = this.stringify( assetData );
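To make the output format concrete, here is a hypothetical minimal serializer that reproduces the *.asset.php shape seen in the snapshots. `toAssetPhp` is an illustrative name; it handles only flat string and string-array values, and the plugin's own stringify() may be implemented differently.

```javascript
// Hypothetical minimal serializer reproducing the *.asset.php shape seen in the
// snapshots. It handles only a flat object whose values are strings or arrays
// of strings; the plugin's own stringify() may be implemented differently.
function toAssetPhp( assetData ) {
	// PHP single-quoted string: escape backslashes and single quotes.
	const quote = ( s ) => `'${ String( s ).replace( /\\/g, '\\\\' ).replace( /'/g, "\\'" ) }'`;
	const value = ( v ) =>
		Array.isArray( v ) ? `array(${ v.map( quote ).join( ', ' ) })` : quote( v );
	const pairs = Object.entries( assetData ).map(
		( [ key, v ] ) => `${ quote( key ) } => ${ value( v ) }`
	);
	// Trailing newline, matching the changelog entry about *.asset.php files.
	return `<?php return array(${ pairs.join( ', ' ) });\n`;
}
```

Fed an assetData object like the one above (sorted dependencies plus a 32-character version), this yields lines of the same form as the snapshot entries.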
@@ -1,7 +1,7 @@
// Jest Snapshot v1, https://goo.gl/fbAQLP

exports[`DependencyExtractionWebpackPlugin Webpack \`combine-assets\` should produce expected output: Asset file 'assets.php' should match snapshot 1`] = `
"<?php return array('fileA.js' => array('dependencies' => array('lodash', 'wp-blob'), 'version' => 'dd2fe63dd2d581e01ace6923ea5f1150'), 'fileB.js' => array('dependencies' => array('wp-token-list'), 'version' => '72d30a2459a6c29ccbc8bc4a8c6641b7'));
"<?php return array('fileA.js' => array('dependencies' => array('lodash', 'wp-blob'), 'version' => '3e34cc9f8b5062d43bb05d71a9865ab1'), 'fileB.js' => array('dependencies' => array('wp-token-list'), 'version' => '59ef1b47eaac81d79d740e4357df7209'));
"
`;

@@ -32,7 +32,7 @@ Array [
`;

exports[`DependencyExtractionWebpackPlugin Webpack \`dynamic-import\` should produce expected output: Asset file 'main.asset.php' should match snapshot 1`] = `
"<?php return array('dependencies' => array('lodash', 'wp-blob'), 'version' => '52a95452a51ae14be315bbac91fd66bf');
"<?php return array('dependencies' => array('lodash', 'wp-blob'), 'version' => 'f210c845cbb975aa12ca53703279950b');
"
`;

@@ -55,7 +55,7 @@ Array [
`;

exports[`DependencyExtractionWebpackPlugin Webpack \`function-output-filename\` should produce expected output: Asset file 'chunk--main--main.asset.php' should match snapshot 1`] = `
"<?php return array('dependencies' => array('lodash', 'wp-blob'), 'version' => 'c44010df32f758565726bcefbf69a28b');
"<?php return array('dependencies' => array('lodash', 'wp-blob'), 'version' => '978b7e0c7b7f291c4b16b534a7a431a3');
"
`;

@@ -78,7 +78,7 @@ Array [
`;

exports[`DependencyExtractionWebpackPlugin Webpack \`has-extension-suffix\` should produce expected output: Asset file 'index.min.asset.php' should match snapshot 1`] = `
"<?php return array('dependencies' => array('lodash', 'wp-blob'), 'version' => '7e289f109b13dd69d9a1097f90bcfeb2');
"<?php return array('dependencies' => array('lodash', 'wp-blob'), 'version' => 'd57351543a76eca838d184863322b856');
"
`;

@@ -101,21 +101,21 @@ Array [
`;

exports[`DependencyExtractionWebpackPlugin Webpack \`no-default\` should produce expected output: Asset file 'main.asset.php' should match snapshot 1`] = `
"<?php return array('dependencies' => array(), 'version' => '90ac5f67b465feaec264ee3047123919');
"<?php return array('dependencies' => array(), 'version' => '6041d7727db98d5f89967189c9ac013f');
"
`;

exports[`DependencyExtractionWebpackPlugin Webpack \`no-default\` should produce expected output: External modules should match snapshot 1`] = `Array []`;

exports[`DependencyExtractionWebpackPlugin Webpack \`no-deps\` should produce expected output: Asset file 'main.asset.php' should match snapshot 1`] = `
"<?php return array('dependencies' => array(), 'version' => 'e666cc803ddb8b960a12755e87d0321c');
"<?php return array('dependencies' => array(), 'version' => '86ff60df0b0f882f2a05d1a67ff97cf4');
"
`;

exports[`DependencyExtractionWebpackPlugin Webpack \`no-deps\` should produce expected output: External modules should match snapshot 1`] = `Array []`;

exports[`DependencyExtractionWebpackPlugin Webpack \`option-function-output-filename\` should produce expected output: Asset file 'chunk--main--main.asset.php' should match snapshot 1`] = `
"<?php return array('dependencies' => array('lodash', 'wp-blob'), 'version' => 'c44010df32f758565726bcefbf69a28b');
"<?php return array('dependencies' => array('lodash', 'wp-blob'), 'version' => '9e10498153cf094d8c25892525d3cd6c');
"
`;

@@ -138,7 +138,7 @@ Array [
`;

exports[`DependencyExtractionWebpackPlugin Webpack \`option-output-filename\` should produce expected output: Asset file 'main-foo.asset.php' should match snapshot 1`] = `
"<?php return array('dependencies' => array('lodash', 'wp-blob'), 'version' => 'c44010df32f758565726bcefbf69a28b');
"<?php return array('dependencies' => array('lodash', 'wp-blob'), 'version' => '9e10498153cf094d8c25892525d3cd6c');
"
`;

@@ -160,7 +160,7 @@ Array [
]
`;

exports[`DependencyExtractionWebpackPlugin Webpack \`output-format-json\` should produce expected output: Asset file 'main.asset.json' should match snapshot 1`] = `"{\\"dependencies\\":[\\"lodash\\"],\\"version\\":\\"ff689135319685f74bf813654f70c5a4\\"}"`;
exports[`DependencyExtractionWebpackPlugin Webpack \`output-format-json\` should produce expected output: Asset file 'main.asset.json' should match snapshot 1`] = `"{\\"dependencies\\":[\\"lodash\\"],\\"version\\":\\"2a8570fd30cadd4123ee368f90128519\\"}"`;

exports[`DependencyExtractionWebpackPlugin Webpack \`output-format-json\` should produce expected output: External modules should match snapshot 1`] = `
Array [
@@ -173,7 +173,7 @@ Array [
`;

exports[`DependencyExtractionWebpackPlugin Webpack \`overrides\` should produce expected output: Asset file 'main.asset.php' should match snapshot 1`] = `
"<?php return array('dependencies' => array('wp-blob', 'wp-script-handle-for-rxjs', 'wp-url'), 'version' => '97c94d19d2d93c0ef60f14d590cf1204');
"<?php return array('dependencies' => array('wp-blob', 'wp-script-handle-for-rxjs', 'wp-url'), 'version' => '5620bf1846e497728408912a36d99af6');
"
`;

@@ -212,12 +212,12 @@ Array [
`;

exports[`DependencyExtractionWebpackPlugin Webpack \`runtime-chunk-single\` should produce expected output: Asset file 'a.asset.php' should match snapshot 1`] = `
"<?php return array('dependencies' => array('wp-blob'), 'version' => 'd189640bf0bd44c2f6f9fee71b00d756');
"<?php return array('dependencies' => array('wp-blob'), 'version' => 'e575aabfcecd0c966cba4df4985af039');
"
`;

exports[`DependencyExtractionWebpackPlugin Webpack \`runtime-chunk-single\` should produce expected output: Asset file 'b.asset.php' should match snapshot 1`] = `
"<?php return array('dependencies' => array('lodash', 'wp-blob'), 'version' => 'dd30d1b96694d89afddbfad01a09ee4d');
"<?php return array('dependencies' => array('lodash', 'wp-blob'), 'version' => 'd86c3101fe7a68a43edb695f5f897b62');
"
`;

@@ -240,7 +240,7 @@ Array [
`;

exports[`DependencyExtractionWebpackPlugin Webpack \`wordpress\` should produce expected output: Asset file 'main.asset.php' should match snapshot 1`] = `
"<?php return array('dependencies' => array('lodash', 'wp-blob'), 'version' => 'c44010df32f758565726bcefbf69a28b');
"<?php return array('dependencies' => array('lodash', 'wp-blob'), 'version' => '9e10498153cf094d8c25892525d3cd6c');
"
`;

@@ -263,7 +263,7 @@ Array [
`;

exports[`DependencyExtractionWebpackPlugin Webpack \`wordpress-require\` should produce expected output: Asset file 'main.asset.php' should match snapshot 1`] = `
"<?php return array('dependencies' => array('lodash', 'wp-blob'), 'version' => '4f30547fb762285f57176ff70a968bbc');
"<?php return array('dependencies' => array('lodash', 'wp-blob'), 'version' => 'ad8b8843fa9f514255fde2cc0c16b48c');
"
`;
