-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dependency-extraction-webpack-plugin: Calculate vendor hash from file output rather than Webpack internal state #34969
dependency-extraction-webpack-plugin: Calculate vendor hash from file output rather than Webpack internal state #34969
Conversation
Webpack 5 recommends using the content hash over the old-style chunk hash for cache invalidation, as the chunk hash can change even if the output does not. dependency-extraction-webpack-plugin will now include the content hash for each content type in the asset file. The `version` field is still present, now as a combined hash for all the content types reported, to maintain backwards compatibility. Fixes WordPress#34660
👋 Thanks for your first Pull Request and for helping build the future of Gutenberg and WordPress, @anomiex! In case you missed it, we'd love to have you join us in our Slack community, where we hold regularly weekly meetings open to anyone to coordinate with each other. If you want to learn more about WordPress development in general, check out the Core Handbook full of helpful information. |
Wouldn't this be a breaking change? |
What do you think it would break? |
…-dependency-extraction-webpack-plugin
@anomiex, thank you for opening this PR. For backward compatibility, we should optimize that the usage of The current behavior is also documented in the following section in the Block Editor Handbook: https://developer.wordpress.org/block-editor/reference-guides/block-api/block-metadata/#wpdefinedasset (source) I would expect that many plugins depend on that, too. What is the rationale to have a different value for |
The We could go to some extra work if we really wanted to detect the case where the only asset is JavaScript and make the two hashes be the same instead of calculating a combined hash. It doesn't seem worth the effort to me, though.
As noted in the patched readme, the JavaScript hash could be passed to On the other hand, they could just keep using |
From my perspective, it should be perfectly fine to avoid introducing |
Are you requesting changes to the PR, or just musing? |
I think that this PR needs some iterations to avoid a need for updating PHP code to use the updated hash for JS files. |
There's no need as it is now, it's fully backwards compatible. But if you want a revision to make it so there's no code update possible, that's fine with me. On the other hand, as I implemented that and re-tested I found out my code wasn't working right. Webpack calculates the chunk contentHash fields when it first processes the chunk, but then when further updates happen (e.g. minification) it only updates the hash where it appears in the assets. It turns out that it worked with the code snippet from #34660 that only looked at the JavaScript asset, but fails when we combine all the assets' hashes. |
…-dependency-extraction-webpack-plugin
…-dependency-extraction-webpack-plugin
…ependency-extraction-webpack-plugin
…ependency-extraction-webpack-plugin
@gziolo It has been five months since I made the change you requested. Care to re-review? |
I'll ask for help with the review in the parent issue. I don't think I fully understand the issue here. |
…ependency-extraction-webpack-plugin
Confirming that this PR resolves the issue that we're facing. The full narrative of why it matters to us (Jetpack):
|
We had exactly the same issue in WordPress core a year ago. @peterwilsoncc patched it by using |
Ok, it looks like It looks like the hash ( |
// Go through the assets and hash the sources. We can't just use | ||
// `entrypointChunk.contentHash` because that's not updated when | ||
// assets are minified. Sigh. | ||
// @todo Use `asset.info.contenthash` if we can make sure it's reliably set. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it mean that it could get fixed on the webpack side? Is there an issue open that describes the same use case and could be included as a reference so this todo item can be revisited later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, that's not what it means.
What this comment is referring to is that sometimes Webpack sets .info.contenthash
on the asset
object, but it only does so if it decides that it needs it (if I recall correctly, it depends on whether the filename template uses [contenthash]
). If we could ensure that Webpack sets that every time, we could just use it instead of hashing the file contents ourself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. I think the algorithm you proposed is a nice improvement so let's rephrase the comment so it presents the benefits, instead of the limitations of the webpack. I can fully confirm that it's going to solve the issue that Jetpack struggles with as explained by @kraftbj with hashes changing depending on the absolute path for files. We can also link this PR so people can learn the full context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no objection to changing the comment, but I don't know what to rephrase it to. Care to make a suggestion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand it correctly, ideally we would just change (line 208):
version: entrypointChunk.hash
to:
version: entrypointChunk.contenthash
and be done? But we can't do that because webpack doesn't always set it? And/or the value doesn't have the properties that we need?
I noticed that webpack has an optimization.realContentHash
config option. Which means that there are multiple valid ways how to compute contenthash
, and that the default ways is in some sense "not real."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we have the goal to use the same hash as the real [contenthash]
, and we are only forced to calculate it ourselves, they should match. I.e., if my webpack config uses a [name].[contenthash].js
filename template, and also uses the extraction plugin, the [contenthash]
and version
should be the same. But they currently aren't. The differences I see are:
- webpack uses
md4
algorithm by default, configurable withoutput.hashFunction
option, while the plugin usessha512
. - webpack uses only the assets' contents to update the hash, while we hash the filename, too.
It's weird that even making these two changes, the hashes are still different for me.
It would be worthwhile to spend some timeboxed time trying to fix this before merging. Other than that, I think this patch is ready to land.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
md4
no longer works with Node 17+.
Replicating the same algorithm as webpack uses internally is rather complex. This is the initial commit when the support for contentHash
was added:
webpack/webpack@b929d4c.
Maybe we should try to detect contentHash
first, and fall back to the custom handler otherwise. Example contentHash
objects:
{
'css/mini-extract': '9dc6bf4629f53268df7c',
javascript: 'bc28cb02479bf7449a77'
}
{ javascript: '684ed11ffa88cd017286' }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
md4
no longer works with Node 17+.
webpack ships its own WASM module that implements md4
. Instead of importing createHash
from Node's crypto
module, we can use the webpack version:
const webpack = require( 'webpack' );
const hash = webpack.util.createHash( 'md4' );
Replicating the same algorithm as webpack uses internally is rather complex.
The RealContentHashPlugin
source is indeed very complex, and I'm not sure what set of possible use cases it covers, but I've been able to replicate the contenthash
quite easily:
const hash = createHash( 'md4' );
// asset.source.updateHash( hash );
hash.update( asset.source.buffer() );
const version = hash.digest( 'hex' );
Here the non-obvious step is to not use asset.source.updateHash
, because that hashes a string "RawSource" + source
, instead of just source
. That's all the difference and when computing a content hash, we only want the source.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here the non-obvious step is to not use
asset.source.updateHash
, because that hashes a string"RawSource" + source
, instead of justsource
. That's all the difference and when computing a content hash, we only want the source.
Looking at it another way, we don't actually care what exactly goes into the hash as long as changes to the output result in changes to the hash, and non-changes to the output do not result in changes to the hash. Whether or not "RawSource"
is incorporated in the hash makes no difference from that perspective.
OTOH, using asset.source.updateHash
may have a performance advantage if the source is something like ReplaceSource or ConcatSource that does non-trivial work in source()
or buffer()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't actually care what exactly goes into the hash as long as changes to the output result in changes to the hash, and non-changes to the output do not result in changes to the hash.
I believe that my suggestions make an actual observable improvements to this behavior. I stopped hashing the file name, so renaming the file won't change the hash. In the end, the loading URL is going to be like script.js?ver=hash
, where name change doesn't require a hash change to load the right asset.
Also, incorporating internal structure like RawSource
tags changes the hash when webpack internals change, even if the content remains the same. It seems to me that .updateHash
is more fit for internal purposes, not for calculating contenthash
.
The performance advantage, I'm afraid it will never materialize. At some point webpack will write the asset to a file, and will call source.buffer()
to construct the buffer to write. When calculating contenthash we'll just just construct the buffer a bit earlier, at a very late compilation stage where the source is unlikely to change further.
hash.update( `${ filename }: ` ); | ||
asset.source.updateHash( hash ); | ||
} | ||
|
||
const entrypointChunk = isWebpack4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can now move Disregard this one.entrypointChunk
constant further in the code and closer to the usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, we probably could get files from the entrypointChunk.files
rather than from entrypoint.getFiles()
to align with the handling in other places:
const entrypointChunk = isWebpack4
? entrypoint.chunks.find( ( c ) => c.name === entrypointName )
: entrypoint.getEntrypointChunk();
const entrypointChunkHash = createHash( 'sha512' );
for ( const filename of Array.from( entrypointChunk.files ).sort() ) {
entrypointChunkHash.update( filename + ':' + compilation.getAsset( filename ).source.source() );
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In @wordpress/scripts
there is this logic:
gutenberg/packages/scripts/config/webpack.config.js
Lines 107 to 123 in 67fcf07
splitChunks: { | |
cacheGroups: { | |
style: { | |
type: 'css/mini-extract', | |
test: /[\\/]style(\.module)?\.(sc|sa|c)ss$/, | |
chunks: 'all', | |
enforce: true, | |
name( _, chunks, cacheGroupKey ) { | |
const chunkName = chunks[ 0 ].name; | |
return `${ dirname( | |
chunkName | |
) }/${ cacheGroupKey }-${ basename( chunkName ) }`; | |
}, | |
}, | |
default: false, | |
}, | |
}, |
With the current implementation it processes 3 files:
[ './style-index.css', 'index.css', 'index.js' ]
Array.from( entrypointChunk.files )
would process only 2 files because ./style-index.css
goes to its own chunk:
[ 'index.css', 'index.js' ]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you still suggesting to change the implementation here as in your second comment, or is the third comment convincing yourself not to?
It seems to me that your last comment is the argument not to make the change you suggested in your second comment. The extra file is part of the asset and so should be covered by the hash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My last comment provides the reasoning why the change included in #34969 (comment) would be a good improvement. We don't care about ./style-index.css
that goes into its chunk so it shouldn't matter when calculating the hash for the JS entry point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you care if I point out that someone using .optimization.runtimeChunk
would find that the runtime.js
or runtime~index.js
isn't included in the version hash either?
If you really want this change I'm not going to fight it since our use case has neither of these sorts of extra files. But I do think if we're going to have one hash for the asset, it should cover the whole asset rather than excluding parts of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's also the possibility that someone is using splitChunks like you do there but on JS code chunks, or with Webpack's automatic vendor splitting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tested this PR extensively using the project scaffolded with npx @wordpress/create-block -t @wordpress/create-block-tutorial-template
and all the changes were applied directly to the node_modules
folder. I can confirm it works as expected and fixes the issue described by @kraftbj. I left some comments to discuss the current implementation, but overall we can land this change.
We should also update the title for this PR because it doesn't reflect the current implementation.
We maintain the CHANGELOG files for the package, and it would be great to list this change as well because it will change version
in all generated assets files with this change included:
https://github.com/WordPress/gutenberg/blob/trunk/packages/dependency-extraction-webpack-plugin/CHANGELOG.md
…ependency-extraction-webpack-plugin
contentHash
in dependency-extraction-webpack-pluginThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned in my previous comment
#34969 (review), the proposed implementation works. Let's land it to remove the friction and explore improvements separately. @anomiex, thank you for providing the fix.
Description
Webpack 5 recommends using the content hash over the old-style chunk
hash for cache invalidation, as the chunk hash can change even if the
output does not.
dependency-extraction-webpack-plugin will now include the content hash
for each content type in the asset file. The
version
field is stillpresent, now as a combined hash for all the content types reported, to
maintain backwards compatibility.
Fixes #34660
How has this been tested?
Tested locally, both in a trivial repo and in https://github.com/Automattic/jetpack/tree/master/projects/plugins/jetpack.
Screenshots
N/A
Types of changes
New feature.
Checklist:
*.native.js
files for terms that need renaming or removal).