-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dependency-extraction-webpack-plugin: Calculate vendor hash from file output rather than Webpack internal state #34969
Changes from all commits
b9d4f72
f99139f
3e90392
c3e54b4
a6976e5
0ce7480
9ece923
324dfe8
3fa379f
d8f4c1a
f4a9794
6c6389f
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -128,7 +128,7 @@ class DependencyExtractionWebpackPlugin { | |||||||||||||||||||||||||||||||||||
name: this.constructor.name, | ||||||||||||||||||||||||||||||||||||
stage: | ||||||||||||||||||||||||||||||||||||
compiler.webpack.Compilation | ||||||||||||||||||||||||||||||||||||
.PROCESS_ASSETS_STAGE_ADDITIONAL, | ||||||||||||||||||||||||||||||||||||
.PROCESS_ASSETS_STAGE_ANALYSE, | ||||||||||||||||||||||||||||||||||||
}, | ||||||||||||||||||||||||||||||||||||
() => this.addAssets( compilation, compiler ) | ||||||||||||||||||||||||||||||||||||
); | ||||||||||||||||||||||||||||||||||||
|
@@ -198,14 +198,25 @@ class DependencyExtractionWebpackPlugin { | |||||||||||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||
// Go through the assets and hash the sources. We can't just use | ||||||||||||||||||||||||||||||||||||
// `entrypointChunk.contentHash` because that's not updated when | ||||||||||||||||||||||||||||||||||||
// assets are minified. Sigh. | ||||||||||||||||||||||||||||||||||||
// @todo Use `asset.info.contenthash` if we can make sure it's reliably set. | ||||||||||||||||||||||||||||||||||||
const hash = createHash( 'sha512' ); | ||||||||||||||||||||||||||||||||||||
for ( const filename of entrypoint.getFiles().sort() ) { | ||||||||||||||||||||||||||||||||||||
const asset = compilation.getAsset( filename ); | ||||||||||||||||||||||||||||||||||||
anomiex marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||||||||||||||||||||||||||
hash.update( `${ filename }: ` ); | ||||||||||||||||||||||||||||||||||||
gziolo marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||||||||||||||||||||||||||
asset.source.updateHash( hash ); | ||||||||||||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||
const entrypointChunk = isWebpack4 | ||||||||||||||||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Actually, we probably could get files from the const entrypointChunk = isWebpack4
? entrypoint.chunks.find( ( c ) => c.name === entrypointName )
: entrypoint.getEntrypointChunk();
const entrypointChunkHash = createHash( 'sha512' );
for ( const filename of Array.from( entrypointChunk.files ).sort() ) {
entrypointChunkHash.update( filename + ':' + compilation.getAsset( filename ).source.source() );
} There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In gutenberg/packages/scripts/config/webpack.config.js Lines 107 to 123 in 67fcf07
With the current implementation it processes 3 files:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Are you still suggesting to change the implementation here as in your second comment, or is the third comment convincing yourself not to? It seems to me that your last comment is the argument not to make the change you suggested in your second comment. The extra file is part of the asset and so should be covered by the hash. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My last comment provides the reasoning why the change included in #34969 (comment) would be a good improvement. We don't care about There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Would you care if I point out that someone using If you really want this change I'm not going to fight it since our use case has neither of these sorts of extra files. But I do think if we're going to have one hash for the asset, it should cover the whole asset rather than excluding parts of it. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There's also the possibility that someone is using splitChunks like you do there but on JS code chunks, or with Webpack's automatic vendor splitting. |
||||||||||||||||||||||||||||||||||||
? entrypoint.chunks.find( ( c ) => c.name === entrypointName ) | ||||||||||||||||||||||||||||||||||||
: entrypoint.getEntrypointChunk(); | ||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||
const assetData = { | ||||||||||||||||||||||||||||||||||||
// Get a sorted array so we can produce a stable, stringified representation. | ||||||||||||||||||||||||||||||||||||
dependencies: Array.from( entrypointExternalizedWpDeps ).sort(), | ||||||||||||||||||||||||||||||||||||
version: entrypointChunk.hash, | ||||||||||||||||||||||||||||||||||||
version: hash.digest( 'hex' ).substring( 0, 32 ), | ||||||||||||||||||||||||||||||||||||
}; | ||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||
const assetString = this.stringify( assetData ); | ||||||||||||||||||||||||||||||||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it mean that it could get fixed on the webpack side? Is there an issue open that describes the same use case and could be included as a reference so this todo item can be revisited later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, that's not what it means.
What this comment is referring to is that sometimes Webpack sets
.info.contenthash
on theasset
object, but it only does so if it decides that it needs it (if I recall correctly, it depends on whether the filename template uses[contenthash]
). If we could ensure that Webpack sets that every time, we could just use it instead of hashing the file contents ourself.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. I think the algorithm you proposed is a nice improvement so let's rephrase the comment so it presents the benefits, instead of the limitations of the webpack. I can fully confirm that it's going to solve the issue that Jetpack struggles with as explained by @kraftbj with hashes changing depending on the absolute path for files. We can also link this PR so people can learn the full context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no objection to changing the comment, but I don't know what to rephrase it to. Care to make a suggestion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand it correctly, ideally we would just change (line 208):
to:
and be done? But we can't do that because webpack doesn't always set it? And/or the value doesn't have the properties that we need?
I noticed that webpack has an
optimization.realContentHash
config option. Which means that there are multiple valid ways how to computecontenthash
, and that the default ways is in some sense "not real."There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we have the goal to use the same hash as the real
[contenthash]
, and we are only forced to calculate it ourselves, they should match. I.e., if my webpack config uses a[name].[contenthash].js
filename template, and also uses the extraction plugin, the[contenthash]
andversion
should be the same. But they currently aren't. The differences I see are:md4
algorithm by default, configurable withoutput.hashFunction
option, while the plugin usessha512
.It's weird that even making these two changes, the hashes are still different for me.
It would be worthwhile to spend some timeboxed time trying to fix this before merging. Other than that, I think this patch is ready to land.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
md4
no longer works with Node 17+.Replicating the same algorithm as webpack uses internally is rather complex. This is the initial commit when the support for
contentHash
was added:webpack/webpack@b929d4c.
Maybe we should try to detect
contentHash
first, and fall back to the custom handler otherwise. ExamplecontentHash
objects:There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
webpack ships its own WASM module that implements
md4
. Instead of importingcreateHash
from Node'scrypto
module, we can use the webpack version:The
RealContentHashPlugin
source is indeed very complex, and I'm not sure what set of possible use cases it covers, but I've been able to replicate thecontenthash
quite easily:Here the non-obvious step is to not use
asset.source.updateHash
, because that hashes a string"RawSource" + source
, instead of justsource
. That's all the difference and when computing a content hash, we only want the source.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at it another way, we don't actually care what exactly goes into the hash as long as changes to the output result in changes to the hash, and non-changes to the output do not result in changes to the hash. Whether or not
"RawSource"
is incorporated in the hash makes no difference from that perspective.OTOH, using
asset.source.updateHash
may have a performance advantage if the source is something like ReplaceSource or ConcatSource that does non-trivial work insource()
orbuffer()
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that my suggestions make an actual observable improvements to this behavior. I stopped hashing the file name, so renaming the file won't change the hash. In the end, the loading URL is going to be like
script.js?ver=hash
, where name change doesn't require a hash change to load the right asset.Also, incorporating internal structure like
RawSource
tags changes the hash when webpack internals change, even if the content remains the same. It seems to me that.updateHash
is more fit for internal purposes, not for calculatingcontenthash
.The performance advantage, I'm afraid it will never materialize. At some point webpack will write the asset to a file, and will call
source.buffer()
to construct the buffer to write. When calculating contenthash we'll just just construct the buffer a bit earlier, at a very late compilation stage where the source is unlikely to change further.