compression on blob export should be an array #2347
While working on #2350, some more thoughts on this and #2346. The current ref/remote design was very much based on the idea that there is only one blob per ref, which is obviously not true anymore, with different compressions and different blobs extracting to the same diffid. This leads to bugs, e.g. the remote cache issue I discovered in #2350, and hacky code with lots of exceptions. As a first step, to unblock the cache issue, I think:
So the changes from the current #2350 are that this enables getting all the blobs (chains) that match any of the compressions. The cache can then export all available blobs or find the correct one that matches the inline cache.

Phase 2: It would be better to get rid of the current label-based linking between blobs. I explained some of the current problems in #2346. One way would be that every ref creates an additional "blob lease" where the blob is put. There is no "main blob", and I think we can also get rid of the
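To make the "blob lease" idea more concrete, here is a minimal sketch on top of containerd's leases API, which buildkit already builds on; `createBlobLease`, the lease ID scheme, and the parameters are all hypothetical, not anything that exists in the codebase:

```go
package blobutil

import (
	"context"

	"github.com/containerd/containerd/leases"
	ocispecs "github.com/opencontainers/image-spec/specs-go/v1"
)

// createBlobLease sketches the "blob lease" idea: instead of linking blobs
// to a ref through labels and a designated "main blob", each ref owns a
// lease and every compression variant is attached to it as a content
// resource, so GC keeps all variants alive exactly as long as the ref.
func createBlobLease(ctx context.Context, lm leases.Manager, refID string, variants []ocispecs.Descriptor) (leases.Lease, error) {
	// One lease per ref; the ID scheme here is illustrative only.
	l, err := lm.Create(ctx, leases.WithID(refID+"-blobs"))
	if err != nil {
		return leases.Lease{}, err
	}
	for _, desc := range variants {
		// No variant is special: gzip, zstd and uncompressed blobs for the
		// same diffid all hang off the same lease.
		res := leases.Resource{ID: desc.Digest.String(), Type: "content"}
		if err := lm.AddResource(ctx, l, res); err != nil {
			return leases.Lease{}, err
		}
	}
	return l, nil
}
```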
overall SGTM. Some minor questions:
Both can return mixed. Mixed can be avoided if compression is a single-length array and

Question: how do multiple instances of remotes work with parent chains? Does it try to find a parent with the same compression and, if it can't, take the next based on priority order?
Not sure. How are multiple
Actually, I'm not that sure about it anymore. E.g. if an image is pulled and layers are put on top of it, even if another blob for the same diffid existed, we should make sure that the blob that was pulled gets uploaded, not another blob that extracts to the same diffid. At the same time, diff should always reuse and reference-count existing blobs if the diffid matches.
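For illustration, the priority-order fallback described in the question above could look roughly like this; the types and names are made up, not buildkit's actual solver types:

```go
package blobutil

import ocispecs "github.com/opencontainers/image-spec/specs-go/v1"

// pickParentVariant returns the first existing compression variant of a
// parent, walking the requested media types in priority order. ok is false
// when none exists, i.e. a conversion to preferred[0] would be needed.
func pickParentVariant(variants map[string]ocispecs.Descriptor, preferred []string) (ocispecs.Descriptor, bool) {
	for _, mediaType := range preferred {
		if desc, found := variants[mediaType]; found {
			return desc, true
		}
	}
	return ocispecs.Descriptor{}, false
}
```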
SGTM.
DescHandler is keyed by the digest of the blob to be unlazied (https://github.com/moby/buildkit/blob/master/cache/blobs.go#L339), so if we want to get rid of the "main blob" I think we need some metadata somewhere that indicates which blob needs to be pulled from the registry. @sipsma WDYT?
The idea of using leases to track blobs instead of labels SGTM. In terms of DescHandler, my thoughts are:
So, given the above, I can't see that it's strictly necessary to make any changes to DescHandlers; we can leave them keyed by the digest of a single compression-variant blob. Let me know if I'm missing something. The only reason I could see for making a change is to add a new optimization where we track the providers for each compression variant as part of the cacheRecord, which would then allow us to sometimes pull down a specific variant directly rather than pull down whatever variant the ref has a provider for and do a local conversion. I don't know if this optimization is worth it, given it's a fairly obscure situation and it would probably complicate the code a significant amount even further.
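To illustrate, a rough sketch of that flow with hypothetical names (`pullable` only mimics the shape of the DescHandlers map, and `convert` stands in for the local conversion step):

```go
package blobutil

import (
	"context"
	"fmt"

	"github.com/opencontainers/go-digest"
	ocispecs "github.com/opencontainers/image-spec/specs-go/v1"
)

// pullable is a stand-in for the DescHandlers map: the set of blob digests
// the ref actually knows how to pull from a registry.
type pullable map[digest.Digest]bool

// unlazyForCompression outlines the flow above: only one variant needs a
// handler; if the wanted compression is not the pullable one, pull what we
// can and convert locally rather than requiring a handler per variant.
func unlazyForCompression(ctx context.Context, handlers pullable, variants []ocispecs.Descriptor, wantMediaType string,
	convert func(context.Context, ocispecs.Descriptor, string) (ocispecs.Descriptor, error)) (ocispecs.Descriptor, error) {

	// Fast path: the wanted variant itself is pullable.
	for _, desc := range variants {
		if desc.MediaType == wantMediaType && handlers[desc.Digest] {
			return desc, nil
		}
	}
	// Fallback: pull whichever variant has a handler and convert locally.
	for _, desc := range variants {
		if handlers[desc.Digest] {
			return convert(ctx, desc, wantMediaType)
		}
	}
	return ocispecs.Descriptor{}, fmt.Errorf("no pullable variant available for %s", wantMediaType)
}
```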
Actually, I just realized that if we are making these updates to
There are currently 2 fields to control compression of exported blobs: `compression` and `force-compression=bool`. The first one should be an array instead. The `compression` field checks that the existing compression exists in the array, and if it does not, creates a new blob with the compression of the first element in the array. If `force-compression` is `false`, then the final `compression` array value is equal to `compression,<any>`.

The new default parameters for images would be `compression=gzip;uncompressed,force-compression=true`. Notice there is no zstd, as atm we don't want to push zstd unless the user knows what they are doing, due to limited support. If the user sets the `compression` field, then the default `force-compression` is `false`, like atm.
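For clarity, a minimal sketch of the resolution rule described above; the function and its signature are mine, not part of the proposal's API:

```go
package blobutil

// resolveCompression applies the proposed rule: reuse the existing blob if
// its compression appears anywhere in the requested array; otherwise convert
// to the first element. force-compression=false behaves as if the array
// ended with a wildcard, i.e. "compression,<any>".
func resolveCompression(existing string, requested []string, force bool) (target string, needsConversion bool) {
	for _, c := range requested {
		if c == existing {
			return existing, false
		}
	}
	if !force {
		// Implicit <any>: accept whatever compression the blob already has.
		return existing, false
	}
	return requested[0], true
}
```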
This makes it possible for a remote cache backend to start to use `compression=zstd,gzip;uncompressed,force-compression=true` as the default and get faster compression for the cache. Should the same blob be exported later to an image, it gets converted to gzip. We might also want to change the remote cache to allow exporting blobs with multiple compressions at the same time.

Are there better ways to achieve some use cases to start using zstd while not breaking everyone else? Notice `-o compression=gzip,force-compression=false` would still get zstd blobs if they already exist, but there is little reason to use it instead of the default if you don't want this behavior.

@ktock
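Reusing the `resolveCompression` sketch from above, the proposed defaults would play out like this for a blob that a zstd-first cache backend created earlier:

```go
package blobutil

import "fmt"

// demoResolve shows the proposed defaults in action.
func demoResolve() {
	// Image export default under the proposal: gzip;uncompressed, forced.
	target, convert := resolveCompression("zstd", []string{"gzip", "uncompressed"}, true)
	fmt.Println(target, convert) // prints "gzip true": the zstd blob is converted on image export

	// Remote cache default under the proposal: zstd has first priority.
	target, convert = resolveCompression("zstd", []string{"zstd", "gzip", "uncompressed"}, true)
	fmt.Println(target, convert) // prints "zstd false": the existing blob is reused as-is
}
```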