This repository has been archived by the owner on Sep 16, 2020. It is now read-only.

Reconsider the project with standardization of OCI Distribution Spec #21

Open
AkihiroSuda opened this issue Apr 13, 2018 · 16 comments

Comments

@AkihiroSuda
Owner

Originally I designed FILEgrain to be agnostic to distribution protocols, because standardizing distribution was outside the scope of OCI's mission at that time.
But the situation has changed now.

If application/vnd.oci.image.layer.v1.tar blobs (uncompressed tar, not tar+gzip) are pushed to a Docker/OCI registry (using Transfer-Encoding: gzip for compression in transit), and the registry supports HTTP Range Requests, it is not difficult to implement deduplication and lazy-pulling at arbitrary granularity without introducing FILEgrain.

Apparently, no change to the distribution spec is needed.
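A minimal Python sketch of the lazy-pull side of this idea, assuming an uncompressed tar layer blob whose payload offsets are already known, and a registry (or the storage backend it redirects to) that honors the `Range` header. The function names and the injectable `opener` parameter are hypothetical; only `urllib.request` is a real API.

```python
import urllib.request
from typing import Callable


def range_request(url: str, offset: int, length: int) -> urllib.request.Request:
    """Build a GET for `length` bytes of a blob starting at `offset`."""
    # HTTP byte ranges are inclusive on both ends, hence the -1.
    end = offset + length - 1
    return urllib.request.Request(url, headers={"Range": f"bytes={offset}-{end}"})


def fetch_range(url: str, offset: int, length: int,
                opener: Callable = urllib.request.urlopen) -> bytes:
    """Fetch one byte range of a blob; `opener` is injectable for testing."""
    if length <= 0:
        return b""
    with opener(range_request(url, offset, length)) as resp:
        # 206 Partial Content means the server honored the Range header;
        # a plain 200 would mean it is about to send the whole blob.
        if resp.status != 206:
            raise RuntimeError(f"range not honored: HTTP {resp.status}")
        data = resp.read()
    if len(data) != length:
        raise RuntimeError(f"short read: got {len(data)} of {length} bytes")
    return data
```

This only works because the blob is plain tar: with a tar+gzip blob, the same byte range would land in the middle of a compressed stream and could not be decoded on its own.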

However, as an optimization, we might be able to define an extension spec for fetching all tar headers at once.

For example:

```
HEAD /v2/foo/blobs/sha256:deadbeef

200 OK
Content-Type: application/vnd.oci.image.layer.v1.tar
LazyPull-Digest: sha256:cafebabe
```

```
GET /v2/foo/blobs/sha256:cafebabe

200 OK
Content-Type: application/vnd.lazypull.manifest

{
  {
    // This could be a continuity manifest, but using raw TarHdr bytes
    // might be beneficial for deduplication on the registry side.
    bytes TarHdr = 1;
    // Clients can issue HTTP Range Requests with this offset for lazy-pulling.
    int64 payloadOffsetInOriginalTar = 2;
  },
  {
    ...
  },
  ...
}
```
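A minimal sketch of how a registry or client could derive the proposed manifest entries from an uncompressed tar blob, using Python's `tarfile`. It relies on CPython's `TarInfo.offset` (start of a member's header blocks) and `TarInfo.offset_data` (start of its payload); the `LazyPullEntry` type and function name are hypothetical and only mirror the two fields in the example above.

```python
import io
import tarfile
from dataclasses import dataclass


@dataclass
class LazyPullEntry:
    # Raw header block(s) exactly as they appear in the tar,
    # mirroring `bytes TarHdr = 1` in the proposed manifest.
    tar_hdr: bytes
    # Start of the member's payload in the original tar,
    # mirroring `int64 payloadOffsetInOriginalTar = 2`.
    payload_offset: int
    size: int


def build_lazypull_manifest(tar_blob: bytes) -> list[LazyPullEntry]:
    """Index an uncompressed tar blob without copying any payload bytes."""
    entries = []
    with tarfile.open(fileobj=io.BytesIO(tar_blob), mode="r:") as tf:
        for member in tf:
            # member.offset points at the first header block of this member
            # (including any PAX or GNU long-name blocks); offset_data points
            # at the first byte of its payload.
            hdr = tar_blob[member.offset:member.offset_data]
            entries.append(LazyPullEntry(hdr, member.offset_data, member.size))
    return entries
```

With these entries a client can rebuild the whole directory tree from one small fetch, then issue a Range request using `payload_offset` and `size` only when a file is actually read.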

cc @stevvooe @dmcgowan @tonistiigi

@AkihiroSuda
Owner Author

Memo: we could also put a tar payload digest in each application/vnd.lazypull.manifest entry, so as to avoid full pulls, as rkt+casync does (https://fosdem.org/2018/schedule/event/containers_casync/).
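A minimal sketch of what a per-entry payload digest would buy a client: payloads already present in the local content store (or repeated inside the same layer) are never fetched. The `Entry` fields and `plan_fetches` are hypothetical; this works per file, whereas casync deduplicates at the chunk level.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    # Hypothetical manifest entry: the proposed payload offset plus
    # the payload digest suggested in the memo.
    path: str
    payload_offset: int
    size: int
    payload_digest: str  # e.g. "sha256:<hex>"


def sha256_digest(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()


def plan_fetches(entries: list[Entry], local_digests: set[str]) -> list[tuple[int, int]]:
    """Return the (offset, length) byte ranges still needed from the registry.

    Payloads whose digest is already in the local store are skipped, and
    identical payloads within the layer are fetched only once.
    """
    wanted: set[str] = set()
    ranges = []
    for e in entries:
        if e.size == 0 or e.payload_digest in local_digests or e.payload_digest in wanted:
            continue
        wanted.add(e.payload_digest)
        ranges.append((e.payload_offset, e.size))
    return ranges
```

Each returned range maps directly onto one HTTP Range Request against the uncompressed layer blob.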

@cyphar

cyphar commented May 29, 2018

Unfortunately my current view is that we should entirely ditch tar and come up with a better format. I'm currently working on a blog post to describe what it would look like.

@AkihiroSuda
Owner Author

+1, I'm also looking into whether we can use git objects, as https://github.com/bup/bup does.
This could also potentially support pulling directly from GitHub/GitLab repositories.

@dmcgowan

dmcgowan commented May 30, 2018

I am also in favor of finding an alternative to tar. My only concern with git is that its use of SHA-1 would require additional hashing on top, complicating any distribution mechanism. I still envision using the distribution protocol to distribute metadata files and content archives, and I would love to discuss with both of you what you have in mind. Continuity obviously represents some of the initial thoughts on representing file metadata.
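To make the SHA-1 concern concrete: git names a blob by the SHA-1 of a `blob <size>\0` header plus the content (in its default object format), while OCI descriptors use a digest of the raw bytes, typically sha256. The same file therefore gets two unrelated identifiers, so a registry verifying sha256 digests cannot reuse git object IDs and must hash again. A small illustration (function names are illustrative):

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    # git hashes a "blob <size>\0" header followed by the content with SHA-1
    # (default object format; newer git can also use SHA-256 repositories).
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def oci_digest(content: bytes) -> str:
    # OCI content descriptors identify blobs by a digest of the raw bytes.
    return "sha256:" + hashlib.sha256(content).hexdigest()
```

For the empty file these give `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391` and `sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855` respectively.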

@cyphar

cyphar commented May 30, 2018

continuity doesn't have the deduplication that would be incredibly useful for massively reducing the network and storage overhead -- it has a somewhat similar model to tar except it separates the data from the metadata with a digest. This is an improvement but has problems that I highlighted in my talk (and will explain in further detail in my blog post). git has somewhat similar issues because everything is file-based. (My evaluation of continuity is based on looking at the protobuf definition and seeing how it is structured.)

I think if you're looking for a good example of what I'd like to have, take a look at how restic structures data. They use content-defined chunking to allow for chunk-level deduplication (which would be such a huge network bandwidth improvement it's insane) as well as having an index for the entire repository. We would need to have an index for each tag (and coming up with a good index format would be the most annoying part).
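A minimal sketch of content-defined chunking. restic's chunker uses Rabin fingerprints; this sketch swaps in the simpler Gear rolling hash (the one FastCDC uses), and the size parameters are scaled down and purely illustrative. Cut points depend only on nearby bytes, so an edit disturbs only the chunks around it.

```python
import hashlib
import random

# Gear table: 256 fixed pseudo-random 64-bit values. Any fixed table works,
# but everyone chunking the same data must use the same one.
_rng = random.Random(0x5EED)
GEAR = [_rng.getrandbits(64) for _ in range(256)]
MASK64 = (1 << 64) - 1


def cdc_chunks(data: bytes, avg_bits: int = 10,
               min_size: int = 256, max_size: int = 8192) -> list[bytes]:
    """Split data at content-defined boundaries.

    Once a chunk reaches min_size, each byte is a cut point with
    probability about 2**-avg_bits; chunks never exceed max_size.
    """
    shift = 64 - avg_bits
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Gear hash: shifting left drops each byte's influence after
        # 64 steps, so h depends only on the most recent 64 bytes.
        h = ((h << 1) + GEAR[byte]) & MASK64
        size = i - start + 1
        # Cut where the top avg_bits bits of h are zero, within size limits.
        if (size >= min_size and (h >> shift) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def chunk_ids(chunks: list[bytes]) -> set[str]:
    # Chunks are content-addressed: identical chunks share one ID and
    # need to be stored or transferred only once.
    return {hashlib.sha256(c).hexdigest() for c in chunks}
```

Prepending a few bytes to a file changes only the first chunk or two; later boundaries re-align, so most chunk IDs are shared with the old version. Fixed-size blocks would all shift and typically share none.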

@AkihiroSuda
Owner Author

> git has somewhat similar issues because everything is file-based.

Packfiles are not file-based, and they are deduplicatable:
https://git-scm.com/book/en/v2/Git-Internals-Packfiles

@cyphar

cyphar commented May 30, 2018

I'm not sure about that. Packfiles do store deltas of similar files (which is what that page describes), but that is very different from content-defined chunking: you aren't storing deltas, you're splitting files into chunks and then storing the chunks. At least, that's my understanding from what I've read. However, bup in particular does use rolling checksums (which is essentially how content-defined chunking works). So the packfile and index format could be useful, though we would still need a history index (we could use git's for that).

I personally think that having a CoW tree with some sort of ZFS-style birthtime would be the best way of implementing it (though restic just uses simple trees and a GC pass).
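The "simple trees and a GC pass" alternative can be illustrated with a mark-and-sweep over content-addressed objects. This sketch assumes a hypothetical, simplified model: each stored object lists the object IDs it references, and each tag points at one root tree. restic's real index and tree formats are richer; a CoW tree with birth times aims to avoid this kind of full reachability scan.

```python
def unreachable(objects: dict[str, list[str]], tags: dict[str, str]) -> set[str]:
    """Return IDs of stored objects not reachable from any tag.

    `objects` maps each stored object ID to the IDs it references (a tree
    lists its subtrees and data chunks; a data chunk references nothing).
    `tags` maps a tag name to the ID of its root tree.
    """
    live: set[str] = set()
    stack = list(tags.values())
    # Mark phase: iterative depth-first walk from every tag's root.
    while stack:
        oid = stack.pop()
        if oid in live:
            continue
        live.add(oid)
        stack.extend(objects.get(oid, []))
    # Sweep phase: everything stored but never marked can be deleted.
    return set(objects) - live
```

Deleting a tag and rerunning the pass frees exactly the objects that only that tag referenced, while chunks shared with other tags survive.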

@AkihiroSuda
Owner Author

@cyphar any update on your blog post? ^^

@cyphar

cyphar commented Sep 3, 2018

We discussed this in person, and I'm going to give a talk about it on Friday. My blog post is still a WIP. I will try to finish it this week, but it looks like I might have to rewrite it significantly given that I've had a few months to think about the problem some more.

@bergwolf

@cyphar @AkihiroSuda Any updates on the topic?

@AkihiroSuda
Owner Author

cc @ktock

@ktock

ktock commented Jun 24, 2019

I'm currently working internally on a lazy-pulling filesystem (FUSE) based on this approach.
As a first experiment I implemented it without "the manifest of tar headers". It works, but I think "mount" and "read" performance still need improvement.
I think the manifest would help improve "mount" performance.

I'd like to open-source it once it is done.
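A minimal sketch of the read path such a filesystem needs, assuming a hypothetical index built from the tar-header manifest (path mapped to payload offset and size) and a `fetch(offset, length)` callable such as an HTTP Range GET. This is not the implementation described above; it only shows why the manifest makes "mount" cheap (no payload bytes are downloaded up front) and how a file read becomes a blob byte range.

```python
from typing import Callable

# Hypothetical fetcher: (offset in the blob, length) -> bytes.
Fetch = Callable[[int, int], bytes]


class LazyLayer:
    """Serve file reads from a remote uncompressed tar blob on demand."""

    def __init__(self, index: dict[str, tuple[int, int]], fetch: Fetch):
        # index: path -> (payload offset in the tar, payload size).
        # Building it needs only the tar headers, never the payloads.
        self.index = index
        self.fetch = fetch

    def read(self, path: str, off: int, n: int) -> bytes:
        payload_offset, size = self.index[path]
        if off >= size or n <= 0:
            return b""                  # read at or past end of file
        n = min(n, size - off)          # clamp to the file's end
        # A read of bytes [off, off+n) of the file is bytes
        # [payload_offset+off, payload_offset+off+n) of the blob.
        return self.fetch(payload_offset + off, n)
```

A real FUSE layer would add caching and prefetching on top; this sketch fetches on every read to keep the offset arithmetic visible.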

@ktock

ktock commented Sep 25, 2019

FYI: there has been a similar discussion recently, aiming to achieve lazy pulling while preserving compatibility.

@cyphar

cyphar commented Sep 26, 2019

An updated description of the design and implementation was given last weekend at All Systems Go.

@AkihiroSuda
Owner Author

CRFS now works with containerd! (still requires patches)

containerd/containerd#3731
https://github.com/ktock/remote-snapshotter


5 participants