Review Filestore for 300TB Challenge. Update Stories + Specs #85
The PR to review is ipfs/kubo#3368.
Relevant notes from Sprint Planning Call:

Conclusions from review: @jbenet & @whyrusleeping need to sit down (probably today) and figure out how they want to proceed with this. @flyingzumwalt will try to capture that info in the filestore Stories & Epics.

Main things that need to be specified:
@jbenet @whyrusleeping I'm leaving this open and "in progress" until we've updated the filestore Specs and Stories.
I don't think my filestore is fully understood, and I would like to be involved in these discussions. I would like to know where the mixing is. I did spend several months on this, after all. I also have never talked in person with @jbenet about this.
…On January 17, 2017 3:25:40 PM EST, Matt Zumwalt ***@***.***> wrote:
## Relevant notes from Sprint Planning Call:
Conclusions from review:
The current implementation mixes porcelain UX concerns with the
underlying implementation/plumbing. This makes the interfaces confusing
& complicated. It also makes the underlying plumbing more complicated
and less robust than it should be.
Best approach: take the pieces of the code that we need and package it
as an *experimental* feature with simple, straightforward interfaces.
@jbenet & @whyrusleeping need to sit down (probably today) and figure
out how they want to proceed with this. @flyingzumwalt will try to
capture that info in the [filestore Stories &
Epics](https://github.com/ipfs/archives/issues?q=is%3Aopen+label%3Afilestore)
Main things that need to be specified:
* How to do the internals/plumbing
* What the UX should look like
@kevina context: ipfs/team-mgmt#309 (comment)
## Reviewing Filestore for data.gov Sprint

Agenda:
Background:
### Notes

Worried about getting the UX wrong. Want to be careful to get it right.

Goals for this sprint:
... so it's important to distinguish between porcelain and plumbing.

Bad things that must never happen:
Examples: Dropbox & BitTorrent. The BitTorrent one is basically just BitTorrent, but the UX is aimed at people who don't want to make torrent files, run torrent trackers, etc. Dropbox is set up around the UX of "track this directory".
Things to consider
### Existing flatfs code base

Lots of lessons learned from @kevina's efforts. We can probably re-use the protobuf code and some of the commands.

### Constraints

Given we have a repo, and we are going to add objects to the repo that reference stuff elsewhere on the filesystem at specific locations (offsets) with specific sizes.

What happens when:
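The constraint above (repo objects that reference external files at an offset, with a size) can be sketched roughly like this. The `FileRef` type and its field names are hypothetical illustrations, not kubo's actual data structures:

```python
# Illustrative sketch (not kubo's actual types): a filestore entry records
# where a block's bytes live on disk instead of copying them into the repo.
import os
import tempfile
from dataclasses import dataclass


@dataclass
class FileRef:
    """Points at the region of an external file that holds one block's data."""
    file_path: str
    offset: int  # byte offset of the block within the file
    size: int    # length of the block in bytes

    def read_block(self) -> bytes:
        """Reconstruct the block's bytes by seeking into the original file."""
        with open(self.file_path, "rb") as f:
            f.seek(self.offset)
            data = f.read(self.size)
        if len(data) != self.size:
            # The file was truncated, moved, or edited: the reference broke.
            raise IOError("referenced region is no longer available")
        return data


# Demo: reference a region of a file without duplicating its contents.
with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
    tmp.write("hello filestore world")
    path = tmp.name

ref = FileRef(file_path=path, offset=6, size=9)
print(ref.read_block().decode())  # -> filestore
os.unlink(path)
```

The failure path in `read_block` is exactly the "what happens when" question: once the repo holds references instead of bytes, any change to the underlying file invalidates the blocks that point into it.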
Possibility: "Filestore" on both sides -- aka ipfs get-as-filestore: pull down the blocks, "export" them to a location on the filesystem, and delete the blocks from the ipfs repo.
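A minimal sketch of that "export" step, under the assumption that each block's bytes are appended to an output file and the repo then keeps only a (path, offset, size) reference in place of the raw copy. The function name and shapes are invented for illustration:

```python
# Hypothetical sketch of "get-as-filestore": write blocks out to a file,
# then keep only (path, offset, size) references instead of the raw bytes.
import os
import tempfile


def export_blocks(blocks: dict, out_path: str) -> dict:
    """Append each block's bytes to out_path and return references.

    blocks maps block-hash -> raw bytes; the returned dict maps
    block-hash -> (path, offset, size).  After this, the raw copies in
    the repo could be deleted, as the idea above suggests.
    """
    refs = {}
    offset = 0
    with open(out_path, "wb") as f:
        for block_hash, data in blocks.items():
            f.write(data)
            refs[block_hash] = (out_path, offset, len(data))
            offset += len(data)
    return refs


# Demo: export two blocks, then read one back through its reference.
fd, tmp = tempfile.mkstemp()
os.close(fd)
refs = export_blocks({"h1": b"alpha", "h2": b"beta"}, tmp)
path, off, size = refs["h2"]
with open(path, "rb") as f:
    f.seek(off)
    print(f.read(size))  # -> b'beta'
os.unlink(tmp)
```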
Interesting use case: I've got a laptop and lots of hard drives. I want to help back up data.gov. I attach a big drive, pull down data, routing it onto the attached drive, and then detach the drive. I frequently close my laptop (i.e. to put it in my backpack). Occasionally I reconnect the drive to share the data.

### Check-in: Why are we doing this work?
### Conclusions

Short-term assumption: filestore is for files that won't be changed casually.

For this current sprint, we're dealing with data that will not be changed/moved. The files & directories that Jack registers using filestore will stay in place. Renaming or moving those files would be a notable event, where we can expect Jack to explicitly rebuild the ipfs-pack manifest (and therefore update the filestore lookup tables). This allows us to set aside (for now) use cases where users will casually and frequently change or rename files. For those use cases, at least for now, we encourage people to use FUSE to mount IPFS content (which might have been added using filestore), and update/rename the content there.

### How filestore will work

When you want "filestore" behavior, you generate an ipfs-pack with its own ipfs repo configured to locate its blocks using "relative paths" to the content you're referencing. The structure of a pack is similar to git repositories -- the .ipfs directory sits in the root of the pack, alongside the other files/directories at the root of the pack. By default the relative paths can only reference content inside of the pack -- just as git doesn't let you add content outside the repo. "Descendants only" can be turned off, but we strongly discourage it because moving the ipfs-pack would break all the paths, etc.

### Export to a pack

Exporting IPFS content to an ipfs-pack is an inevitable use case. ipget could be the tool (or starting point for a tool) to do this -- give it an ipfs hash and it will build packs from the existing ipfs content/blocks.

### Stories

**Register a filesystem Directory in IPFS without Duplicating Content** (recorded as #92)

Given: Then:

How the internals should work: When I tell IPFS to register a directory, it first turns the directory into an IPFS Pack. To do this, it
It then registers the pack with your local ipfs node(s). To do this it
The structure of a pack is similar to git repositories -- the .ipfs directory sits in the root of the pack, alongside the other files/directories at the root of the pack. By default the relative paths can only reference content inside of the pack -- just as git doesn't let you add content outside the repo.

**Serve Content Directly From a Pack to IPFS Network (no intermediary node)** (recorded as #108)

Given Then

**Serve Content from Local Packs Through a Regular IPFS Node** (recorded as #109)

Given Then

**Update, Rename or Delete Contents of an IPFS Pack** (recorded as #93)

Given Then

**Generate a New IPFS Pack from Existing IPFS Content** (recorded as #90)

Given Then

Note: ipget could be the tool (or starting point for a tool) to do this -- give it an ipfs hash and it will build packs from the existing ipfs content/blocks.

**Selectively Add Files and Directories to the Pack** (recorded as #127)

Given I only want to add some of the files and/or subdirectories to the IPFS pack. Then I follow these steps:
**Selectively Ignore Files and Directories** (recorded as #128)

Given Then

**Use a Single Pack to Track Many Files Across an OS** (recorded as #129)

Given:
Option 1: put the ipfs-pack at the root of your filesystem and selectively add files (see Story: Selectively Add Files).

Option 2: disable "internal paths only" mode on filestore. By default, filestore uses only "internal paths", meaning that it only allows you to reference files that are inside the IPFS pack's root directory. This is similar to git, whose repositories only allow you to add content that is inside the git repository's working directory. "Internal paths only" mode can be turned off, but we strongly discourage it because moving the ipfs-pack would break all the paths, etc.
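The "internal paths only" rule amounts to a path-containment check before a file is registered. A minimal sketch, with hypothetical helper names (this is not the actual kubo code):

```python
# Hypothetical sketch of the "internal paths only" rule: a pack only
# accepts references to files under its own root, much like git refuses
# to add content outside the working directory.
import os


def is_inside_pack(pack_root: str, candidate: str) -> bool:
    """Return True if candidate resolves to a path inside pack_root."""
    root = os.path.realpath(pack_root)
    target = os.path.realpath(os.path.join(root, candidate))
    # commonpath handles prefix edge cases like /pack vs /pack-other
    return os.path.commonpath([root, target]) == root


def add_to_pack(pack_root: str, rel_path: str, internal_only: bool = True) -> str:
    """Validate rel_path before registering it; return the normalized path."""
    if internal_only and not is_inside_pack(pack_root, rel_path):
        raise ValueError(
            f"{rel_path!r} escapes the pack root; internal-paths-only mode is on"
        )
    return os.path.normpath(rel_path)


# Demo
print(add_to_pack("/data/pack", "docs/report.csv"))  # -> docs/report.csv
# add_to_pack("/data/pack", "../outside.txt")  # would raise ValueError
```

Resolving with `realpath` before the comparison also catches `..` components and symlinks that would otherwise smuggle a reference outside the pack.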
@flyingzumwalt did you mean to keep this issue closed?
Sorry @kevina. That was a mistake.
Overall I really like this idea. Depending on what @jbenet thinks of my basic filestore implementation (ipfs/kubo#3368) and what he had in mind for the layout of ipfs-pack (if anything), I might be able to implement this fairly quickly (like over the weekend). My proposal would be to just make the filestore level DB version 0 of the ipfs pack, with some additional metadata to determine the root hash, which will be a unixfs directory node. The index will be a unixfs directory node that can be stored in the filestore DB directly (the protobuf format used by the filestore can already store non-leaf nodes). I can understand, however, if we want a simpler format (perhaps text based) that does not depend on implementation details.

I do not like the idea of non-internal paths. Rather, I propose that if a user wants that, we require absolute paths; it is far less brittle that way. For the purposes of this sprint I propose we just don't allow them.

In order to use multiple packs at the same time we are going to require some sort of multi-datastore. If there are too many packs attached to an ipfs node there could be performance problems, as, without some sort of indexing, each pack will need to be searched in turn. A Bloom filter for each pack can help, but not eliminate, the problem.

Pinning, and in particular the garbage collector, is going to be a problem. I am going to assume that an ipfs pack is immutable, so its blocks should never be deleted by the GC (and it may even be impossible to delete them if the pack is on a read-only filesystem). One way to solve this would be to use a recursive pin on the root unixfs directory node. However, this would cause the garbage collector to do a huge amount of unnecessary work and use up a lot of unnecessary memory: it would first walk the root node and load the hashes of all the blocks in the pack into memory, then iterate over all the hashes of the blocks in the pack only to discover that none of them can be collected. A better way would be to have the GC ignore what is in a pack altogether.

I tried to propose a solution to the multi-datastore problem in ipfs/kubo#3119 and ipfs/kubo#3257. Due to lack of time from the core team members, the proposal in #3119 never received any serious consideration. When the implementation was reviewed by @whyrusleeping he (mostly in a private IRC conversation) rejected my idea of considering all filestore objects implicitly pinned, due to the complexity it added to the interface. We might be able to punt on the GC and pinning issues for this sprint, but I hope that we can revisit the issue sometime soon.
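The linear multi-pack search and the per-pack Bloom filter mitigation described above can be sketched like this. Everything here (the `BloomFilter`, `Pack`, and `lookup` names) is a toy illustration, not kubo code:

```python
# Sketch of the multi-pack lookup problem: without an index, each pack
# must be searched in turn; a per-pack Bloom filter lets most packs be
# skipped cheaply, at the cost of occasional false positives.
import hashlib


class BloomFilter:
    """A tiny Bloom filter: k hash functions over an m-bit array."""

    def __init__(self, m: int = 1024, k: int = 3):
        self.m, self.k = m, k
        self.bits = 0  # m-bit array packed into one int

    def _positions(self, key: str):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, key: str):
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits & (1 << p) for p in self._positions(key))


class Pack:
    def __init__(self, blocks: dict):
        self.blocks = blocks  # block-hash -> data
        self.filter = BloomFilter()
        for h in blocks:
            self.filter.add(h)


def lookup(packs, block_hash):
    """Search packs in turn, skipping any whose filter rules out the hash."""
    for pack in packs:
        if not pack.filter.might_contain(block_hash):
            continue  # definite miss: no need to search this pack
        if block_hash in pack.blocks:  # filter hits may be false positives
            return pack.blocks[block_hash]
    return None


# Demo
packs = [Pack({"Qm1": b"a"}), Pack({"Qm2": b"b"}), Pack({"Qm3": b"c"})]
print(lookup(packs, "Qm2"))  # -> b'b'
print(lookup(packs, "QmMissing"))  # -> None
```

As the comment points out, this only reduces the constant factor: a miss still touches every pack's filter, which is why the filter helps but does not eliminate the problem.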
The code has been reviewed. Work is proceeding with issues like #92.
Review filestore -- declare what needs to be done to land it, at least for the 300TB Challenge, and call out any implications for ipfs-pack
Concise Filestore "spec" https://gist.github.com/whyrusleeping/5565651011b49ddc9ddeec9ffd169050