Feature: "children" pinning mode #5133

Open
hsanjuan opened this issue Jun 18, 2018 · 8 comments
@hsanjuan
Contributor

Version information: master

Type: Feature

Description:

I have discussed this with someone, but apparently it was not written down. IPFS Cluster needs IPFS to pin partial trees, that is, a root CID and its immediate children (this is part of the sharding feature). Thus, I'll be working to add this feature to go-ipfs. I will reference this issue in upcoming PRs.

@hsanjuan hsanjuan self-assigned this Jun 18, 2018
hsanjuan added a commit that referenced this issue Jun 18, 2018
This comes in the context of #5133. It enables Merkledag to fetch
DAGs down to a given depth.

Note that actual usage of depth is expected to be 1 or 2 (not an
arbitrarily high value), so I have opted not to complicate things with
branch-pruning optimizations. They can be introduced at another point
in time if they are ever needed.

License: MIT
Signed-off-by: Hector Sanjuan <[email protected]>
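As a rough, self-contained sketch of the depth-limited fetch this commit describes (simplified stand-in types, not the actual merkledag code; the maxDepth == -1 "no limit" convention comes from the pinning PR below):

```go
package main

import "fmt"

// Node is a simplified stand-in for a merkledag node.
type Node struct {
	CID      string
	Children []*Node
}

// FetchToDepth walks the DAG rooted at n down to maxDepth levels,
// calling visit on each node. maxDepth == 0 visits only the root,
// maxDepth == 1 adds its direct children, and maxDepth == -1 means
// no limit.
func FetchToDepth(n *Node, maxDepth int, visit func(*Node)) {
	visit(n)
	if maxDepth == 0 {
		return // depth budget exhausted; do not descend further
	}
	next := maxDepth
	if maxDepth > 0 {
		next = maxDepth - 1
	}
	for _, c := range n.Children {
		FetchToDepth(c, next, visit)
	}
}

func main() {
	root := &Node{CID: "root", Children: []*Node{
		{CID: "child", Children: []*Node{{CID: "grandchild"}}},
	}}
	// Depth 1: prints "root" and "child" but not "grandchild".
	FetchToDepth(root, 1, func(n *Node) { fmt.Println(n.CID) })
}
```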
hsanjuan added a commit that referenced this issue Jun 20, 2018
This implements #5133, introducing an option to limit how deep we fetch and
store the DAG associated with a recursive pin ("--max-depth"). This feature
is motivated by the need to fetch and pin partial DAGs in order to do
DAG sharding with IPFS Cluster.

This means that, when pinning something with --max-depth, the DAG will be
fetched only to that depth and no further.

To achieve this, the PR introduces new recursive pin types: "recursive1"
means the given CID is pinned along with its direct children (maxDepth=1).

"recursive2" means the given CID is pinned along with its direct children
and its grandchildren.

And so on...

This required introducing "maxDepth" limits in all the functions walking down
DAGs (in the merkledag, pin, core/commands, core/coreapi, and
exchange/reprovide modules).

maxDepth == -1 effectively acts as no limit, and all these functions behave as
they did before.

To facilitate the task, a new CID set type has been added:
thirdparty/recpinset. This set carries the MaxDepth associated with every Cid.
This allows shortcutting already-explored branches, just as the original
cid.Set does. It also allows storing the recursive pinset (and replaces
cid.Set). recpinset should eventually be moved out to a different repo.

TODO: tests
TODO: refs -r with --max-depth

License: MIT
Signed-off-by: Hector Sanjuan <[email protected]>
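A minimal sketch of the recpinset idea described in this commit, assuming simplified types (the real set keys on cid.Cid; this version keys on strings purely for illustration):

```go
package main

import "fmt"

const unlimited = -1 // matches the maxDepth == -1 "no limit" convention

// RecPinSet remembers the deepest maxDepth each CID has been explored
// at, so already-covered branches can be skipped, as described above.
type RecPinSet struct {
	depths map[string]int
}

func NewRecPinSet() *RecPinSet {
	return &RecPinSet{depths: make(map[string]int)}
}

// Visit records cid at maxDepth and reports whether the caller should
// (re-)explore it: true when unseen, or when previously seen only at a
// shallower depth than maxDepth.
func (s *RecPinSet) Visit(cid string, maxDepth int) bool {
	prev, seen := s.depths[cid]
	covered := seen &&
		(prev == unlimited || (maxDepth != unlimited && prev >= maxDepth))
	if covered {
		return false // branch already explored at this depth or deeper
	}
	s.depths[cid] = maxDepth
	return true
}

func main() {
	s := NewRecPinSet()
	fmt.Println(s.Visit("QmFoo", 1)) // true: first visit at depth 1
	fmt.Println(s.Visit("QmFoo", 1)) // false: already covered
	fmt.Println(s.Visit("QmFoo", 2)) // true: deeper than before
}
```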
@kevina
Contributor

kevina commented Jul 24, 2018

Moved from #5142:

Rather than complicate our pinner, one potential solution is to enhance our notion of best-effort pins and fetch the needed subtrees separately.

@Stebalien
Member

So, I would like some kind of "download manager" to track download progress in MFS/best-effort pins. The main issue here would be accidentally downloading nodes we don't want to keep but keeping them anyway. @kevina, what did you have in mind?


Also, having talked this over with @whyrusleeping, I'd like to do the following:

  1. Switch to the refmt-based go-ipld-cbor
  2. Finalize go-ipld-hamt (requires 1).
  3. Store the pinset in go-ipld-hamt.

This would allow us to easily add extra metadata to pins and get rid of 90% of our current logic.


What can be done now:

  • Depth-limited traversal
  • Depth-limited ipfs refs -r

@hsanjuan
Contributor Author

I'll take on depth-limited traversal and refs -r.

@0zAND1z

0zAND1z commented Mar 11, 2019

Hi all,

I have been redirected to this issue, which was cited as a blocking item for enabling DAG sharding support in ipfs-cluster.

Hoping that the information is still relevant, I would like to seek some clarity on the following questions:

Question 1: Is the purpose of depth-limit to enable the user to modulate the number of shards to be generated for a given file?

Consider a scenario where a file bearing a multihash X, which is to be pinned across the cluster, is divided into S shards (say x1, x2, x3 ... xS). Does the depth limit support the creation of this number of shards? Is it necessary that all S shards be persisted on at least one node, or could these unique shards be distributed across the cluster?

Question 2: Are the shards subject to distribution and overlapping for fault tolerance?

Consider the same file bearing a multihash X, to be pinned across the cluster, divided into S shards (say x1, x2, x3 ... xS) in an overlapping manner across the M nodes in the cluster. Is this planned and made possible within the scope of the current issue?

All help in answering these questions is appreciated. Thanks.

@hsanjuan
Contributor Author

Hi,

While this issue is a blocker for enabling sharding in Cluster, these are Cluster questions, so they would be better answered on discuss.ipfs.io or in the cluster repository, as they are not specific to the issue here.

Question 1: Is the purpose of depth-limit to enable the user to modulate the number of shards to be generated for a given file?

Consider a scenario where a file bearing a multihash X is to be pinned across the cluster be divided into an S number of shards(say x1, x2, x3 ... xS). Does the depth-limit support the creation of these number of shards?

The sharding approach builds a parallel DAG referencing the original DAG nodes but with a different layout. In this approach, we need to pin the shard CIDs (x1, x2 ...) and their children, which are the DAG nodes from the original object.
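To illustrate (a toy sketch with made-up types, not Cluster's actual code): each shard node references blocks of the original DAG directly, so pinning a shard with maxDepth=1 keeps exactly the shard node plus the original blocks it points to:

```go
package main

import "fmt"

// Block stands in for a node of the original DAG.
type Block struct{ CID string }

// Shard is a node of the parallel sharding DAG; its direct children
// are blocks of the original DAG.
type Shard struct {
	CID      string
	Children []*Block
}

// pinShard simulates a maxDepth=1 pin: the shard plus its direct children.
func pinShard(s *Shard, pinned map[string]bool) {
	pinned[s.CID] = true
	for _, b := range s.Children {
		pinned[b.CID] = true
	}
}

func main() {
	pinned := map[string]bool{}
	x1 := &Shard{CID: "x1", Children: []*Block{{CID: "QmA"}, {CID: "QmB"}}}
	pinShard(x1, pinned)
	fmt.Println(len(pinned)) // 3: the shard root plus two original blocks
}
```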

Is it necessary that all the S number of shards be persisted in minimum one node, or could these unique shards be distributed across the cluster?

Each shard needs to be fully pinned in at least one node, but the shard size can be configured.

Question 2: Are the shards subjected to distribution and overlapping for fault tolerance?

Consider the same file bearing a multihash X is to be pinned across the cluster be divided into an S number of shards(say x1, x2, x3 ... xS) in an overlapping manner across the M number of nodes in the cluster? Is this planned & made possible under the scope of the current issue?

You could pin each shard on several IPFS nodes. The number of shards depends on the size chosen for them, and each shard then works like an independent pin item.

Please follow up in discuss.ipfs.io if you have more questions about sharding.

@kevincox

+1 to this feature. The current IPFS pinning solution is unworkable for many use cases, and this solves the two biggest issues I am running up against. The TL;DR is that instead of pinning things directly, you can pin the children of a directory. This directory can then be remembered and have metadata put into it.

Note that this was already the case for recursive pins, but if you can't use recursive pins in your use case, then you are currently stuck. I hope I can help emphasize how important this is for automation around IPFS.

Impossible to unpin

Right now it is "impossible" to unpin anything in go-ipfs because there is no "name" for a pin, so you don't know if you are the only one who pinned something. This is most obviously a problem in a multi-user scenario. Imagine that UserA pins HashX and then UserB pins it. Now UserA is done with HashX and removes the pin. Oops! UserB also lost that content.

This is a recurrence of the classic POSIX Advisory Locking Problem.

With this feature you can create a "directory" of everything you would like to pin, pin it at depth=1, and you have now shallowly pinned everything inside this folder without the sharing issue. (If you are worried about someone else pinning the exact same set of things as you, you can always add some random data to be sure.)
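A minimal sketch of that pattern with hypothetical types (a real version would build a unixfs directory and pin its CID at depth=1): each app keeps its own index node, so removing a link from one app's index never disturbs another's:

```go
package main

import "fmt"

// PinDir stands in for a per-app "pin directory": a node whose links
// name everything the app wants kept. Only this node would be pinned
// at depth=1; the linked items are kept as its direct children.
type PinDir struct {
	links map[string]string // link name -> CID of the item to keep
}

func NewPinDir() *PinDir { return &PinDir{links: make(map[string]string)} }

func (d *PinDir) AddLink(name, cid string) { d.links[name] = cid }
func (d *PinDir) RemoveLink(name string)   { delete(d.links, name) }

func main() {
	appA, appB := NewPinDir(), NewPinDir()
	appA.AddLink("photo", "QmX")
	appB.AddLink("backup", "QmX") // same content, independently held
	appA.RemoveLink("photo")      // appA is done; appB's link still keeps QmX
	fmt.Println(len(appA.links), len(appB.links)) // 0 1
}
```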

Lack of Metadata for Pins

Since you can pin a directory, you can now put as much metadata as you like in the directory or in the file names.

You still can't collect metadata about all pins, but for the reason mentioned above, scanning and removing pins you don't know about is probably a bad idea without coordination anyway.

Also discussed in #4586

@hsanjuan
Contributor Author

hsanjuan commented Apr 1, 2020

Right now it is "impossible" to unpin anything in go-ipfs because there is no "name" for a pin So you don't know if you are the only person to pin something. This is most obviously a problem in a multi-user scenario.

This is a UX feature only. I'm not saying IPFS should not provide pin labels, but things are addressed by CID, and that is the actual pin name. You can check whether other people are already providing the content (ipfs dht findprovs).

Imagine that UserA pins HashX and then UserB pins it. Now UserA is done with HashX and removes the pin. Oops! UserB also lost that content.

If UserB pins it, then UserB will not lose that content, since it's pinned on their IPFS node.


This issue is about a limited-recursive pinning mode, though; what you seem to want is to be able to inject a number of direct pins.

@kevincox

kevincox commented Apr 1, 2020

If UserB pins it, then user B will not lose that content, since it's pinned in their ipfs node.

I'm talking about a single node. IIUC, the IPFS network has no concept of pins.

Note that instead of UserA and UserB it could be AppA and AppB. I'm not planning on running a separate node for every app I use that touches IPFS pins 😅

This issue is about a limited-recursive pinning mode though, what you seem to want to do is to be able to inject a number of direct pins.

Exactly. Injecting a node pinned to depth=1 is equivalent to injecting a direct pin for every item linked to by that node. This feature solves both of those problems.
