Feature: "children" pinning mode #5133

Open
hsanjuan opened this issue Jun 18, 2018 · 8 comments
@hsanjuan
Contributor

Version information: master

Type: Feature

Description:

I have discussed this with someone, but apparently it was not written down. IPFS Cluster needs IPFS to pin partial trees, that is, a root CID and its immediate children (this is part of the sharding feature). Thus, I'll be working to add this feature to go-ipfs. I will reference this issue in upcoming PRs.

@hsanjuan hsanjuan self-assigned this Jun 18, 2018
hsanjuan added a commit that referenced this issue Jun 18, 2018
This comes in the context of #5133. It enables Merkledag to fetch
DAGs down to a given depth.

Note that actual usage of depth is expected to be 1 or 2 (not an
arbitrarily high value), so I have opted not to complicate things with
branch-pruning optimizations. They can be introduced at another point
in time if they are ever needed.

License: MIT
Signed-off-by: Hector Sanjuan <[email protected]>
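As a rough, self-contained sketch of the depth-limited fetch this commit describes (simplified stand-in types, not the actual merkledag code; the maxDepth == -1 "no limit" convention comes from the pinning PR below):

```go
package main

import "fmt"

// Node is a simplified stand-in for a merkledag node.
type Node struct {
	CID      string
	Children []*Node
}

// FetchToDepth walks the DAG rooted at n down to maxDepth levels,
// calling visit on each node. maxDepth == 0 visits only the root,
// maxDepth == 1 adds its direct children, and maxDepth == -1 means
// no limit.
func FetchToDepth(n *Node, maxDepth int, visit func(*Node)) {
	visit(n)
	if maxDepth == 0 {
		return // depth budget exhausted; do not descend further
	}
	next := maxDepth
	if maxDepth > 0 {
		next = maxDepth - 1
	}
	for _, c := range n.Children {
		FetchToDepth(c, next, visit)
	}
}

func main() {
	root := &Node{CID: "root", Children: []*Node{
		{CID: "child", Children: []*Node{{CID: "grandchild"}}},
	}}
	// Depth 1: prints "root" and "child" but not "grandchild".
	FetchToDepth(root, 1, func(n *Node) { fmt.Println(n.CID) })
}
```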
hsanjuan added a commit that referenced this issue Jun 20, 2018
This implements #5133, introducing an option to limit how deep we fetch and
store the DAG associated with a recursive pin ("--max-depth"). This feature
is motivated by the need to fetch and pin partial DAGs in order to do
DAG sharding with IPFS Cluster.

This means that, when pinning something with --max-depth, the DAG will be
fetched only to that depth and no further.

To achieve this, the PR introduces new recursive pin types: "recursive1"
means the given CID is pinned along with its direct children (maxDepth=1).

"recursive2" means the given CID is pinned along with its direct children
and its grandchildren.

And so on...

This required introducing "maxDepth" limits in all the functions walking down
DAGs (in the merkledag, pin, core/commands, core/coreapi, and
exchange/reprovide modules).

maxDepth == -1 effectively acts as no limit, and all these functions behave as
they did before.

To facilitate the task, a new CID set type has been added:
thirdparty/recpinset. This set carries the MaxDepth associated with every Cid.
This allows shortcutting already-explored branches, just as the original
cid.Set does. It also allows storing the recursive pinset (and replaces
cid.Set). recpinset should eventually be moved out to a different repo.

TODO: tests
TODO: refs -r with --max-depth

License: MIT
Signed-off-by: Hector Sanjuan <[email protected]>
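A minimal sketch of the recpinset idea described in this commit, assuming simplified types (the real set keys on cid.Cid; this version keys on strings purely for illustration):

```go
package main

import "fmt"

const unlimited = -1 // matches the maxDepth == -1 "no limit" convention

// RecPinSet remembers the deepest maxDepth each CID has been explored
// at, so already-covered branches can be skipped, as described above.
type RecPinSet struct {
	depths map[string]int
}

func NewRecPinSet() *RecPinSet {
	return &RecPinSet{depths: make(map[string]int)}
}

// Visit records cid at maxDepth and reports whether the caller should
// (re-)explore it: true when unseen, or when previously seen only at a
// shallower depth than maxDepth.
func (s *RecPinSet) Visit(cid string, maxDepth int) bool {
	prev, seen := s.depths[cid]
	covered := seen &&
		(prev == unlimited || (maxDepth != unlimited && prev >= maxDepth))
	if covered {
		return false // branch already explored at this depth or deeper
	}
	s.depths[cid] = maxDepth
	return true
}

func main() {
	s := NewRecPinSet()
	fmt.Println(s.Visit("QmFoo", 1)) // true: first visit at depth 1
	fmt.Println(s.Visit("QmFoo", 1)) // false: already covered
	fmt.Println(s.Visit("QmFoo", 2)) // true: deeper than before
}
```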
@kevina
Contributor

kevina commented Jul 24, 2018

Moved from #5142:

Rather than complicate our pinner, one potential solution is to enhance our notion of best-effort pins and fetch the needed subtrees separately.

@Stebalien
Member

So, I would like some kind of "download manager" to track download progress in MFS/best-effort pins. The main issue here would be accidentally downloading nodes we don't want to keep but keeping them anyway. @kevina, what did you have in mind?


Also, having talked this over with @whyrusleeping, I'd like to do the following:

  1. Switch to the refmt-based go-ipld-cbor
  2. Finalize go-ipld-hamt (requires 1).
  3. Store the pinset in go-ipld-hamt.

This would allow us to easily add extra metadata to pins and get rid of 90% of our current logic.


What can be done now:

  • Depth-limited traversal
  • Depth-limited ipfs refs -r

@hsanjuan
Contributor Author

I'll take on depth-limited traversal and refs -r.

@0zAND1z

0zAND1z commented Mar 11, 2019

Hi all,

I have been redirected to this issue, which was cited as a blocking item for enabling DAG sharding support in ipfs-cluster.

Hoping that the information is still relevant, I would like to seek some clarity on the following questions:

Question 1: Is the purpose of depth-limit to enable the user to modulate the number of shards to be generated for a given file?

Consider a scenario where a file bearing a multihash X, which is to be pinned across the cluster, is divided into S shards (say x1, x2, x3 ... xS). Does the depth limit support the creation of this number of shards? Is it necessary that all S shards be persisted on at least one node, or could these unique shards be distributed across the cluster?

Question 2: Are the shards subject to distribution and overlapping for fault tolerance?

Consider the same file bearing a multihash X, to be pinned across the cluster, divided into S shards (say x1, x2, x3 ... xS) in an overlapping manner across the M nodes in the cluster. Is this planned and made possible within the scope of the current issue?

All help in answering these questions is appreciated. Thanks.

@hsanjuan
Contributor Author

Hi,

While this issue is a blocker for enabling sharding in Cluster, these are Cluster questions, so they would be better answered on discuss.ipfs.io or in the cluster repository, as they are not specific to the issue here.

Question 1: Is the purpose of depth-limit to enable the user to modulate the number of shards to be generated for a given file?

Consider a scenario where a file bearing a multihash X is to be pinned across the cluster be divided into an S number of shards(say x1, x2, x3 ... xS). Does the depth-limit support the creation of these number of shards?

The sharding approach builds a parallel DAG referencing the original DAG nodes but with a different layout. In this approach, we need to pin the shard CIDs (x1, x2 ...) and their children, which are the DAG nodes from the original object.
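To illustrate (a toy sketch with made-up types, not Cluster's actual code): each shard node references blocks of the original DAG directly, so pinning a shard with maxDepth=1 keeps exactly the shard node plus the original blocks it points to:

```go
package main

import "fmt"

// Block stands in for a node of the original DAG.
type Block struct{ CID string }

// Shard is a node of the parallel sharding DAG; its direct children
// are blocks of the original DAG.
type Shard struct {
	CID      string
	Children []*Block
}

// pinShard simulates a maxDepth=1 pin: the shard plus its direct children.
func pinShard(s *Shard, pinned map[string]bool) {
	pinned[s.CID] = true
	for _, b := range s.Children {
		pinned[b.CID] = true
	}
}

func main() {
	pinned := map[string]bool{}
	x1 := &Shard{CID: "x1", Children: []*Block{{CID: "QmA"}, {CID: "QmB"}}}
	pinShard(x1, pinned)
	fmt.Println(len(pinned)) // 3: the shard root plus two original blocks
}
```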

Is it necessary that all the S number of shards be persisted in minimum one node, or could these unique shards be distributed across the cluster?

Each shard needs to be fully pinned in at least one node, but the shard size can be configured.

Question 2: Are the shards subjected to distribution and overlapping for fault tolerance?

Consider the same file bearing a multihash X is to be pinned across the cluster be divided into an S number of shards(say x1, x2, x3 ... xS) in an overlapping manner across the M number of nodes in the cluster? Is this planned & made possible under the scope of the current issue?

You could pin each shard on several IPFS nodes. The number of shards depends on the size chosen for them, and each shard then works like an independent pin item.

Please follow up in discuss.ipfs.io if you have more questions about sharding.

@kevincox

+1 to this feature. The current IPFS pinning solution is unworkable for many use cases, and this solves the two biggest issues I am running up against. The TL;DR is that instead of pinning things directly, you can pin the children of a directory. This directory can then be remembered and have metadata put into it.

Note that this was already the case for recursive pins, but if you can't use recursive pins in your use case, then you are currently stuck. I hope I can help emphasize how important this is for automation around IPFS.

Impossible to unpin

Right now it is "impossible" to unpin anything in go-ipfs because there is no "name" for a pin, so you don't know if you are the only one who pinned something. This is most obviously a problem in a multi-user scenario. Imagine that UserA pins HashX and then UserB pins it. Now UserA is done with HashX and removes the pin. Oops! UserB also lost that content.

This is a recurrence of the classic POSIX Advisory Locking Problem.

With this feature you can create a "directory" of everything you would like to pin, pin it at depth=1, and you have now shallowly pinned everything inside this folder without the sharing issue. (If you are worried about someone else pinning the exact same set of things as you, you can always add some random data to be sure.)
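A minimal sketch of that pattern with hypothetical types (a real version would build a unixfs directory and pin its CID at depth=1): each app keeps its own index node, so removing a link from one app's index never disturbs another's:

```go
package main

import "fmt"

// PinDir stands in for a per-app "pin directory": a node whose links
// name everything the app wants kept. Only this node would be pinned
// at depth=1; the linked items are kept as its direct children.
type PinDir struct {
	links map[string]string // link name -> CID of the item to keep
}

func NewPinDir() *PinDir { return &PinDir{links: make(map[string]string)} }

func (d *PinDir) AddLink(name, cid string) { d.links[name] = cid }
func (d *PinDir) RemoveLink(name string)   { delete(d.links, name) }

func main() {
	appA, appB := NewPinDir(), NewPinDir()
	appA.AddLink("photo", "QmX")
	appB.AddLink("backup", "QmX") // same content, independently held
	appA.RemoveLink("photo")      // appA is done; appB's link still keeps QmX
	fmt.Println(len(appA.links), len(appB.links)) // 0 1
}
```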

Lack of Metadata for Pins

Since you can pin a directory, you can now put as much metadata as you like in the directory or in the file names.

You still can't collect metadata about all pins, but for the reason mentioned above, scanning and removing pins you don't know about is probably a bad idea without coordination anyway.

Also discussed in #4586

@hsanjuan
Contributor Author

hsanjuan commented Apr 1, 2020

Right now it is "impossible" to unpin anything in go-ipfs because there is no "name" for a pin So you don't know if you are the only person to pin something. This is most obviously a problem in a multi-user scenario.

This is a UX feature only. I'm not saying IPFS should not provide pin labels, but things are addressed by CID, and that is the actual pin name. You can check whether other people are already providing the content (ipfs dht findprovs).

Imagine that UserA pins HashX and then UserB pins it. Now UserA is done with HashX and removes the pin. Oops! UserB also lost that content.

If UserB pins it, then UserB will not lose that content, since it's pinned on their IPFS node.


This issue is about a limited-recursive pinning mode, though; what you seem to want is to be able to inject a number of direct pins.

@kevincox

kevincox commented Apr 1, 2020

If UserB pins it, then user B will not lose that content, since it's pinned in their ipfs node.

I'm talking about a single node. IIUC, the IPFS network has no concept of pins.

Note that instead of UserA and UserB it could be AppA and AppB. I'm not planning on running a separate node for every app I use that touches IPFS pins 😅

This issue is about a limited-recursive pinning mode though, what you seem to want to do is to be able to inject a number of direct pins.

Exactly. Injecting a node pinned to depth=1 is equivalent to injecting a direct pin for every item linked to by that node. This feature solves both of those problems.
