Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat!: dag import - don't pin roots by default #9926

Merged
merged 4 commits into from
Jun 14, 2023

Conversation

rvagg
Copy link
Member

@rvagg rvagg commented Jun 9, 2023

Fixes: #9765

(I can never get the Kubo tests to run, even the short set, locally, so I'm having to do this in CI, it might result in a series of force pushes, sorry).

@rvagg rvagg requested a review from a team as a code owner June 9, 2023 05:51
@rvagg rvagg changed the title feat!: don't pin roots by default feat!: dag import - don't pin roots by default Jun 9, 2023
@rvagg rvagg force-pushed the rvagg/dag-import-nopindefault branch 2 times, most recently from 70be4fb to a9a5130 Compare June 9, 2023 06:33
@rvagg rvagg requested a review from lidel as a code owner June 9, 2023 06:33
@rvagg rvagg force-pushed the rvagg/dag-import-nopindefault branch from a9a5130 to 2c84005 Compare June 9, 2023 07:26
@rvagg
Copy link
Member Author

rvagg commented Jun 9, 2023

I think this is just down to the existing sharness flakies now.

@hacdias
Copy link
Member

hacdias commented Jun 9, 2023

I'm all up for this, but there are still tests failing which do not seem to be due to flakiness: https://tf-aws-gh-runner.s3.amazonaws.com/linux-x64-4xlarge/ipfs/kubo/5219378103/1/sharness.html#t0054-dag-car-import-export

Screenshot 2023-06-09 at 12 12 27

image

@Jorropo
Copy link
Contributor

Jorropo commented Jun 9, 2023

I'm missing why #9755 should be fixed in the first place.

AFAIK The current usecase of ipfs dag {ex,im}port is for cold storage of your files, then it makes a lot of sense to pin them by default because you don't want whatever you are importing to be GCed middle of the import andn this pretty much break this.

With IPIP402 & friends we will now produce car files which are incomplete and or do not have roots.
But is anyone gonna do curl $IPIP402_REQUEST | ipfs dag import ?

@rvagg
Copy link
Member Author

rvagg commented Jun 13, 2023

But is anyone gonna do curl $IPIP402_REQUEST | ipfs dag import

Yep, consider the workflow options with a CAR in hand - it has to go in to some other toolchain to be manipulated. We've made go-car happy with partial DAGs so you can pass it through car extract, but you should equally be able to stick it into your Kubo node and cat or whatever else you want with the data you've fetched.

Or perhaps you're doing some incremental DAG work and building up a larger DAG using Kubo as your blockstore, (I'm actually doing this right now with this exact workflow, building up Wikipedia by grabbing pieces of it and putting them into Kubo to help it along since it seems to have trouble walking the entire DAG for me).

And besides, if the behaviour of Kubo dag import is to pin and therefore fetch the DAG, as suggested by #9765, then there's a nice little attack vector if you can pull it off - give someone a small CAR to import that also happens to have the wikipedia root in the list of roots and off your Kubo node goes, silently building up your blockstore without you knowing why.
Alternatively, if the behaviour is to fail because it's an incomplete DAG (this is what the docs currently suggest the bahaviour is) then that's also not great if I'm able to get incomplete DAGs from the trustless gateway protocol which Kubo itself serves.

@rvagg rvagg force-pushed the rvagg/dag-import-nopindefault branch from 2c84005 to 02e8372 Compare June 13, 2023 05:29
@rvagg rvagg force-pushed the rvagg/dag-import-nopindefault branch from 02e8372 to b30f46b Compare June 13, 2023 05:29
@rvagg
Copy link
Member Author

rvagg commented Jun 13, 2023

thanks for the details @hacdias, I didn't know those reports were available, I've been trying to search through the logs in Actions but I can see now there's links to those reports in the "Summary" page, that's quite nice!
green now, I dealt with the two that were coming from this change and the prometheus one seems to have fixed itself, flaky I guess.

@Jorropo
Copy link
Contributor

Jorropo commented Jun 13, 2023

Good points.

We have different user stories here that have strict behaviour incompatible with each other:

  • If you are ipfs dag importing your own cold backups having it fetch over the network missing blocks is not that bad and probably good, (you shouldn't have missing blocks and if you do it's probably bitrot in the multihash digest or in the content of one block).
  • Someone wants to ipfs cat some path of the wikipedia root they downloaded from somewhere. Here you don't want it pinned and you really don't want kubo to start fetching all the blocks.

This pull request is a silent behaviour change someone could ipfs dag import some cold storage, remove the cold storage because they now trust Kubo and it gets GCed.

I don't think anyone relies on the fetch missing blocks behaviour, so we could remove that.
So then what about doing pinning based on the completness of the graph received (with an options to make it required as either partial or not for users who want strict error checking).

So:

$ curl https://saturn.ms/ipfs/cid/dir/file.jpg?format=car | ipfs dag import
Partial Data detected, blocks have been imported but not pinned.
$ ssh [email protected] "ipfs dag export QmcniBv7UQ4gGPQQW2BwbD4ZZHzN3o3tPuNLZCbBchd1zh" | ipfs dag import
Pinned root	QmcniBv7UQ4gGPQQW2BwbD4ZZHzN3o3tPuNLZCbBchd1zh	success

@rvagg
Copy link
Member Author

rvagg commented Jun 13, 2023

yeah, that's not a terrible compromise .. way more than I wanted to bite off with this PR though!

@lidel
Copy link
Member

lidel commented Jun 13, 2023

Some notes:

  • confirmed Kubo is never fetching missing blocks – dag import is hardcoded to offline mode – see CAR import should not pin roots by default (ipfs dag import --pin-roots=false) #9765 (comment)
  • improved the error messages and fixed false-positive exit code on CLI when pinning fails, but extracted that to separate PR: fix(cmd): useful errors in dag import #9945
  • curl | ipfs dag import is how you can get content-addressed data out of prison^Wservices that do not speak DHT or Bitswap. We will see more and more of this, and dedicated command or improved defaults should reduce pain
    • low level pins are hard to track and manage, and imo low level apis like block or dag will cause less surprises if we always pin as opt-in. I've seen people leaking disk space because they forgot to pass --pin-roots=false when importing ephemeral data.

Change dag import or add block import ?

user stories here [..] have strict behaviour incompatible with each other

This is a good point.

We can either accept this PR to unify pinning behavior in dag put|import, and deal with breaking change, or add a separate ipfs block import that does not do any IPLD/DAG/roots/pinning and just does batch equivalent of block put for every block found in a CAR stream.

Both sgtm. Anyone has any preference?

@rvagg
Copy link
Member Author

rvagg commented Jun 14, 2023

non-maintainer opinion - more API surface area is not better, just more things to test and maintain. I think if the main use-case that involves pinning is the cold-storage one then an explicit ask to --pin-roots is not such a bad option.

I think I prefer @Jorropo's suggestion of only pinning if the root(s) references a complete DAG in the file. I just don't want to have to do that work tbqh!

@hacdias
Copy link
Member

hacdias commented Jun 14, 2023

My 2 cents: as @rvagg mentioned, I don't think it's a good idea to add yet another command when this can easily be solved with the flag. In addition, I think @Jorropo's suggestion of pinning the roots by default iff the full DAG is available is a sensible solution. If no one's available to do that at the moment, I would accept the PR in the current state with the breaking change.

lidel added 3 commits June 14, 2023 21:13
this adds basic regression test that guards behavior
around partial cars with or without pinning
Copy link
Member

@lidel lidel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I've tested this with IPIP-402 + #9945 and confirmed dag import command now correctly errors on partial cars:

$ curl -v 'http://127.0.0.1:8080/ipns/docs.ipfs.tech?format=car&dag-scope=entity' > ./partial-entity.car # Kubo 0.21.0-rc1 with IPIP-402 (only root block of unixfs dir)
$ ipfs dag import --stats --pin-roots=true ./partial-entity.car
Error pinning	QmPDC11yLAbVw3dX5jMeEuSdk4BiVjSd9X87zaYRdVjzW3	FAILED: block was not found locally (offline): ipld: could not find QmPDvrDAz2aHeLjPVQ4uh1neyknUmDpf1GsBzAbpFhS8ro
Imported 1 blocks (1618 bytes)
[exit code 1]

Added some regression tests in 20b7e18 and bit longer changelog in e2ac0d4 – should be good enough for now.
Thank you for the push on this @rvagg!

Merging after CI passes (this is net-better than previous state imo – longer rationale in #9765 (comment))

@lidel lidel enabled auto-merge (squash) June 14, 2023 20:39
@lidel lidel mentioned this pull request Jun 14, 2023
@lidel lidel merged commit b685355 into master Jun 14, 2023
@lidel lidel deleted the rvagg/dag-import-nopindefault branch June 14, 2023 20:45
@hacdias hacdias mentioned this pull request Jun 15, 2023
hacdias pushed a commit that referenced this pull request Jun 20, 2023
* feat!: dag import - don't pin roots by default

Fixes: #9765

* test(ipip-402): dag import

this adds basic regression test that guards behavior
around partial cars with or without pinning

* docs(ipip-402): ipip and dag import changelog

---------

Co-authored-by: Marcin Rataj <[email protected]>
hacdias pushed a commit that referenced this pull request Jun 20, 2023
* feat!: dag import - don't pin roots by default

Fixes: #9765

* test(ipip-402): dag import

this adds basic regression test that guards behavior
around partial cars with or without pinning

* docs(ipip-402): ipip and dag import changelog

---------

Co-authored-by: Marcin Rataj <[email protected]>
@BigLep BigLep mentioned this pull request Aug 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Archived in project
Development

Successfully merging this pull request may close these issues.

CAR import should not pin roots by default (ipfs dag import --pin-roots=false)
4 participants