Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TODO: DAG walk for Block API #118

Closed
rvagg opened this issue Sep 13, 2021 · 3 comments · Fixed by #127
Closed

TODO: DAG walk for Block API #118

rvagg opened this issue Sep 13, 2021 · 3 comments · Fixed by #127
Assignees
Labels
exp/intermediate Prior experience is likely helpful help wanted Seeking public contribution on this issue

Comments

@rvagg
Copy link
Member

rvagg commented Sep 13, 2021

We've had APIs for DAG walks in various iterations but don't have a generic one for the js-multiformats stack and it would be a nice fit in the core package.

e.g. https://github.com/ipld/js-block/blob/master/reader.js#L4 was for the short-lived js-block experiment that predates js-multiformats, and I think there's a library that did it for the older interfaces too. Not a hard algorithm, but the complicating factor now is that we don't want to bundle codecs in the core API—the user needs to provide them. So the API might be a little gnarly with an array of codecs, or we could move forward with something like #38 to make this easier/cleaner.

https://github.com/ipfs/js-ipfs/blob/6a2c710e4b66e76184320769ff9789f1fbabe0d8/packages/ipfs-core/src/components/dag/export.js#L82-L107 was the last implementation of this done. Any implementation here should aim to replace most of that code with a call into this library.

ipfs-car would also be a logical consumer of such an API, to help make deterministic CARs: storacha/ipfs-car#76

Questions around ordering will need to have some clarity when comparing to Go APIs. Such DAG walks using go-ipld-prime will strictly use map field ordering as they appear in the encoded block. We have a little more difficulty in JS since we have to rely on Object property ordering to do this work. It would be good to test and document any potential footguns for users wrt compatibility and "determinism" related to this.

@rvagg rvagg self-assigned this Sep 13, 2021
@rvagg rvagg added exp/intermediate Prior experience is likely helpful help wanted Seeking public contribution on this issue labels Nov 12, 2021
@rvagg
Copy link
Member Author

rvagg commented Nov 17, 2021

Thinking through this more - I'd like to get closer to go-ipld-prime selectors functionality, without going full selectors. The most commonly used selector is "explore all" which simply walks a DAG, visiting every block, but not matching any nodes. You provide it with a loader, and then watch that loader for block loads and those loads tell you the order of block traversal. We typically use it for complete-DAG CAR creation (in go-ipfs' dag export - see ipfs/kubo#8506 and also in Lotus for all the CAR generation and CommP calculation).

So we already have Block#links() in here and we use that to tell us the order to visit, all we need is a function that does a walk and takes a root and a loader callback that provides us with Blocks from somewhere. So something like this:

async Block.exploreAll(cid:CID, loaderFn:(cid)=>Promise<Block>)

And your loaderFn might look something like this (copying the js-ipfs dag export code for this):

const writer = ... // likely a CAR output writer
const loaderFn = async (cid) => {
  const bytes = await repo.blocks.get(cid, options) // get the bytes from somewhere, the loader's main concern
  const codec = await codecs.getCodec(cid.code) // let the loader do the decoding, `Block` doesn't have codecs
  if (codec) {
    const block = Block.createUnsafe({ bytes, cid, codec })
    await writer.write(block) // this is where we're doing double-duty, a load is telling us that this block is _next_ in the DAG order
    return block // give the block to `exploreAll()` to keep walking
  } else if (!NO_LINKS_CODECS.includes(cid.code)) {
    throw new Error(`Can't decode links in block with codec 0x${cid.code.toString(16)} to form complete DAG`)
  }
  // not quite sure what to do here, but we know it doesn't contain links so we probably return some faux block with
  // the right bytes and CID but not the right contents, we can defer this decision
}

await Block.exploreAll(root, loaderFn)

@patrickwoodhead
Copy link
Collaborator

patrickwoodhead commented Nov 17, 2021

To check if I am on the right track, something like this?

/**
 * @template T
 * @param {Object} options
 * @param {CID} options.cid
 * @param {(cid: CID) => Promise<Block<T>>} options.loaderFn
 */
const exploreAll = async ({ cid, loaderFn }) => {
  const block = await loaderFn(cid)

  for (const link of block.links()) {
    const linkCid = link[1]
    if (!(linkCid.equals(cid))) {
        await exploreAll({ cid: linkCid, loaderFn })
    }
  }
}

This function will take as input a CID and a loaderFn. It will then leave the responsibility of getting a block by its CID to the loader function.

Once it has got the block, it iterates through the links in the block using the links() iterator.

N.B. There is an assumption here that the loaderFn is trying to work its way down a DAG and will therefore run out of block.links as it gets to leaf nodes. However, there are loader functions where the above won't terminate. But since the above exploreAll function is just a utility library function, perhaps it is not its responsibility to ensure that the process terminates?

Another thing to note with the above function is that it will load blocks right down to the leaves below the first child of the root before loading the second child of the root node.

@rvagg
Copy link
Member Author

rvagg commented Nov 18, 2021

Yes, this is close, some thoughts:

  1. Yes, it's not necessary for this function to decide when it ends, it just wants to exhaust the graph. There's a "SkipMe" functionality in go-ipld-prime that lets the loader communicate that the walk should skip over a block, but that's not important here just now for this first pass. Let it be exhaustive.
  2. Having said "exhaustive", for most practical purposes you don't want to re-visit blocks you've already been to, and this can be common in DAGs. So for now the default should be to skip duplicate blocks and this can be embedded in this function. See https://github.com/ipfs/js-ipfs/blob/6a2c710e4b66e76184320769ff9789f1fbabe0d8/packages/ipfs-core/src/components/dag/export.js#L91 for the seen Map that lets you check whether a CID has been loaded before. That same pattern can be used here (you could defer to a second function, or since you're using an options object you could check for its existence and assume that if it doesn't exist that it's the top-level call and set up a new one).
  3. !(linkCid.equals(cid)) shouldn't be necessary - if that's ever true then I think we have a problem with links() because a block shouldn't be able to link to itself. And the seen list should take care of duplicates.
  4. Re naming:
  5. Since we're going with an options object (good choice given the patterns already in this lib) - let's shorten loaderFn to load and
  6. Let's call this walk and put it in a new file called traverse.js and wire that up as an export in package.json like the block export. That way we're closer to the go-ipld-prime API and it gives us scope to beef this out with selectors and other traversal modes.

A test should involve a graph of blocks that include at least one duplicate link to a block, maybe from different places, even as simple as this:

a -> b
a -> c
b -> d
b -> d
c -> d

And then you'd want to check that you strictly loaded 4 blocks, no repeats, in the depth-first order: a, b, d, c

rvagg pushed a commit that referenced this issue Nov 29, 2021
Closes: #118

Basic traversal functionality for deterministic DAG walking with no
repeat block visits and support for block skipping.

User supplies a block loader, which can be used to watch the block
ordering of the walk.
rvagg pushed a commit that referenced this issue Nov 29, 2021
Closes: #118

Basic traversal functionality for deterministic DAG walking with no
repeat block visits and support for block skipping.

User supplies a block loader, which can be used to watch the block
ordering of the walk.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
exp/intermediate Prior experience is likely helpful help wanted Seeking public contribution on this issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants