-
Notifications
You must be signed in to change notification settings - Fork 71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Estuary <> Boost CommP mismatch investigation #673
Comments
Lodge just provided another, much smaller, failing CAR, built from
So ignoring the empty identity CID listed first, the second one has the goodies:
|
Oh, I think maybe this data is coming out of Zarr:
So I think we might have a have a Python dag-pb implementation to find and help fix. |
@rvagg and I looked at boost for the first next step:
we're pretty convinced it's using a current go-merkledag.
|
OK, I've spent far too long on this today (including writing a test case to transfer a Boost libp2p deal) but I've finally figured it out. A bunch of functions on So, for now this just reinforces the need to do item 2 above. I think maybe there's extra steps for go-merkledag but I'll have to think on that, because we've kind of left it in a half-way state like this. You can call |
Ref: #673 (comment) Ideally we can remove go-merkledag entirely from here because most (all?) of the deal-making infrastructure uses go-codec-dagpb + go-ipld-prime traversals and post-decode link-reordering is a subtle difference that impacts CAR ordering for non-spec-compliant DAG-PB blocks.
Ref: #673 (comment) Ideally we can remove go-merkledag entirely from here because most (all?) of the deal-making infrastructure uses go-codec-dagpb + go-ipld-prime traversals and post-decode link-reordering is a subtle difference that impacts CAR ordering for non-spec-compliant DAG-PB blocks.
* fix: don't allow go-merkledag to reorder loaded links Ref: #673 (comment) Ideally we can remove go-merkledag entirely from here because most (all?) of the deal-making infrastructure uses go-codec-dagpb + go-ipld-prime traversals and post-decode link-reordering is a subtle difference that impacts CAR ordering for non-spec-compliant DAG-PB blocks. * test: add test for CarOffsetWriter dag traverse order
Thanks so much for digging into this @rvagg |
That 100% linux2ipfs output. (see the 2MiB leaves, I'm the only one doing that afaik). Yes it has seen use in the wild (there is me uploading linux repos, I've seen a few scientists uploading datasets). Linux2ipfs also set the useragent, it might be smart logging that. I'll fix linux2ipfs. |
* fix: don't allow go-merkledag to reorder loaded links Ref: #673 (comment) Ideally we can remove go-merkledag entirely from here because most (all?) of the deal-making infrastructure uses go-codec-dagpb + go-ipld-prime traversals and post-decode link-reordering is a subtle difference that impacts CAR ordering for non-spec-compliant DAG-PB blocks. * test: add test for CarOffsetWriter dag traverse order
We've back ported the immediate patch from #680 to https://github.com/filecoin-project/boost/releases/tag/v1.0.1. This will provide a minimal dependency update patch for Estuary to be able to integrate more quickly to resolve the immediate problem. Next steps are to push ipld/go-car#291 forward for full resolution of this. |
I don't think it's doing anything with your original IPLD data, so regardless of CAR input, this is about the encoded DAG-PB blocks, since they're not properly sorting links, we're ending up with this situation with the "in what order do I traverse the links" is coming out differently depending on the path that's chosen. i.e. best to stick to the spec and sort links (easiest way would be to do something like Again though, our systems should be doing the right thing here and have one way of interpreting DAG-PB, so it's our fault for not dealing properly with the flexibility we've left in the spec (we've intentionally not dictated that unsorted |
I implemented this exact fix yesterday 🙂 |
When decoding a badly serialised block with Links out of order, don't sort the list until we receive an explicit mutation operation. This ensures stable DAG traversal ordering based on the links as they appear in the serialised form and removes surprise-sorting when performing certain operations that wouldn't be expected to mutate. The pre-v0.4.0 behaviour was to always sort, but this behaviour wasn't baked in to the dag-pb spec and wasn't built into go-codec-dagpb which now forms the backend of ProtoNode, although remnants of sorting remain in some operations. Almost all CAR-from-DAG creation code in Go uses go-codec-dagpb and go-ipld-prime's traversal engine. However this can result in a different order when encountering badly encoded blocks (unsorted Links) where certain intermediate operations are performed on the ProtoNode prior to obtaining the Links() list (Links() itself doesn't sort, but e.g. RawData() does). The included "TestLinkSorting/decode" test is the only case that passes without this patch. Ref: ipld/ipld#233 Ref: filecoin-project/boost#673 Ref: filecoin-project/boost#675
When decoding a badly serialised block with Links out of order, don't sort the list until we receive an explicit mutation operation. This ensures stable DAG traversal ordering based on the links as they appear in the serialised form and removes surprise-sorting when performing certain operations that wouldn't be expected to mutate. The pre-v0.4.0 behaviour was to always sort, but this behaviour wasn't baked in to the dag-pb spec and wasn't built into go-codec-dagpb which now forms the backend of ProtoNode, although remnants of sorting remain in some operations. Almost all CAR-from-DAG creation code in Go uses go-codec-dagpb and go-ipld-prime's traversal engine. However this can result in a different order when encountering badly encoded blocks (unsorted Links) where certain intermediate operations are performed on the ProtoNode prior to obtaining the Links() list (Links() itself doesn't sort, but e.g. RawData() does). The included "TestLinkSorting/decode" test is the only case that passes without this patch. Ref: ipld/ipld#233 Ref: filecoin-project/boost#673 Ref: filecoin-project/boost#675
Hi everyone, I'm just realising that you are talking about a car-file that I created ( Could someone summarise for me, which consequence this issue has for my currently uploaded cars and which for those that I would like to upload in the future ( with the same structure and without the fix of linux2ipfs ). |
@observingClouds the bug here is in our inability to deal with badly formed, but not spec-breaking data. Our spec for dag-pb doesn't dictate that unsorted links fail to decode, so if an encoder breaks the encoding-side contract and doesn't sort then we still should be able to deal with it (we have similar rules in other places and for other codecs - strict encode but somewhat sloppy decode). So the fact that this was even noticed is a problem. It'll be fixed with ipfs/go-merkledag#87, when that makes it through the stack. For now, the workaround in #675 for Boost should solve it for existing data. i.e. not to worry about the existing data, our inability to deal with it is our problem, not yours. I imagine they'll start properly flowing through the system properly again with everything patched. I'm assume Estuary will be ensuring replicas for that data will be stored. It would be ideal for new data to use conforming codecs, so these edge cases are less likely, but they're just things we need to fix. |
Thanks @rvagg for the detailed explanation! This is very helpful! |
When decoding a badly serialised block with Links out of order, don't sort the list until we receive an explicit mutation operation. This ensures stable DAG traversal ordering based on the links as they appear in the serialised form and removes surprise-sorting when performing certain operations that wouldn't be expected to mutate. The pre-v0.4.0 behaviour was to always sort, but this behaviour wasn't baked in to the dag-pb spec and wasn't built into go-codec-dagpb which now forms the backend of ProtoNode, although remnants of sorting remain in some operations. Almost all CAR-from-DAG creation code in Go uses go-codec-dagpb and go-ipld-prime's traversal engine. However this can result in a different order when encountering badly encoded blocks (unsorted Links) where certain intermediate operations are performed on the ProtoNode prior to obtaining the Links() list (Links() itself doesn't sort, but e.g. RawData() does). The included "TestLinkSorting/decode" test is the only case that passes without this patch. Ref: ipld/ipld#233 Ref: filecoin-project/boost#673 Ref: filecoin-project/boost#675
When decoding a badly serialised block with Links out of order, don't sort the list until we receive an explicit mutation operation. This ensures stable DAG traversal ordering based on the links as they appear in the serialised form and removes surprise-sorting when performing certain operations that wouldn't be expected to mutate. The pre-v0.4.0 behaviour was to always sort, but this behaviour wasn't baked in to the dag-pb spec and wasn't built into go-codec-dagpb which now forms the backend of ProtoNode, although remnants of sorting remain in some operations. Almost all CAR-from-DAG creation code in Go uses go-codec-dagpb and go-ipld-prime's traversal engine. However this can result in a different order when encountering badly encoded blocks (unsorted Links) where certain intermediate operations are performed on the ProtoNode prior to obtaining the Links() list (Links() itself doesn't sort, but e.g. RawData() does). The included "TestLinkSorting/decode" test is the only case that passes without this patch. Ref: ipld/ipld#233 Ref: filecoin-project/boost#673 Ref: filecoin-project/boost#675
When decoding a badly serialised block with Links out of order, don't sort the list until we receive an explicit mutation operation. This ensures stable DAG traversal ordering based on the links as they appear in the serialised form and removes surprise-sorting when performing certain operations that wouldn't be expected to mutate. The pre-v0.4.0 behaviour was to always sort, but this behaviour wasn't baked in to the dag-pb spec and wasn't built into go-codec-dagpb which now forms the backend of ProtoNode, although remnants of sorting remain in some operations. Almost all CAR-from-DAG creation code in Go uses go-codec-dagpb and go-ipld-prime's traversal engine. However this can result in a different order when encountering badly encoded blocks (unsorted Links) where certain intermediate operations are performed on the ProtoNode prior to obtaining the Links() list (Links() itself doesn't sort, but e.g. RawData() does). The included "TestLinkSorting/decode" test is the only case that passes without this patch. Ref: ipld/ipld#233 Ref: filecoin-project/boost#673 Ref: filecoin-project/boost#675 This commit was moved from ipfs/go-merkledag@48c7202
Some notes on the investigation so far, I know this is a lot of blather and you can skim down through to the last half to see the juiciest bits. Not completely solved but I think I'm close and I believe the actions I've listed at the bottom will get us to resolution.
Previous threads:
#ecosystem-dev
channel has threads (here & here)Thankfully Alvin (via stuberman and lodge) was able to provide a DAG that is failing so we can dig deeper into the nature of the failure. A ~32G DAG hanging off
bafybeietzjzc4vudlevv6k6sxdixhp5nnmfblyjhqheyjyd4d3uluvqdgm
. Looking at the version thatipfs dag export
(proper exhaustive selector export, what we would expect for a well-formed CAR) gives compared to the one that Boost apparently has on its end where it's reporting the mismatch we can see:The ordering problem can be seen just by looking at the first few blocks. Here's what we expect (
ipfs dag export
):Here's what we get in Boost:
This list of expected links can be confirmed by just looking at the root block's links with
ipfs dag get bafybeietzjzc4vudlevv6k6sxdixhp5nnmfblyjhqheyjyd4d3uluvqdgm | jq .Links[].Hash[] | head
.The second and third block are the same in both lists and then it diverges. Both of those initial links are just Bytes, they have no links, so this isn't a case of a traverser deciding to go down a different pathway, they should just be walking those links in the root block in order.
I wrote a simple program to "traverse" these links in the various ways that may matter, just from that root block, and keep on getting the same, stable ordering:
DecodeProtobuf
Most of the tooling in the path to make CARs uses go-ipld-prime's traversals which in turn will be relying on go-codec-dagpb. But there is a dependency in boost for a custom branch of go-car @ ipld/go-car#290 that uses go-merkledag's
Walk
and other legacy pieces to load and decode blocks. So there's a suspicion that the use of the legacy stack may be involved here.In version 0.4.0 of go-merkledag, the underlying mechanics of protobuf decode were swapped out to use go-codec-dagpb, so since that version we should even have the same decoding path.
BUT prior to 0.4.0 it turns out we had a sneaky decode-sort of links going on whenever you decode a DAG-PB block. This is not something that we factored in to the DAG-PB spec or go-codec-dagpb—links are only sorted on encode. And in a go-ipld-prime world, your Node decode ordering will dictate your traversal ordering. I'm going to add some clarifications to the spec about this @ ipld/ipld#233.
This shouldn't be a problem under normal circumstances, but we also have to deal with badly, or unsorted DAG-PB Links since we're not being strict about rejecting blocks with unsorted Links lists. And, it turns out that the failure case we have here is one of those. If we pull out the
Name
for each of the links that appear in the first blocks past the root in the CAR we can see what's going on:ipfs dag export
:Boost:
Re-running my test program against [email protected] and doing a
Walk
produces the same order we're seeing out of Boost.Unfortunately I haven't figured out why Boost is doing this sorting. Even in v1.0.0 I can only see it pulling in >v0.4.0 versions of go-merkledag, and I've confirmed that this effect only appears for versions <v0.4.0. Perhaps there's some dependency jumbling that's going on to bring it in.
I see three things to do next:
CarOffsetWriter
but some other CAR creation path I'm not seeing?)CarOffsetWriter
here with that. We really shouldn't be using go-merkledag for these kinds of things, we've been using ipld-prime traversals for CAR creation since the Filecoin launch (primarily through go-car'sSelectiveCar
).The text was updated successfully, but these errors were encountered: