-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
estimate file car size #59
base: main
Are you sure you want to change the base?
Conversation
…he car file representing a unixfs file of a given size will take Fix #58
My initial response to this is - firstly in my mental hierarchy this doesn't belong here, but it could belong in go-car if it was made more configurable. go-car is a consumer of go-unixfsnode, which sits lower in the stack, and it already has precedent of having unixfs-specific code and I think that'd be OK to extend that. Making go-unixfsnode have CAR-specific code seems like mixing up the stack hierarchy a little too much? I'd like to move https://github.com/filecoin-project/lassie/tree/main/pkg/verifiedcar into go-car, as a separate utility package, this could be similar, doing a very specific job. Alternatively just new repo in either ipld or ipfs orgs would be fine. Regarding pb sizing, protowire has some decent protobuf sizing utilities, do they help here? https://github.com/ipld/go-codec-dagpb/blob/master/marshal.go#L129-L138 Regarding interface and making it more generic, how about something like what we have in https://github.com/filecoin-project/lassie/tree/main/pkg/verifiedcar where you make the config and use (and reuse) it to estimate sizes: unixfscar.SizeEstimateConfig{
BlockSize: 1024 * 128, // defaults to chunk.DefaultBlockSize if 0
LeafLinkPrototype: ...,
FileLinkPrototype: ...,
}.Estimate(dataSize) The default (
... or just maybe |
Oh, going back and reviewing #58, the
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Will 🎉
Left questions and minor suggestions, overall LGTM.
In terms of where the functionality would live, I have no preference.
@@ -57,6 +58,91 @@ func BuildUnixFSFile(r io.Reader, chunker string, ls *ipld.LinkSystem) (ipld.Lin | |||
} | |||
} | |||
|
|||
// EstimateUnixFSFile estimates the byte size of the car file that would be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am curious why the word "estimate"? I read that as: the actual size may differ. Is that right? If so, would it make sense to update the godoc to elaborate on the discrepancy?
If not, I recommend avoiding this terminology.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the only real discrepancy in my head at this point is that depending on your actual data, the resulting car may have de-duplicated blocks so may be smaller than expected
} | ||
links = nxtLnks | ||
} | ||
fmt.Printf("estimated %d intermeidate nodes\n", icnt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use logger?
ls.StorageWriteOpener = storage.OpenWrite | ||
|
||
icnt := 0 | ||
for len(links) > 1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to update the godoc of this function with the cost complexity of estimating size vs writing the CAR out into a temporary file and getting the size that way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think the extent of what we can really say is that the size calculation doesn't need memory, but will do the CPU work for generation of intermediate blocks.
return bwc(lnk) | ||
}, err | ||
} | ||
rt, _, err := builder.BuildUnixFSFile(bytes.NewReader(b), "", &ls) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Compare total size returned with estimated size?
@masih how would you feel about adapting this method directly into the client code you're working on? do you think it's preferable to find a place for it in car / here instead? |
Co-authored-by: Masih H. Derkani <[email protected]>
This seems like a path of least resistance. 👍 In the long run I agree with Rod, in that we probably want to either adopt it in go-car, or set up a separate repo/module to offer unixfs CAR serialisation capabilities. |
fix #58
I wonder having put this together if we really want this as part of this package / core interface. It's very finicky around:
Also note that protobuf encoding because of it's use of un-aligned varints is very difficult to get accurate without actually doing the work. That said the number of intermediate nodes isn't too bad in any realistic case, so that's probably an okay tradeoff here.
It might be better to pull to a client implementation rather than here?