Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to tell link system storage is trusted and we can skip hash on read #149

Merged
merged 1 commit into from
Mar 24, 2021

Conversation

hannahhoward
Copy link
Collaborator

Goals

When the underlying storage system is local, we may want to skip hash on read to increase performance. This adds a boolean value that tells the LinkSystem to skip hash checking when loading nodes.

Extra bonus: this will be fine as an alternative to the bytesBuffer optimization!

@warpfork
Copy link
Collaborator

warpfork commented Mar 5, 2021

Devils advocate / another option to consider:

What if BlockReadOpener was instead defined to return a boolean: func(LinkContext, Link) (content io.Reader, plzTrustMe bool, err error)?

  • Would still hit the goal of making hashing disableable
  • Lets the storage layer express whether or not it believes its content is already validated -- which means we could be loading part of a walk from trusted storage, and other part from not-yet-trusted network content, and it all sorts itself out
  • Still lets the storage functions be free of the complexity of having to deeply understand hashing (all currently drafted approaches seem to do this, just remembering it outloud for clarity).

Would this be relevant? Do we have mixed trust walks like that (or might we, in the future)?

Would it make sense to combine these things (maybe have LinkSystem contain a flag that must be one of {FollowStorageTrustHint|AlwaysHashReads|NeverHashReads})?

Thinking out loud at the moment, do not yet have strong opinions.

@warpfork
Copy link
Collaborator

warpfork commented Mar 5, 2021

Should we have a way to tell a storage system that the io.Reader it returned gave us bollocks that failed hash verification?

I don't know if that's come up yet, but I could imagine that being relevant information if the storage system is network-backed and involves a piece of code with enough agency to want to Do Something in the network protocol based on higher level errors like that.

@hannahhoward
Copy link
Collaborator Author

In short, I think the cases where we need to be making a distinction based on CID as opposed to the storage layer are very few. More likely we will have different storage layers for which we will construct different link systems. I think this a much less cumbersome solution that solves the main problem for now.

@warpfork
Copy link
Collaborator

warpfork commented Mar 8, 2021

Distinction based on CID: agree, rare to never. Let's go with 'never'.

Working with different storages: Less sure about this. Sometimes, true, one can just use different LinkSystem.

I think sometimes using a whole different LinkSystem won't suit though. For example: if I'm doing graphsync-y things... I'm evaluating a Selector, so I'm in the middle of one big traversal.Walk call -- so I have one LinkSystem of note. And in grahpsync it would be natural that some of the data I'm loading would be being streamed to me from a network peer (so it needs hashing), while maybe some of the data is blocks I already had locally, so I might not want to rehash them.

I think to be able to compose the two different storages in that case would be solvable in one of two ways:

  1. make the hash verification the problem of the storage reader function. (Not great: pushes complexity to the storage function.)
  2. let the storage function return info that hints whether this storage should be considered trustworthy, and let the LinkSystem decide what to do. (Not great: expands the size of the storage reader function interface slightly.)

Between those two, the latter seems less bad to me.

@warpfork
Copy link
Collaborator

warpfork commented Mar 8, 2021

I think I'd be game to merge this approach with the bool in LinkSystem, too, if we want to try it immediately. (In terms of how this would box us in and create more interface churn, it's really mild: As long as the default is "yes, hash", and any future change we could imagine doing also defaults to "yes, hash", then the changes only affect people doing Interesting things, and those people are probably interested in those changes so they should be minimally problematic.)

But I'm still grinding through this train of thought, because if we do end up adding some more hint bools to the BlockReadOpener function interface, it would probably also be ideal to do those now-ishly, since #143 will already be causing an interface change there.

Base automatically changed from linksystem to master March 12, 2021 16:50
Have boolean that specifies whether the storage is trusted. if it is, skip hashing on reads
@hannahhoward hannahhoward force-pushed the feat/configure-hash-on-load branch from df2f100 to 6a262a3 Compare March 24, 2021 01:28
@hannahhoward
Copy link
Collaborator Author

@warpfork this is rebased -- can we merge it? I removed the byte buffer optimization. which ultimately makes this code simpler.

I think the number of cases where you're data is untrusted but also still being read locally such that you have it in a bytes.Buffer is very rare, and I think we should just live without for now.

@warpfork
Copy link
Collaborator

With the forward-looking note I still bet way north of 50/50 on if we will still end up wanting to add an info bit about this to BlockReadOpener somehow in the future -- yeah, I'm also good with this. Merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants