Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add DAGService.GetLinks() method and use it in the GC and elsewhere. #3255

Merged
merged 3 commits into from
Oct 8, 2016

Conversation

kevina
Copy link
Contributor

@kevina kevina commented Sep 24, 2016

This method will use the (also new) LinkService if it is available to
retrieving just the links for a MerkleDAG without necessary having to
retrieve the underlying block.

For now the main benefit is that the pinner will not break when a block
becomes invalid due to a change in the backing file. This is possible
because the metadata for a block (that includes the Links) is stored
separately and thus always available even if the backing file changes.

For context (and test cases) please see the original commits: 5a73226 8b396fd

@Kubuxu Kubuxu added the status/in-progress In progress label Sep 24, 2016
@kevina
Copy link
Contributor Author

kevina commented Sep 24, 2016

Part of #2634.

@whyrusleeping
Copy link
Member

It doesnt appear that the linkservice interface has anything that implements it, and doesnt appear to be used anywhere

@kevina
Copy link
Contributor Author

kevina commented Sep 26, 2016

Right. It doesn't have a use yet outside of the filestore. This commit has an implementation 5a73226.

@kevina
Copy link
Contributor Author

kevina commented Sep 28, 2016

@whyrusleeping, if #2634 is going to get merged within a reasonable amount of time, I am okay with this P.R. not getting merged at the same time as #2634. I mainly separated it out for easier reviewing.

@whyrusleeping
Copy link
Member

@kevina I think that we should change the linkservice interface so that the existing dagservice implements it. That way any thing that just needs a link service can still have the dag service passed to it. And any place that needs both should just have the dagservice passed in. Then, for the filestore, we can make a small wrapper around the current dagservice code that overrides the GetLinks method to use the filestore backing instead.

That way, it makes it easy for us to do other things later, like plug in a graph database to help resolve links.

@whyrusleeping
Copy link
Member

So what i'm getting at here, is that we want the linkservice and the dagservice to ideally be exposed via the same object. Where the linkservice interface is just a limited subset of the dagservices interface. With them separate, a few things start to feel weird. I would never want to call GetLinks on the dagservice if i have a linkservice would I? but now i have to check if i have a linkservice, then if not, call out to the dagservice.

So for the purpose of this PR, we should just be adding the linkservice interface with GetLinks, and adding that same method to the existing dagservice interface. That way we can pass the dagservice to any function that requires only a linkservice. Later, when we have different linkservice implementations, we can make a wrapper object for the dagservice like:

type FancyDagServiceWithLinkMagic struct {
    DAGService
    linkStuff *linkResolver
}

func (a *FancyDagServiceWithLinkMagic) GetLinks(....) (Link, error) {
    return a.linkStuff.GetLinks(...)
}

And upon node construction, use that object as the nodes DAG. That way everbody basically gets the extra fancy link resolver boost for free.

@kevina
Copy link
Contributor Author

kevina commented Oct 1, 2016

@whyrusleeping @Kubuxu I just pushed a commit that implemented what we seamed to agree on over IRC.

Copy link
Member

@whyrusleeping whyrusleeping left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not sold on the GetOffline thing. I think that its probably okay to just expect the user to pass the right thing in. I get the appeal of having the type system check that for us, but i think it might be cluttering the interfaces a bit too much.

cc @Kubuxu @diasdavid @jbenet for their thoughts as well

return node.Links, nil
}

func (n *dagService) GetOfflineLinkService() LinkService {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be moved to the main DAGService interface. There are many situations where we would want to use this to get an offline dagservice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look at the return type.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but the actual implementation is returning an offline DAGService, which is something that we would want to have available.

I'm thinking that when we need an offline linkservice, we could call dag.GetOffline() (which would return a DAGService) and then simply pass that into the function requiring the offline linkservice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did it this way to enforce type safety. Otherwise we could be passed a LinkService and get a DAGService. If you really don't like that I can change it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type safety is good, lets move forward with this

@@ -366,19 +393,19 @@ func legacyCidFromLink(lnk *Link) *cid.Cid {
// EnumerateChildren will walk the dag below the given root node and add all
// unseen children to the passed in set.
// TODO: parallelize to avoid disk latency perf hits?
func EnumerateChildren(ctx context.Context, ds DAGService, root *Node, visit func(*cid.Cid) bool, bestEffort bool) error {
for _, lnk := range root.Links {
func EnumerateChildren(ctx context.Context, ds LinkService, links []*Link, visit func(*cid.Cid) bool, bestEffort bool) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer this just take a since *cid.Cid as its input instead of an array of links. Makes it easier to call in general.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I will look into it.

unlocker := bs.GCLock()

bsrv := bserv.New(bs, offline.Exchange(bs))
ds := dag.NewDAGService(bsrv)
ls = ls.GetOfflineLinkService()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, i see why this was on the LinkService and not the DAGService. Hrm... thats tricky.

if err != nil {
return err
}

// EnumerateChildren recursively walks the dag and adds the keys to the given set
err = dag.EnumerateChildren(ctx, ds, nd, func(c *cid.Cid) bool {
// EnumerateChildren recursively walks the dag and adls the keys to the given set
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, will fix the comment.

@@ -521,19 +521,19 @@ func (p *pinner) PinWithMode(c *cid.Cid, mode PinMode) {
}
}

func hasChild(ds mdag.DAGService, root *mdag.Node, child key.Key) (bool, error) {
for _, lnk := range root.Links {
func hasChild(ds mdag.LinkService, links []*mdag.Link, child key.Key) (bool, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should accept a *cid.Cid instead of []*mdag.Link

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, will look into it.

@@ -417,3 +417,7 @@ func (bs *Bitswap) GetWantlist() []key.Key {
}
return out
}

func (bs *Bitswap) IsOnline() bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesnt look like we use this anywhere. Am i missing something or is this just dead code from a different approach?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is used. I added a IsOnline() method to exchange to be able to tell if the exchange is online or not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yeah. I missed it on my previous pass through

@whyrusleeping
Copy link
Member

Alright, I think this looks good to me. @Kubuxu wanna give it a good look over?

Reporter metrics.Reporter
Discovery discovery.Service
FilesRoot *mfs.Root
Peerstore pstore.Peerstore // storage for other Peer instances
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you try an undo these changes? its just noise on the diff

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I was just about to re-base to clean up this diff noise. 😄

@kevina kevina force-pushed the kevina/getlinks branch 2 times, most recently from 2697850 to f0e5138 Compare October 6, 2016 05:04
This method will use the (also new) LinkService if it is available to
retrieving just the links for a MerkleDAG without necessary having to
retrieve the underlying block.

For now the main benefit is that the pinner will not break when a block
becomes invalid due to a change in the backing file.  This is possible
because the metadata for a block (that includes the Links) is stored
separately and thus always available even if the backing file changes.

License: MIT
Signed-off-by: Kevin Atkinson <[email protected]>
kevina added 2 commits October 6, 2016 11:52
Instead make LinkService a part of DAGService.  The LinkService is now
simply an interface that DAGService implements.  Also provide a
GetOfflineLinkService() method that the GC uses to get an offline
instance.

License: MIT
Signed-off-by: Kevin Atkinson <[email protected]>
…g.Link

Author: Kevin Atkinson <[email protected]>

Fix EnumerateChildren & hasChild to take a *cid.Cid instead of []*mdag.Link

Author: Jeromy Johnson <[email protected]>

make FetchGraph use a cid

pin: fix TestPinRecursiveFail

License: MIT
Signed-off-by: Jeromy <[email protected]>

License: MIT
Signed-off-by: Kevin Atkinson <[email protected]>
@kevina
Copy link
Contributor Author

kevina commented Oct 6, 2016

Just rebased on master and assuming the tests pass this should be good to go. @Kubuxu what do you think?

Copy link
Member

@whyrusleeping whyrusleeping left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@whyrusleeping whyrusleeping merged commit 2fd045f into master Oct 8, 2016
@whyrusleeping whyrusleeping deleted the kevina/getlinks branch October 8, 2016 18:45
@whyrusleeping whyrusleeping removed the status/in-progress In progress label Oct 8, 2016
@kevina kevina added this to the Filestore implementation milestone Oct 19, 2016
@ghost ghost mentioned this pull request Dec 23, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants