Add basic crawler #663

Merged
4 commits merged on Apr 22, 2021
aschmahmann

For use by #574

aschmahmann changed the base branch from master to refactor/extract-messages on June 3, 2020 23:00
aschmahmann marked this pull request as ready for review on June 4, 2020 06:27
}()
}

defer wg.Done()

The goroutines above never decrement wg when they finish, so this won't work as expected.
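
For illustration, here is a minimal sketch of the pattern this comment points at: each worker goroutine decrements the WaitGroup itself via a deferred wg.Done(), so wg.Wait() returns once all of them have finished. It is not the PR's code. The crawlAll helper, the string peer IDs, and the query callback are hypothetical stand-ins that only show the counting.

```go
package main

import (
	"fmt"
	"sync"
)

// crawlAll runs query once per peer, one goroutine per peer, and returns
// after every goroutine has finished.
// Peer IDs are plain strings and query is a hypothetical stand-in for the
// real per-peer DHT query.
func crawlAll(peers []string, query func(string) int) map[string]int {
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		results = make(map[string]int, len(peers))
	)
	for _, p := range peers {
		wg.Add(1) // increment before the goroutine starts
		go func(p string) {
			defer wg.Done() // each worker decrements the counter itself
			n := query(p)
			mu.Lock()
			results[p] = n
			mu.Unlock()
		}(p)
	}
	wg.Wait() // returns only once every worker has called Done
	return results
}

func main() {
	res := crawlAll([]string{"peerA", "peerB", "peerC"}, func(p string) int {
		return len(p)
	})
	fmt.Println(len(res))
}
```

Pairing every wg.Add(1) with a deferred wg.Done() inside the same goroutine keeps the count balanced even if the worker returns early.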

logger.Debugf("peer %v had %d peers", res.peer, len(res.data))
rtPeers := make([]*peer.AddrInfo, 0, len(res.data))
for p, ai := range res.data {
c.host.Peerstore().AddAddrs(p, ai.Addrs, time.Hour)

A one-hour TTL seems potentially racy; I could imagine the crawl lasting longer than that.

handleSuccess(res.peer, rtPeers)
}
} else if handleFail != nil {
handleFail(res.peer, res.err)

A bit more scaffolding to categorize errors would be useful: how many attempted connections are timing out, and how many are failing to connect?

}
for _, ai := range peers {
if _, ok := localPeers[ai.ID]; !ok {
localPeers[ai.ID] = ai

How many peers have a valid dialable address in their address info?

crawler/crawler.go (outdated review thread, resolved)
@willscott

@aschmahmann will this get merged? It would be nice to be able to depend on master, rather than on your branch, when working with this.

@aschmahmann

aschmahmann commented Jul 5, 2020

I'd like to merge this, but last I looked at it I was still having difficulties with some peers being marked "unreachable" when they should have been otherwise reachable (perhaps due to doing too many simultaneous dials, or something else). Once that's resolved I'd happily merge it.

If you'd prefer to start working off it in its current state, I could probably merge it and just make sure to add some BIG DISCLAIMERS on the crawler that it's a WIP (e.g. I don't want people thinking that they can get totally accurate DHT metrics out of libp2p DHTs using this until I'm more confident in the results).

aschmahmann force-pushed the refactor/extract-messages branch 3 times, most recently from b0dc23d to 197ecae, on October 7, 2020 03:39
aschmahmann force-pushed the refactor/extract-messages branch from fcf7104 to 138cb80 on January 4, 2021 06:11
@willscott

What needs to happen for this branch to get merged into current work? Six months is probably a good indication that waiting isn't going to fix the confidence issues directly, and we'd probably do better to have it incorporated upstream than to keep rebasing it.

@aschmahmann

I'd probably be happy if we had a script running this code and pumping out information we could compare with existing crawlers and left it running for a week or two.

I just rebased #659, so if you've got time to re-review that one, I think we can just merge it today and then move on to the next ones (including rebasing this PR).

@willscott

👍 Left comments. There are a couple of minor things on structuring errors I'd like to see in that one, but I'm happy for it to land basically as is.

aschmahmann changed the base branch from refactor/extract-messages to master on January 4, 2021 20:06
crawler/options.go (outdated review thread, resolved)
@willscott

@aschmahmann - I pushed an addition to this branch to configure the peer connection timeout duration as an option.

@willscott

Having run this code over the last week, I've found that it largely agrees with our other DHT metrics, e.g. https://dht.ecosystem-dashboard.com/

As such, I would be happy to see this merged at this point, @aschmahmann

willscott and others added 2 commits January 27, 2021 12:30
…to Info. Stop returning partial peersets if a peer cannot give us their full routing table. Keep starting peer addresses alive in the peerstore.
aschmahmann merged commit 7159892 into master on Apr 22, 2021
aschmahmann deleted the feat/crawler branch on April 22, 2021 21:58
aschmahmann restored the feat/crawler branch on April 22, 2021 21:58
aschmahmann mentioned this pull request on May 14, 2021