-
Notifications
You must be signed in to change notification settings - Fork 233
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add basic crawler #663
Add basic crawler #663
Conversation
crawler/crawler.go
Outdated
}() | ||
} | ||
|
||
defer wg.Done() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the go routines above don't decrement wg when they finish, so this won't work as expected
logger.Debugf("peer %v had %d peers", res.peer, len(res.data)) | ||
rtPeers := make([]*peer.AddrInfo, 0, len(res.data)) | ||
for p, ai := range res.data { | ||
c.host.Peerstore().AddAddrs(p, ai.Addrs, time.Hour) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hour seems potentially racy. could imagine this lasting longer than that.
handleSuccess(res.peer, rtPeers) | ||
} | ||
} else if handleFail != nil { | ||
handleFail(res.peer, res.err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bit more scaffolding categorizing errors would be useful. how many attempted connections are timing out? how many are failing to connect?
} | ||
for _, ai := range peers { | ||
if _, ok := localPeers[ai.ID]; !ok { | ||
localPeers[ai.ID] = ai |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how many peers have a valid dial-able address in their address info?
@aschmahmann will this get merged? would be nice to be able to depend on at least master rather than your branch if working with this |
I'd like to merge this, but last I looked at it I was still having difficulties with some peers being marked "unreachable" when they should have been otherwise reachable (perhaps due to doing too many simultaneous dials, or something else). Once that's resolved I'd happily merge it. If you'd prefer to start working off it in its current state I could probably merge it and just make sure to add some BIG DISCLAIMERS on the crawler that it's a WIP (e.g. I don't want people thinking that they can get totally accurate DHT metrics out of libp2p DHTs using this until I'm more confident in the results) |
b0dc23d
to
197ecae
Compare
ce90d5c
to
ff267da
Compare
fcf7104
to
138cb80
Compare
What needs to happen for this branch to get merged into current work? 6 months is probably a good indication that waiting isn't going to fix the confidence issues directly, and we probably do better by having it incorporated into upstream rather than ongoing rebasing |
I'd probably be happy if we had a script running this code and pumping out information we could compare with existing crawlers and left it running for a week or two. I just rebased #659, so if you've got time to re-review that one I think we can just merge it today and then move onto the next ones (including rebasing this PR). |
👍 left comments. a couple minor things on structuring errors i'd like to see in that one, but i'm happy for that to land basically as is. |
8b54cba
to
a2d41e5
Compare
@aschmahmann - I pushed an addition to this branch to configure the peer connection timeout duration as an option. |
Upon running this code over the last week, it has largely agreed with our other DHT metrics, e.g. https://dht.ecosystem-dashboard.com/ As such, I would be happy to see this merged at this point, @aschmahmann |
…to Info. Stop returning partial peersets if a peer cannot give us their full routing table. Keep starting peer addresses alive in the peerstore.
For use by #574