optimize atx sync codepath #4977
Comments
I will solve this by using an in-memory cache with all ATXs (#5013).
I will try to disable ATX sync in the next epoch and use only ATX regossiping.
This was referenced Oct 6, 2023
bors bot pushed a commit that referenced this issue on Oct 13, 2023:
closes: #5127 #5036

Avoid peers that are overwhelmed or that generally should not be used for requests. There are two criteria used to select a good peer:
- Request success rate. Success rates within 0.1 (10%) of each other are treated as equal; in that case latency is used instead.
- Latency. The hs/1 protocol is used to track latency, as it is the most used protocol and objects served over it are of the same size, with several exceptions (active sets, lists of malfeasance proofs).

related: #4977

Limits the number of peers that ATX data is requested from. Previously we requested data from all peers at least once. Synced data twice in 90 minutes; the previous attempt on my computer was a week ago and took 12 hours.
dshulyak added a commit to dshulyak/go-spacemesh that referenced this issue on Oct 13, 2023 (same commit message as above).
bors bot pushed a commit that referenced this issue on Oct 20, 2023:
closes: #4977 closes: #4603

This change introduces two configuration parameters for every server:
- A requests-per-interval pace (for example 10 req/s); this caps the maximum bandwidth each server can use.
- A queue size, set so that requests are served within the expected latency. Every other request is dropped immediately so that the client can retry with a different node. Currently the timeout is set to 10s, so the queue should be roughly 10 times larger than the rps.

It doesn't provide a global limit on bandwidth, but we have a limit on the number of peers, and an honest peer doesn't run many concurrent queries. What we really want to handle is peers with intentionally malicious behavior, but that is not a pressing issue.

Example configuration:

```json
"fetch": {
    "servers": {
        "ax/1": {"queue": 10, "requests": 1, "interval": "1s"},
        "ld/1": {"queue": 1000, "requests": 100, "interval": "1s"},
        "hs/1": {"queue": 2000, "requests": 200, "interval": "1s"},
        "mh/1": {"queue": 1000, "requests": 100, "interval": "1s"},
        "ml/1": {"queue": 100, "requests": 10, "interval": "1s"},
        "lp/2": {"queue": 10000, "requests": 1000, "interval": "1s"}
    }
}
```

https://github.com/spacemeshos/go-spacemesh/blob/3cf02146bf27f53c001bffcacffbda05933c27c4/fetch/fetch.go#L130-L144

Metrics are per server:

https://github.com/spacemeshos/go-spacemesh/blob/3cf02146bf27f53c001bffcacffbda05933c27c4/p2p/server/metrics.go#L15-L52

They have to be enabled for all servers with:

```json
"fetch": {
    "servers-metrics": true
}
```
bors bot pushed the same commit again on Oct 20, Oct 21, and twice on Oct 22, 2023.
It requests and downloads all activation IDs that are known to the node.
The main problem is that it scans the database on every request, which should be mitigated by implementing smarter caching (#4164). Such caching should keep all activations for the epoch in cache, and not evict them on an LRU strategy.
A secondary problem is the amount of traffic that it adds; this is less straightforward to solve. Maybe we should consider dropping ATX sync and instead have everyone regossip its own ATX every 30m.
Ideally we should implement this before ATX sync starts in the next epoch.
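The caching idea above, keep every ATX of the current epoch in memory and evict whole epochs rather than individual entries via LRU, can be sketched as follows. This is an illustrative sketch under assumed types; `atxCache` and its methods are hypothetical names, not the go-spacemesh API.

```go
package main

import (
	"fmt"
	"sync"
)

type epoch uint32
type atxID string

// atxCache keeps every ATX ID seen for an epoch, instead of an LRU
// cache that could evict entries mid-epoch. Old epochs are dropped
// as a whole once they are no longer needed for sync.
type atxCache struct {
	mu     sync.RWMutex
	epochs map[epoch][]atxID
}

func newATXCache() *atxCache {
	return &atxCache{epochs: map[epoch][]atxID{}}
}

func (c *atxCache) Add(e epoch, id atxID) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.epochs[e] = append(c.epochs[e], id)
}

// Get serves a sync request entirely from memory, avoiding the
// per-request database scan described above.
func (c *atxCache) Get(e epoch) []atxID {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.epochs[e]
}

// EvictBefore drops whole epochs older than e.
func (c *atxCache) EvictBefore(e epoch) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for k := range c.epochs {
		if k < e {
			delete(c.epochs, k)
		}
	}
}

func main() {
	c := newATXCache()
	c.Add(7, "atx-a")
	c.Add(7, "atx-b")
	c.Add(6, "atx-old")
	c.EvictBefore(7)
	fmt.Println(len(c.Get(7)), len(c.Get(6)))
}
```

Epoch-granular eviction matches the access pattern: sync requests ask for the full set of an epoch's activations, so partially evicting an epoch would force a database scan anyway.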