
Add more aggressive rate limiting for publishing new crates #1681

Merged: 2 commits into rust-lang:master from sg-rate-limit-publish on May 30, 2019

Conversation

@sgrif (Contributor) commented Mar 19, 2019

This is still incomplete, but the bulk of the code has been written, so I figured I'd get some eyes on it. Right now this just panics instead of returning an error if the user is out of tokens. Still left to do:

  • The two ignored test cases
  • Implementing the actual error type
  • Per-user burst rate overrides
  • A cron job to restrict the table size and clean up stale buckets (I probably won't land this in the initial PR; our users table needs to grow by two orders of magnitude for this to really matter -- but I do want to land it as a follow-up PR, since I haven't tested cases where `now - last_update` is greater than a month. It should work fine, but I'd rather not have this run against poorly defined semantics)

I think the limit we'll probably set to start is 1 req/10s with a burst of 30. The error message will tell folks they can either wait for {time until next token} or email us to get the limit increased for them. This is limited per user instead of per IP, since rotating your user is harder than rotating your IP. It's stored in the DB since this is only for publishing new crates, which is slow enough already that the DB load of rate limiting there shouldn't matter.
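
As a rough sketch of what that configuration could look like (the type and field names below are illustrative guesses, not necessarily what this PR uses):

```rust
use std::time::Duration;

/// Hypothetical per-endpoint rate-limit settings; the real names and the
/// per-user override mechanism may differ from this sketch.
#[derive(Clone, Copy, Debug)]
struct PublishRateLimit {
    /// How often one token flows back into the bucket.
    rate: Duration,
    /// How many publishes can be saved up and spent in a burst
    /// (overridable per user, per the to-do list above).
    burst: i32,
}

impl Default for PublishRateLimit {
    fn default() -> Self {
        PublishRateLimit {
            // The "1 req/10s" above is corrected to one token per 10 minutes
            // later in the thread.
            rate: Duration::from_secs(10 * 60),
            burst: 30,
        }
    }
}
```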

I needed to update to Rust 1.33 to get `Duration::as_millis` (note: the way we're using this feature causes UB if the rate limit is slower than 1 request per 292471208 years. I assume this is not a problem.)
I needed to update to Diesel 1.4.2 to get a fix for diesel-rs/diesel#2017.
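
A back-of-envelope check of the 292471208-year figure (not project code): it is exactly `i64::MAX` milliseconds, which suggests the limiting factor is the `u128` returned by `as_millis` being narrowed to an `i64` somewhere:

```rust
fn main() {
    // i64::MAX milliseconds, converted to 365-day years.
    let years = i64::MAX as f64 / 1000.0 / 60.0 / 60.0 / 24.0 / 365.0;
    println!("{}", years as u64); // 292471208
}
```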

The algorithm used is pretty much the standard token bucket algorithm. It's *slightly* different in that we set `tokens = max(0, tokens - 1) + tokens_to_add` instead of `tokens = max(0, tokens - 1 + tokens_to_add)`. This is because the usual implementation checks available tokens before subtracting them (and thus never persists anything if there aren't enough tokens available). Since we're doing this in a single query, and we can *only* return the final, persisted value, we have to change the calculation slightly to make sure that a user who is out of tokens gets `1` back after the rate limit interval.

A side effect of all of this is that our token count is actually offset by 1. `0` means the user is not only out of tokens, but that we just tried to take a token and couldn't. `1` means an empty bucket, and a full bucket would technically be `burst + 1`. The alternative would be for `-1` to mean the user is actually out of tokens, but since we only ever refill the bucket when we're trying to take a token, we never actually persist a full bucket. I figured a range of `0...burst` made more sense than `-1..burst`.
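
For concreteness, a minimal, self-contained Rust sketch of the refill-and-take step and the offset-by-1 accounting described in the last two paragraphs. It models the arithmetic only; the function name, its parameters, and the clamp at `burst` are assumptions for illustration, not the code in this PR (which performs this step in a single SQL query):

```rust
use std::time::Duration;

/// Hypothetical model of one refill-and-take step. `tokens` is the value
/// persisted by the previous request, using the offset-by-1 convention:
/// 0 = "we just tried to take a token and couldn't" (request rejected),
/// 1 = empty bucket, and a conceptually full bucket would be burst + 1.
fn refill_and_take(tokens: i64, since_last_refill: Duration, rate: Duration, burst: i64) -> i64 {
    // Whole tokens earned since the last update, at one token per `rate`.
    let tokens_to_add = (since_last_refill.as_millis() / rate.as_millis()) as i64;
    // Take a token and refill in the same expression, so the result can be
    // persisted unconditionally. The persisted value never exceeds `burst`,
    // because we only ever refill while also taking a token.
    ((tokens - 1).max(0) + tokens_to_add).min(burst)
}

fn main() {
    let rate = Duration::from_secs(10 * 60); // assumed: one token per 10 minutes
    let burst = 30;

    // Persisted 1 = empty bucket. Publishing again immediately persists 0 -> reject.
    assert_eq!(refill_and_take(1, Duration::from_secs(0), rate, burst), 0);

    // One full interval later the same user earns exactly one token, spends it
    // on this request, and is left with an empty bucket again (persisted 1) -> allow.
    assert_eq!(refill_and_take(1, rate, rate, burst), 1);

    // After a long idle stretch the bucket refills, but the persisted value
    // tops out at `burst`, never `burst + 1`.
    assert_eq!(refill_and_take(0, Duration::from_secs(30 * 24 * 60 * 60), rate, burst), 30);
}
```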

@sgrif sgrif requested a review from jtgeibel March 19, 2019 23:12
@bors commented Mar 20, 2019

☔ The latest upstream changes (presumably #1682) made this pull request unmergeable. Please resolve the merge conflicts.

@carols10cents (Member) commented:

  • Rate limiting by user account rather than IP address will improve the case where someone attempting to prove a point grabs lots of crates and rotates their IP address to evade the nginx blocking, so this is an improvement over what we currently have.
  • I'm concerned this isn't aggressive enough... getting 1 token every 10 seconds still means a spammer can create 1,440 crates in 4 hours (the on-call primary response time for the latest incident: 6 crates per minute * 60 minutes per hour * 4 hours, if I've done my math correctly), which seems like a lot and way over any legitimate use of crates.io. Is there a reason you're not considering something like exponential backoff, which would make it harder for a spammer to continue the more they try and would stop incidents before they get to a disruptive point?

@sgrif (Author) commented Mar 21, 2019

Sorry, I meant 10 minutes, not 10 seconds. (Our current rate limit is 1 req/min.)

@sgrif (Author) commented Mar 21, 2019

Can you elaborate on what you're imagining for exponential back-off for rate limiting?

@carols10cents (Member) commented:

> Sorry, I meant 10 minutes, not 10 seconds. (Our current rate limit is 1 req/min.)

Aha, that's much better!

> Can you elaborate on what you're imagining for exponential back-off for rate limiting?

Conveniently, the git index updates are processed in a background queue now, so we could limit the number of pending new-crate publish requests a particular user has, and we could add an artificial delay to each additional new-crate publish request that comes from that user. So we let 30 publish requests go through fine, then the 31st gets a 1-second delay, the 32nd gets a 2-second delay, the 33rd a 4-second delay, and so on, until it gets too slow for them to care to continue publishing crates, and the queue could page someone.
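
A minimal sketch of that escalating-delay idea (purely hypothetical; it is not what this PR implements, and the names are made up):

```rust
use std::time::Duration;

/// Hypothetical escalating delay for a user's pending publish jobs: the first
/// `free_requests` go through immediately, then the delay doubles each time.
fn artificial_delay(pending_requests: u32, free_requests: u32) -> Duration {
    if pending_requests <= free_requests {
        return Duration::from_secs(0);
    }
    // 31st request -> 1s, 32nd -> 2s, 33rd -> 4s, ...
    let over = pending_requests - free_requests;
    Duration::from_secs(1u64 << (over - 1).min(20)) // cap the shift so it can't overflow
}

fn main() {
    assert_eq!(artificial_delay(30, 30), Duration::from_secs(0));
    assert_eq!(artificial_delay(31, 30), Duration::from_secs(1));
    assert_eq!(artificial_delay(33, 30), Duration::from_secs(4));
}
```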

@jtgeibel (Member) commented:

> Conveniently, the git index updates are processed in a background queue now, so we could limit the number of pending new-crate publish requests a particular user has, and we could add an artificial delay to each additional new-crate publish request that comes from that user.

Unfortunately, this would allow an attacker to make the database and the index diverge for potentially long periods of time. I'd rather avoid allowing that. I think it makes sense to reject the initial request as quickly as possible.

@sgrif force-pushed the sg-rate-limit-publish branch from 4cbd958 to 1f1462d on April 1, 2019 19:47
@sgrif marked this pull request as ready for review on April 4, 2019 20:58
@sgrif (Author) commented Apr 4, 2019

This should be ready to go. I punted on cleaning up stale buckets for now, as it's not critical to land immediately.

@carols10cents (Member) commented:

Tests are failing.

@sgrif (Author) commented Apr 7, 2019

Ugh. It's a ns vs. µs precision issue...

@sgrif (Author) commented Apr 8, 2019

Tests should be green on Linux now.

@jtgeibel (Member) left a comment:

Looks great overall. Mainly I think the default values should be adjusted.

src/publish_rate_limit.rs (review thread; outdated, resolved)
@bors commented May 22, 2019

☔ The latest upstream changes (presumably #1686) made this pull request unmergeable. Please resolve the merge conflicts.

@sgrif force-pushed the sg-rate-limit-publish branch from 11782dd to 676efab on May 30, 2019 18:25
@sgrif (Author) commented May 30, 2019

@bors r=jtgeibel

@bors commented May 30, 2019

📌 Commit 676efab has been approved by jtgeibel

@bors commented May 30, 2019

⌛ Testing commit 676efab with merge e71d82a...

bors added a commit that referenced this pull request May 30, 2019
Add more aggressive rate limiting for publishing new crates

@sgrif (Author) commented May 30, 2019

@bors cancel

@sgrif (Author) commented May 30, 2019

@bors r=jtgeibel

@bors commented May 30, 2019

📌 Commit f597a3f has been approved by jtgeibel

bors added a commit that referenced this pull request May 30, 2019
Add more aggressive rate limiting for publishing new crates

@bors commented May 30, 2019

⌛ Testing commit f597a3f with merge a6de1e1...

@bors commented May 30, 2019

☀️ Test successful - checks-travis
Approved by: jtgeibel
Pushing a6de1e1 to master...

@bors bors merged commit f597a3f into rust-lang:master May 30, 2019
@sgrif sgrif deleted the sg-rate-limit-publish branch August 14, 2019 17:28