-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(pii): Do not panic on large regex [INC-202] #1474
Conversation
Ok(Some(config)) => RelayStr::from_string(config.to_json()?), | ||
Ok(None) => RelayStr::new("{}"), | ||
// NOTE: Callers of this function must be able to handle this error. | ||
Err(e) => RelayStr::from_string(e.to_string()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sentry side does not handle this case yet.
@@ -140,7 +140,7 @@ fn bench_pii_stripping(c: &mut Criterion) { | |||
|b, datascrubbing_config| b.iter(|| datascrubbing_config.pii_config_uncached()), | |||
); | |||
|
|||
let pii_config = datascrubbing_config.pii_config_uncached().unwrap(); | |||
let pii_config = datascrubbing_config.pii_config_uncached().unwrap().unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just a benchmark, so unwrap should be fine.
relay-general/src/pii/config.rs
Outdated
impl Pattern { | ||
pub fn new(s: &str, case_insensitive: bool) -> Result<Self, PiiConfigError> { | ||
let regex = RegexBuilder::new(s) | ||
.size_limit(262_144) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apply the same size limit to every Pattern
that we create.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why that size limit specifically?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also: should we make the number a constant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved it from [here, but it's a valid question. It's 2^18, and it seems we already copied it from somewhere else once: https://github.com//pull/505/files#r388342341
@untitaker any recollection where this value comes from? My guess is it was arbitrary from the start. We can definitely make it a constant though.
RegexBuilder::new(&key_pattern) | ||
.case_insensitive(true) | ||
.build() | ||
.unwrap(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was the line that panicked in production.
relay-general/src/pii/config.rs
Outdated
impl Pattern { | ||
pub fn new(s: &str, case_insensitive: bool) -> Result<Self, PiiConfigError> { | ||
let regex = RegexBuilder::new(s) | ||
.size_limit(262_144) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why that size limit specifically?
relay-general/src/pii/config.rs
Outdated
impl Pattern { | ||
pub fn new(s: &str, case_insensitive: bool) -> Result<Self, PiiConfigError> { | ||
let regex = RegexBuilder::new(s) | ||
.size_limit(262_144) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also: should we make the number a constant?
let pii_config = config | ||
.datascrubbing_settings | ||
.pii_config() | ||
.map_err(|e| ProcessingError::PiiConfigError(e.clone()))?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will drop the event, which we should assign a dedicated discard reason to (or use the one we also use for broken project configs). Can we additionally mark the project config as invalid somehow so we reject subsequent events earlier?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This got me thinking, the entire concept of holding a cached Result
inside the config, and then failing every time an event needs it, seems wrong.
relay/relay-general/src/pii/legacy.rs
Line 41 in 8dbbc85
pub(super) pii_config: OnceCell<Result<Option<PiiConfig>, PiiConfigError>>, |
I simplified the code to convert the config on serialization, thereby making this particular error a deserialization error. Because it's a bigger refactor, I put that change into its own sub-PR: #1478
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Closed the linked PR (see discussion there) and added a separate discard reason for pii conversion failure.
I would not put extra effort into invalidating the project config when this conversion fails, since having invalid data scrubbing settings is a clear bug that should be fixed on the sentry side, rather than an acceptable day-to-day case.
Co-authored-by: Jan Michael Auer <[email protected]> Co-authored-by: Iker Barriocanal <[email protected]>
INC-202 showed that allowing the user to configure arbitrary sensitive fields will exceed the maximum limit on the compiled regex in Relay. This PR limits sensitiveFields to 4000 characters. See also getsentry/relay#1474.
@@ -37,6 +37,10 @@ pub struct DataScrubbingConfig { | |||
/// PII config derived from datascrubbing settings. | |||
/// | |||
/// Cached because the conversion process is expensive. | |||
/// | |||
/// NOTE: We discarded the idea of making the conversion to PiiConfig part of deserialization, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
use once_cell::sync::OnceCell; | ||
use regex::{Regex, RegexBuilder}; | ||
use serde::{de::Error, Deserialize, Deserializer, Serialize, Serializer}; | ||
|
||
use crate::pii::{CompiledPiiConfig, Redaction}; | ||
use crate::processor::SelectorSpec; | ||
|
||
const COMPILED_PATTERN_MAX_SIZE: usize = 262_144; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this max size was determined by trial and error so that we are able to parse builtin rules at all. I think it was either the creditcard or ip addr regex that required increasing the limit
#1474 introduced an explicit size limit on the regex used to convert legacy datascrubbing settings to PII config. We are seeing new outcomes with reason project_state_pii in our statsd metrics now, so we should log an error to Sentry (tagged by project_id) to see which projects are affected.
This is a partial revert of #1474. Instead of applying a stricter size limit to regexes built from datascrubbing settings, which led to data loss for at least one project, only keep the part that prevents a panic on CompiledSizeTooBig.
In INC-202 we saw that project configs with large entries for
sensitiveFields
are able to crash worker threads, because we compile the list into a string with|
and did not handle theCompiledTooBig
error returned by the regex crate.This PR prevents the panic by making
to_pii_config
fallible. The resulting failure mode is that callingscrub_data
on events will fail, and the event is dropped with outcome "internal".See also getsentry/sentry#38803.