Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(pii): Do not panic on large regex [INC-202] #1474

Merged
merged 10 commits into from
Sep 16, 2022
3 changes: 3 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,9 @@
- Process required stacktraces to fix filtering events originating from browser extensions. ([#1423](https://github.com/getsentry/relay/pull/1423))
- Fix error message filtering when formatting the message of logentry. ([#1442](https://github.com/getsentry/relay/pull/1442))
- Loosen type requirements for the `user.id` field in Replays. ([#1443](https://github.com/getsentry/relay/pull/1443))
- Use the different configuration for billing outcomes when specified. ([#1461](https://github.com/getsentry/relay/pull/1461))
jjbayer marked this conversation as resolved.
Show resolved Hide resolved
- Fix panic in datascrubbing when number of sensitive fields was too large. ([#1474](https://github.com/getsentry/relay/pull/1474))


**Internal**:

Expand Down
1 change: 1 addition & 0 deletions py/CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@

- Add `transaction_info` to event payloads, including the transaction's source and internal original transaction name. ([#1330](https://github.com/getsentry/relay/pull/1330))
- Add user-agent parsing to replays processor. ([#1420](https://github.com/getsentry/relay/pull/1420))
- convert_datascrubbing_config will now return an error string when conversion fails. ([#1474](https://github.com/getsentry/relay/pull/1474))
jjbayer marked this conversation as resolved.
Show resolved Hide resolved

## 0.8.13

Expand Down
6 changes: 4 additions & 2 deletions relay-cabi/src/processing.rs
Original file line number Diff line number Diff line change
Expand Up @@ -143,8 +143,10 @@ pub unsafe extern "C" fn relay_validate_pii_config(value: *const RelayStr) -> Re
pub unsafe extern "C" fn relay_convert_datascrubbing_config(config: *const RelayStr) -> RelayStr {
let config: DataScrubbingConfig = serde_json::from_str((*config).as_str())?;
match config.pii_config() {
Some(config) => RelayStr::from_string(config.to_json()?),
None => RelayStr::new("{}"),
Ok(Some(config)) => RelayStr::from_string(config.to_json()?),
Ok(None) => RelayStr::new("{}"),
// NOTE: Callers of this function must be able to handle this error.
Err(e) => RelayStr::from_string(e.to_string()),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sentry side does not handle this case yet.

}
}

Expand Down
2 changes: 1 addition & 1 deletion relay-general/benches/benchmarks.rs
Original file line number Diff line number Diff line change
Expand Up @@ -140,7 +140,7 @@ fn bench_pii_stripping(c: &mut Criterion) {
|b, datascrubbing_config| b.iter(|| datascrubbing_config.pii_config_uncached()),
);

let pii_config = datascrubbing_config.pii_config_uncached().unwrap();
let pii_config = datascrubbing_config.pii_config_uncached().unwrap().unwrap();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a benchmark, so unwrap should be fine.


group.bench_with_input(
BenchmarkId::new("compile_pii_config", config_name),
Expand Down
32 changes: 26 additions & 6 deletions relay-general/src/pii/config.rs
Original file line number Diff line number Diff line change
Expand Up @@ -8,10 +8,33 @@ use serde::{de::Error, Deserialize, Deserializer, Serialize, Serializer};

use crate::pii::{CompiledPiiConfig, Redaction};
use crate::processor::SelectorSpec;
use failure::Fail;
jjbayer marked this conversation as resolved.
Show resolved Hide resolved

#[derive(Clone, Debug, Fail)]
pub enum PiiConfigError {
#[fail(display = "could not parse regex")]
RegexError(#[cause] regex::Error),
}

/// A regex pattern for text replacement.
#[derive(Clone)]
pub struct Pattern(pub Regex);
pub struct Pattern(Regex);

impl Pattern {
pub fn new(s: &str, case_insensitive: bool) -> Result<Self, PiiConfigError> {
jjbayer marked this conversation as resolved.
Show resolved Hide resolved
let regex = RegexBuilder::new(s)
.size_limit(262_144)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apply the same size limit to every Pattern that we create.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why that size limit specifically?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also: should we make the number a constant?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved it from [here, but it's a valid question. It's 2^18, and it seems we already copied it from somewhere else once: https://github.com//pull/505/files#r388342341

@untitaker any recollection where this value comes from? My guess is it was arbitrary from the start. We can definitely make it a constant though.

.case_insensitive(case_insensitive)
.build()
.map_err(PiiConfigError::RegexError)?;

Ok(Self(regex))
}

pub fn regex(&self) -> &Regex {
&self.0
}
}

impl Deref for Pattern {
type Target = Regex;
Expand Down Expand Up @@ -42,11 +65,8 @@ impl Serialize for Pattern {
impl<'de> Deserialize<'de> for Pattern {
fn deserialize<D: Deserializer<'de>>(deserializer: D) -> Result<Self, D::Error> {
let raw = String::deserialize(deserializer)?;
let pattern = RegexBuilder::new(&raw)
.size_limit(262_144)
.build()
.map_err(Error::custom)?;
Ok(Pattern(pattern))
let pattern = Pattern::new(&raw, false).map_err(Error::custom)?;
Ok(pattern)
jjbayer marked this conversation as resolved.
Show resolved Hide resolved
}
}

Expand Down
35 changes: 23 additions & 12 deletions relay-general/src/pii/convert.rs
Original file line number Diff line number Diff line change
@@ -1,13 +1,14 @@
use std::collections::BTreeMap;

use once_cell::sync::Lazy;
use regex::RegexBuilder;

use crate::pii::{
DataScrubbingConfig, Pattern, PiiConfig, RedactPairRule, Redaction, RuleSpec, RuleType, Vars,
};
use crate::processor::{SelectorPathItem, SelectorSpec, ValueType};

use crate::pii::config::PiiConfigError;
jjbayer marked this conversation as resolved.
Show resolved Hide resolved

// XXX: Move to @ip rule for better IP address scrubbing. Right now we just try to keep
// compatibility with Python.
static KNOWN_IP_FIELDS: Lazy<SelectorSpec> = Lazy::new(|| {
Expand All @@ -26,7 +27,9 @@ static DATASCRUBBER_IGNORE: Lazy<SelectorSpec> = Lazy::new(|| {
.unwrap()
});

pub fn to_pii_config(datascrubbing_config: &DataScrubbingConfig) -> Option<PiiConfig> {
pub fn to_pii_config(
datascrubbing_config: &DataScrubbingConfig,
) -> Result<Option<PiiConfig>, PiiConfigError> {
let mut custom_rules = BTreeMap::new();
let mut applied_rules = Vec::new();
let mut applications = BTreeMap::new();
Expand Down Expand Up @@ -70,12 +73,7 @@ pub fn to_pii_config(datascrubbing_config: &DataScrubbingConfig) -> Option<PiiCo
"strip-fields".to_owned(),
RuleSpec {
ty: RuleType::RedactPair(RedactPairRule {
key_pattern: Pattern(
RegexBuilder::new(&key_pattern)
.case_insensitive(true)
.build()
.unwrap(),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the line that panicked in production.

),
key_pattern: Pattern::new(&key_pattern, true)?,
}),
redaction: Redaction::Replace("[Filtered]".to_owned().into()),
},
Expand All @@ -86,7 +84,7 @@ pub fn to_pii_config(datascrubbing_config: &DataScrubbingConfig) -> Option<PiiCo
}

if applied_rules.is_empty() && applications.is_empty() {
return None;
return Ok(None);
}

let mut conjunctions = vec![
Expand Down Expand Up @@ -116,12 +114,12 @@ pub fn to_pii_config(datascrubbing_config: &DataScrubbingConfig) -> Option<PiiCo
applications.insert(applied_selector, applied_rules);
}

Some(PiiConfig {
Ok(Some(PiiConfig {
rules: custom_rules,
vars: Vars::default(),
applications,
..Default::default()
})
}))
}

#[cfg(test)]
Expand All @@ -140,7 +138,7 @@ mod tests {
use super::to_pii_config as to_pii_config_impl;

fn to_pii_config(datascrubbing_config: &DataScrubbingConfig) -> Option<PiiConfig> {
let rv = to_pii_config_impl(datascrubbing_config);
let rv = to_pii_config_impl(datascrubbing_config).unwrap();
if let Some(ref config) = rv {
let roundtrip: PiiConfig =
serde_json::from_value(serde_json::to_value(config).unwrap()).unwrap();
Expand Down Expand Up @@ -284,6 +282,19 @@ THd+9FBxiHLGXNKhG/FRSyREXEt+NyYIf/0cyByc9tNksat794ddUqnLOg0vwSkv
"###);
}

#[test]
fn test_convert_sensitive_fields_too_large() {
let result = to_pii_config_impl(&DataScrubbingConfig {
sensitive_fields: vec!["1"]
.repeat(4085) // lowest number that will fail
.into_iter()
.map(|x| x.to_string())
.collect(),
..simple_enabled_config()
});
assert!(result.is_err());
}

#[test]
fn test_convert_exclude_field() {
let pii_config = to_pii_config(&DataScrubbingConfig {
Expand Down
10 changes: 6 additions & 4 deletions relay-general/src/pii/legacy.rs
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,8 @@ use serde::{Deserialize, Serialize};

use crate::pii::{convert, PiiConfig};

use super::config::PiiConfigError;

/// Helper method to check whether a flag is false.
#[allow(clippy::trivially_copy_pass_by_ref)]
fn is_flag_default(flag: &bool) -> bool {
Expand Down Expand Up @@ -36,7 +38,7 @@ pub struct DataScrubbingConfig {
///
/// Cached because the conversion process is expensive.
#[serde(skip)]
pub(super) pii_config: OnceCell<Option<PiiConfig>>,
pub(super) pii_config: OnceCell<Result<Option<PiiConfig>, PiiConfigError>>,
}

impl DataScrubbingConfig {
Expand All @@ -48,7 +50,7 @@ impl DataScrubbingConfig {
scrub_ip_addresses: false,
sensitive_fields: vec![],
scrub_defaults: false,
pii_config: OnceCell::with_value(None),
pii_config: OnceCell::with_value(Ok(None)),
}
}

Expand All @@ -61,15 +63,15 @@ impl DataScrubbingConfig {
///
/// This can be computationally expensive when called for the first time. The result is cached
/// internally and reused on the second call.
pub fn pii_config(&self) -> Option<&'_ PiiConfig> {
pub fn pii_config(&self) -> Result<&'_ Option<PiiConfig>, &PiiConfigError> {
self.pii_config
.get_or_init(|| self.pii_config_uncached())
.as_ref()
}

/// Like [`pii_config`](Self::pii_config) but without internal caching.
#[inline]
pub fn pii_config_uncached(&self) -> Option<PiiConfig> {
pub fn pii_config_uncached(&self) -> Result<Option<PiiConfig>, PiiConfigError> {
convert::to_pii_config(self)
}
}
4 changes: 2 additions & 2 deletions relay-general/src/pii/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -16,8 +16,8 @@ mod utils;
pub use self::attachments::{PiiAttachmentsProcessor, ScrubEncodings};
pub use self::compiledconfig::CompiledPiiConfig;
pub use self::config::{
AliasRule, MultipleRule, Pattern, PatternRule, PiiConfig, RedactPairRule, RuleSpec, RuleType,
Vars,
AliasRule, MultipleRule, Pattern, PatternRule, PiiConfig, PiiConfigError, RedactPairRule,
RuleSpec, RuleType, Vars,
};
pub use self::generate_selectors::selector_suggestions_from_value;
pub use self::legacy::DataScrubbingConfig;
Expand Down
4 changes: 2 additions & 2 deletions relay-general/src/pii/regexes.rs
Original file line number Diff line number Diff line change
Expand Up @@ -54,7 +54,7 @@ pub fn get_regex_for_rule_type(
match ty {
RuleType::RedactPair(ref redact_pair) => smallvec![(
kv,
&redact_pair.key_pattern.0,
redact_pair.key_pattern.regex(),
ReplaceBehavior::replace_value()
)],
RuleType::Password => {
Expand All @@ -69,7 +69,7 @@ pub fn get_regex_for_rule_type(
None => ReplaceBehavior::replace_match(),
};

smallvec![(v, &r.pattern.0, replace_behavior)]
smallvec![(v, r.pattern.regex(), replace_behavior)]
}

RuleType::Imei => smallvec![(v, &*IMEI_REGEX, ReplaceBehavior::replace_match())],
Expand Down
15 changes: 11 additions & 4 deletions relay-server/src/actors/processor.rs
Original file line number Diff line number Diff line change
Expand Up @@ -13,12 +13,11 @@ use failure::Fail;
use flate2::write::{GzEncoder, ZlibEncoder};
use flate2::Compression;
use once_cell::sync::OnceCell;
use serde_json::Value as SerdeValue;

use relay_auth::RelayVersion;
use relay_common::{clone, ProjectId, ProjectKey, UnixTimestamp};
use relay_config::{Config, HttpEncoding};
use relay_filter::FilterStatKey;
use relay_general::pii::PiiConfigError;
use relay_general::pii::{PiiAttachmentsProcessor, PiiProcessor};
use relay_general::processor::{process_value, ProcessingState};
use relay_general::protocol::{
Expand All @@ -34,6 +33,7 @@ use relay_quotas::{DataCategory, RateLimits, ReasonCode};
use relay_redis::RedisPool;
use relay_sampling::{DynamicSamplingContext, RuleId};
use relay_statsd::metric;
use serde_json::Value as SerdeValue;

use crate::actors::envelopes::{EnvelopeManager, SendEnvelope, SendEnvelopeError, SubmitEnvelope};
use crate::actors::outcome::{DiscardReason, Outcome, TrackOutcome};
Expand Down Expand Up @@ -113,6 +113,9 @@ pub enum ProcessingError {

#[fail(display = "event dropped by sampling rule {}", _0)]
Sampled(RuleId),

#[fail(display = "invalid pii config")]
PiiConfigError(PiiConfigError),
}

impl ProcessingError {
Expand All @@ -139,7 +142,7 @@ impl ProcessingError {
Self::InvalidUnrealReport(_) => Some(Outcome::Invalid(DiscardReason::ProcessUnreal)),

// Internal errors
Self::SerializeFailed(_) | Self::ProcessingFailed(_) => {
Self::SerializeFailed(_) | Self::ProcessingFailed(_) | Self::PiiConfigError(_) => {
Some(Outcome::Invalid(DiscardReason::Internal))
}
#[cfg(feature = "processing")]
Expand Down Expand Up @@ -1738,7 +1741,11 @@ impl EnvelopeProcessor {
process_value(event, &mut processor, ProcessingState::root())
.map_err(ProcessingError::ProcessingFailed)?;
}
if let Some(config) = config.datascrubbing_settings.pii_config() {
let pii_config = config
.datascrubbing_settings
.pii_config()
.map_err(|e| ProcessingError::PiiConfigError(e.clone()))?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will drop the event, which we should assign a dedicated discard reason to (or use the one we also use for broken project configs). Can we additionally mark the project config as invalid somehow so we reject subsequent events earlier?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This got me thinking, the entire concept of holding a cached Result inside the config, and then failing every time an event needs it, seems wrong.

pub(super) pii_config: OnceCell<Result<Option<PiiConfig>, PiiConfigError>>,

I simplified the code to convert the config on serialization, thereby making this particular error a deserialization error. Because it's a bigger refactor, I put that change into its own sub-PR: #1478

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Closed the linked PR (see discussion there) and added a separate discard reason for pii conversion failure.

I would not put extra effort into invalidating the project config when this conversion fails, since having invalid data scrubbing settings is a clear bug that should be fixed on the sentry side, rather than an acceptable day-to-day case.

if let Some(config) = pii_config {
let mut processor = PiiProcessor::new(config.compiled());
process_value(event, &mut processor, ProcessingState::root())
.map_err(ProcessingError::ProcessingFailed)?;
Expand Down