Feat(outcomes): Aggregate client reports [INGEST-247] #1118
Conversation
};

relay_log::trace!("Flushing outcome for timestamp {}", timestamp);
outcome_producer.do_send(outcome).ok(); // TODO: should we handle send errors here?
Here we send messages in a loop. Could this overflow the outcome producer's mailbox? Should we send a batch of outcomes to the producer instead?
Jan told me that this bypasses the mailbox.
Right, thank you. The docs on Recipient are actually very clear on this: https://docs.rs/actix/0.7.9/actix/struct.Recipient.html#method.do_send
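For illustration, a minimal sketch of the pattern being discussed, in the style of actix 0.7 (the message type and function names are placeholders, not the actual Relay code):

```rust
use actix::prelude::*;

/// Placeholder message; the real TrackOutcome carries the outcome fields.
struct TrackOutcome;

impl Message for TrackOutcome {
    type Result = ();
}

/// Sending in a loop through a `Recipient`: `do_send` enqueues the message
/// even if the recipient's mailbox is already at capacity, so the loop
/// cannot block on or overflow a bounded mailbox. It only fails if the
/// recipient actor has stopped, which is why `.ok()` is enough here.
fn flush_outcomes(outcome_producer: &Recipient<TrackOutcome>, outcomes: Vec<TrackOutcome>) {
    for outcome in outcomes {
        outcome_producer.do_send(outcome).ok();
    }
}
```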
I think this does some unnecessary allocations on the hashmap, but otherwise it should be fine.
/// Mapping from offset to bucket key to quantity. timestamp = offset * bucket_interval
buckets: BTreeMap<u64, HashMap<BucketKey, u32>>,
Why not put the timestamp into the bucket key and avoid this nesting and extra allocation? Together with the split_off optimization, this should make the flush quite fast, as long as the timestamp is declared as the first field of BucketKey (see the documentation for derive(PartialOrd)).
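For illustration, a minimal sketch of that layout (the field names are placeholders, not the actual Relay types): with the offset as the first field, the derived Ord sorts primarily by offset, so split_off can separate all buckets that are due for flushing in one call.

```rust
use std::collections::BTreeMap;
use std::mem;

/// Hypothetical key; `offset` comes first so the derived ordering
/// compares it before the remaining fields.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct BucketKey {
    offset: u64,            // timestamp / bucket_interval
    reason: Option<String>, // placeholder for the remaining outcome fields
}

fn main() {
    let mut buckets: BTreeMap<BucketKey, u32> = BTreeMap::new();
    buckets.insert(BucketKey { offset: 100, reason: None }, 3);
    buckets.insert(BucketKey { offset: 101, reason: None }, 1);

    // Flush every bucket with offset <= 100: split at the smallest possible
    // key with offset 101. `split_off` keeps keys below the split point in
    // `buckets` and returns everything at or above it.
    let split_key = BucketKey { offset: 101, reason: None };
    let remaining = buckets.split_off(&split_key);
    let to_flush = mem::replace(&mut buckets, remaining);

    for (key, quantity) in to_flush {
        println!("flushing {:?} -> {}", key, quantity);
    }
}
```

This also shows the constraint mentioned further down: to split at a pure offset boundary, the key type must allow constructing a minimal key for that offset.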
I agree with untitaker; I would use a single bucket_key -> quantity map (with timestamp % bucket_interval embedded in the bucket_key).
Did as you suggested. Had to tweak the BucketKey type a bit in order to call split_off with a key containing only an offset.
let max_offset = (UnixTimestamp::now().as_secs() - self.flush_delay) / self.bucket_interval;
let bucket_interval = self.bucket_interval;
let outcome_producer = self.outcome_producer.clone();
self.buckets.retain(|offset, mapping| {
You could probably use split_off here.
Looks good.
relay-config/src/config.rs (outdated)
@@ -946,6 +950,8 @@ impl Default for Outcomes {
    batch_size: 1000,
    batch_interval: 500,
    source: None,
    bucket_interval: 60,
    flush_delay: 30,
Why do we flush every 30 seconds if the buckets are 60 seconds long?
flush_delay is the grace period during which outcomes may still be submitted to a bucket after its time window has passed.
let outcome_producer = self.outcome_producer.clone();
self.buckets.retain(|offset, mapping| {
    if offset <= &max_offset {
        for (bucket_key, quantity) in mapping.drain() {
I would not bother with this; I would just flush everything every flush_delay. I understand that you don't want to send buckets that are not full yet twice, but if we make flush_delay >= bucket_interval (which I think we should), that would not be a big problem; we may send buckets more than once anyway (for outcomes that come in late).
I must admit I did not even think of this option, and I like the simplicity. Should have read this comment before implementing the split_off :)
@untitaker just to get an additional opinion, what do you think about the suggestion of simply flushing all the buckets on every flush?
Makes sense!
OK, simplified to use a single hashmap that gets flushed unconditionally every two minutes.
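For illustration, a minimal std-only sketch of that simplified shape (the types and field names are placeholders, not the actual Relay implementation): one flat map, handed off wholesale on every flush tick.

```rust
use std::collections::HashMap;
use std::mem;

/// Hypothetical aggregation key; in Relay it would carry the outcome's
/// fields (truncated timestamp, scoping, outcome, reason, ...).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct BucketKey {
    offset: u64,            // timestamp / bucket_interval
    reason: Option<String>, // placeholder for the remaining fields
}

#[derive(Default)]
struct OutcomeAggregator {
    buckets: HashMap<BucketKey, u32>,
}

impl OutcomeAggregator {
    /// Add an outcome's quantity to its bucket.
    fn insert(&mut self, key: BucketKey, quantity: u32) {
        *self.buckets.entry(key).or_insert(0) += quantity;
    }

    /// Hand back everything accumulated so far and start over. In the
    /// actor, this would be driven by a periodic timer firing every
    /// flush_interval seconds.
    fn flush(&mut self) -> HashMap<BucketKey, u32> {
        mem::take(&mut self.buckets)
    }
}

fn main() {
    let mut aggregator = OutcomeAggregator::default();
    aggregator.insert(BucketKey { offset: 100, reason: None }, 2);
    aggregator.insert(BucketKey { offset: 100, reason: None }, 1);
    for (key, quantity) in aggregator.flush() {
        println!("flushing {:?} -> {}", key, quantity); // quantity == 3
    }
}
```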
/// The number of seconds between flushes of all buckets.
flush_interval: u64,
/// Mapping from bucket key to quantity.
buckets: HashMap<BucketKey, u32>,
Why do we use a hashmap here? (And also in the metrics aggregator, I guess.)
Recap from Slack: there is probably no strong reason one way or the other right now.
Add the option to emit outcomes as client reports. This is now the default behavior of non-processing relays.
- Dynamic sampling config is now propagated to untrusted relays, allowing them to apply sampling rules.
- The emit_outcomes config flag can have three states: true, false, or "client reports".
- Every created TrackOutcome is sent to the outcome aggregator instead of the outcome producer.
- If configured to emit outcomes as client reports, the outcome aggregator erases event_id and remote_addr from the outcome.
- If an outcome still has an event_id after step 3, it is forwarded to the configured producer without aggregating.
- Else, the outcome is aggregated as in Feat(outcomes): Aggregate client reports [INGEST-247] #1118.
- Finally, the outcome producer converts the aggregated outcome to a client report and sends it to the upstream as an envelope.
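As a rough illustration of a three-state emit_outcomes flag like the one described above, such a setting could be modeled as an enum that deserializes from either a boolean or a string. This is a hypothetical sketch assuming serde with the derive feature; the variant names and the exact string value ("client_reports") are assumptions, not taken from the Relay source.

```rust
use serde::de::{Deserializer, Error};
use serde::Deserialize;

/// Hypothetical three-state flag: no outcomes, full outcomes, or
/// aggregated client reports.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum EmitOutcomes {
    None,
    AsOutcomes,
    AsClientReports,
}

impl<'de> Deserialize<'de> for EmitOutcomes {
    fn deserialize<D: Deserializer<'de>>(deserializer: D) -> Result<Self, D::Error> {
        // Accept either a boolean or a string in the config file.
        #[derive(Deserialize)]
        #[serde(untagged)]
        enum Repr {
            Bool(bool),
            Text(String),
        }

        match Repr::deserialize(deserializer)? {
            Repr::Bool(true) => Ok(EmitOutcomes::AsOutcomes),
            Repr::Bool(false) => Ok(EmitOutcomes::None),
            Repr::Text(s) if s.as_str() == "client_reports" => Ok(EmitOutcomes::AsClientReports),
            Repr::Text(other) => Err(D::Error::custom(format!(
                "expected true, false, or \"client_reports\", got {:?}",
                other
            ))),
        }
    }
}
```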
Aggregate outcomes generated by client reports before sending them on.
The key for aggregation buckets consists of the outcome's fields, including event_id (which is None for client reports). The value of an aggregated bucket is the outcome quantity.
This PR prepares for INGEST-247. It does not yet implement collecting outcomes in external relays.