From a2d3affce9afea481116bfee6ca83af257f2e57b Mon Sep 17 00:00:00 2001
From: Eric Eastwood
Date: Thu, 22 Jul 2021 21:52:22 -0500
Subject: [PATCH 01/14] Add developer FAQ to explain outliers and state_groups

`outlier` explanation pulled from my own understanding and various comments throughout the code

`state_group` explanation pulled from https://github.com/matrix-org/synapse/pull/10245#discussion_r673375767 by @erikjohnston
---
 docs/development/faq.md | 15 +++++++++++++++
 1 file changed, 15 insertions(+)
 create mode 100644 docs/development/faq.md

diff --git a/docs/development/faq.md b/docs/development/faq.md
new file mode 100644
index 000000000000..13ee13628b17
--- /dev/null
+++ b/docs/development/faq.md
@@ -0,0 +1,15 @@
+# Developer FAQ
+
+## What is an `outlier`?
+
+An `outlier` is an arbitrary floating event in the DAG (as opposed to being
+inline with the current DAG). It also means that we don't have the state events
+backfilled on the homeserver and we trust the events *claimed* auth events rather
+than those we calculate and verify to be correct.
+
+
+## What is a `state_group`?
+
+For every non-outlier event we need to know the state at that event. Instead of storing the full state for each event in the DB (i.e. a `event_id -> state` mapping), which is *very* space inefficient when state doesn't change, we instead assign each different set of state a "state group" and then have mappings of `event_id -> state_group` and `state_group -> state`.
+
+

From 785abe4058faf874c48866f8a810b47a24eaf452 Mon Sep 17 00:00:00 2001
From: Eric Eastwood
Date: Thu, 22 Jul 2021 22:02:48 -0500
Subject: [PATCH 02/14] Add a note about ex_outliers

---
 docs/development/faq.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/docs/development/faq.md b/docs/development/faq.md
index 13ee13628b17..cbc564a9f4e7 100644
--- a/docs/development/faq.md
+++ b/docs/development/faq.md
@@ -5,7 +5,9 @@
 An `outlier` is an arbitrary floating event in the DAG (as opposed to being
 inline with the current DAG). It also means that we don't have the state events
 backfilled on the homeserver and we trust the events *claimed* auth events rather
-than those we calculate and verify to be correct.
+than those we calculate and verify to be correct.
+
+An event can be unmarked as an `outlier` once we fetch all of its `prev_events` (you will see some `ex_outlier` code around this).
 
 
 ## What is a `state_group`?

From 63fd9ccaecddd2dfd0ee7e9d694b0bcbca0c6ef8 Mon Sep 17 00:00:00 2001
From: Eric Eastwood
Date: Thu, 22 Jul 2021 22:03:37 -0500
Subject: [PATCH 03/14] Add changelog

---
 changelog.d/10464.doc | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 changelog.d/10464.doc

diff --git a/changelog.d/10464.doc b/changelog.d/10464.doc
new file mode 100644
index 000000000000..5a85250dcc49
--- /dev/null
+++ b/changelog.d/10464.doc
@@ -0,0 +1 @@
+Add developer FAQ to explain `outliers` and `state_groups`.

From 3127f20fa1d40d11fe93b48c2e3cfdc3b7aa8602 Mon Sep 17 00:00:00 2001
From: Eric Eastwood
Date: Fri, 23 Jul 2021 01:17:45 -0500
Subject: [PATCH 04/14] Add developer to FAQ table of contents

---
 docs/SUMMARY.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/SUMMARY.md b/docs/SUMMARY.md
index db4ef1a44e86..09db78bae0d3 100644
--- a/docs/SUMMARY.md
+++ b/docs/SUMMARY.md
@@ -69,6 +69,7 @@
 - [Code Style](code_style.md)
 - [Git Usage](dev/git.md)
 - [Testing]()
+ - [Developer FAQ](development/faq.md)
 - [OpenTracing](opentracing.md)
 - [Database Schemas](development/database_schema.md)
 - [Synapse Architecture]()

From 2d68492d346c1481eed501b46d048ae0eac4fb24 Mon Sep 17 00:00:00 2001
From: Eric Eastwood
Date: Wed, 28 Jul 2021 15:18:24 -0500
Subject: [PATCH 05/14] Move to room DAG concept page

---
 docs/SUMMARY.md | 2 +-
 docs/development/{faq.md => room-dag-concepts.md} | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
 rename docs/development/{faq.md => room-dag-concepts.md} (97%)

diff --git a/docs/SUMMARY.md b/docs/SUMMARY.md
index 09db78bae0d3..6f2c860f9e4b 100644
--- a/docs/SUMMARY.md
+++ b/docs/SUMMARY.md
@@ -69,7 +69,6 @@
 - [Code Style](code_style.md)
 - [Git Usage](dev/git.md)
 - [Testing]()
- - [Developer FAQ](development/faq.md)
 - [OpenTracing](opentracing.md)
 - [Database Schemas](development/database_schema.md)
 - [Synapse Architecture]()
@@ -81,6 +80,7 @@
 - [SAML](dev/saml.md)
 - [CAS](dev/cas.md)
 - [State Resolution]()
+ - [Room DAG concepts](development/room-dag-concepts.md)
 - [The Auth Chain Difference Algorithm](auth_chain_difference_algorithm.md)
 - [Media Repository](media_repository.md)
 - [Room and User Statistics](room_and_user_statistics.md)
diff --git a/docs/development/faq.md b/docs/development/room-dag-concepts.md
similarity index 97%
rename from docs/development/faq.md
rename to docs/development/room-dag-concepts.md
index cbc564a9f4e7..cab658e16e4d 100644
--- a/docs/development/faq.md
+++ b/docs/development/room-dag-concepts.md
@@ -1,4 +1,4 @@
-# Developer FAQ
+# Room DAG concepts
 
 ## What is an `outlier`?
 

From 7392b63c9732bb1503af6aa0cef456476d60bb5d Mon Sep 17 00:00:00 2001
From: Eric Eastwood
Date: Wed, 28 Jul 2021 16:49:29 -0500
Subject: [PATCH 06/14] Updates from feedback and more concepts

---
 docs/development/room-dag-concepts.md | 69 +++++++++++++++++++++++----
 1 file changed, 61 insertions(+), 8 deletions(-)

diff --git a/docs/development/room-dag-concepts.md b/docs/development/room-dag-concepts.md
index cab658e16e4d..2d28948fa754 100644
--- a/docs/development/room-dag-concepts.md
+++ b/docs/development/room-dag-concepts.md
@@ -1,17 +1,70 @@
 # Room DAG concepts
 
-## What is an `outlier`?
+## Edges
 
-An `outlier` is an arbitrary floating event in the DAG (as opposed to being
-inline with the current DAG). It also means that we don't have the state events
-backfilled on the homeserver and we trust the events *claimed* auth events rather
-than those we calculate and verify to be correct.
+The word "edge" comes from graph theory lingo. An edge is just a connection
+between two events. In Synapse, we connect events by specifying their
+`prev_events`. A subsequent event points back at a previous event.
 
-An event can be unmarked as an `outlier` once we fetch all of its `prev_events` (you will see some `ex_outlier` code around this).
+```
+A (oldest) <---- B <---- C (most recent)
+```
 
 
-## What is a `state_group`?
+## Depth and stream ordering
 
-For every non-outlier event we need to know the state at that event. Instead of storing the full state for each event in the DB (i.e. a `event_id -> state` mapping), which is *very* space inefficient when state doesn't change, we instead assign each different set of state a "state group" and then have mappings of `event_id -> state_group` and `state_group -> state`.
+Events are sorted by `(topological_ordering, stream_ordering)` where `topological_ordering` is just `depth`. Normally, `stream_ordering` is an auto incrementing integer but for `backfilled=true` events, it decrements.
 
+`depth` is not re-calculated when messages are inserted into the DAG.
+
 
+## Forward extremity
+
+Most-recent-in-time events in the DAG which are not referenced by any `prev_events` yet.
+
+The forward extremities of a room are used as the `prev_events` when the next event is sent.
+
+
+## Backwards extremity
+
+The current marker of where we have backfilled up to.
+
+A backwards extremity is a place where the oldest-in-time events of the DAG
+
+This is an event where we haven't fetched all of the `prev_events` for.
+
+Once we have fetched all of it's `prev_events`, it's unmarked as backwards extremity
+and those `prev_events` become the new backwards extremities.
+
+
+## Outliers
+
+We mark an event as an `outlier` when we haven't figured out the state for the
+room at that point in the DAG yet.
+
+We won't *necessarily* have the `prev_events` of an `outlier` in the database, but it's entirely possible that we *might*. The status of whether we have all of the `prev_events` is marked as
+a [backwards extremity](#backwards-extremity).
+
+For example, when we fetch the event auth chain or state for a given event, we mark all of those
+claimed auth events as outliers because we haven't done the state calculation ourself.
+
+
+### Floating outlier
+
+A floating `outlier` is an arbitrary floating event in the DAG (as opposed to being
+inline with the current DAG). This happens when it the event doesn't have any `prev_events`
+or fake `prev_events` that don't exist.
+
+
+## State groups
+
+For every non-outlier event we need to know the state at that event. Instead of
+storing the full state for each event in the DB (i.e. a `event_id -> state`
+mapping), which is *very* space inefficient when state doesn't change, we
+instead assign each different set of state a "state group" and then have
+mappings of `event_id -> state_group` and `state_group -> state`.
+
+
+ ### Stage group edges
+
+TODO: `state_group_edges` is a further optimization...

From abe66d1e14f13d36dddefec2f69bd2e4cbf3d113 Mon Sep 17 00:00:00 2001
From: Eric Eastwood
Date: Wed, 28 Jul 2021 17:19:16 -0500
Subject: [PATCH 07/14] Update changelog

---
 changelog.d/10464.doc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/changelog.d/10464.doc b/changelog.d/10464.doc
index 5a85250dcc49..764fb9f65c23 100644
--- a/changelog.d/10464.doc
+++ b/changelog.d/10464.doc
@@ -1 +1 @@
-Add developer FAQ to explain `outliers` and `state_groups`.
+Add some developer docs to explain room DAG concepts like `outliers`, `state_groups`, `depth`, etc.

From 885a880dd476fc91271b53b0e196891a36bda164 Mon Sep 17 00:00:00 2001
From: Eric Eastwood
Date: Wed, 28 Jul 2021 17:20:11 -0500
Subject: [PATCH 08/14] Wrap lines

---
 docs/development/room-dag-concepts.md | 24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/docs/development/room-dag-concepts.md b/docs/development/room-dag-concepts.md
index 2d28948fa754..3c39c19fea77 100644
--- a/docs/development/room-dag-concepts.md
+++ b/docs/development/room-dag-concepts.md
@@ -13,7 +13,9 @@ A (oldest) <---- B <---- C (most recent)
 
 ## Depth and stream ordering
 
-Events are sorted by `(topological_ordering, stream_ordering)` where `topological_ordering` is just `depth`. Normally, `stream_ordering` is an auto incrementing integer but for `backfilled=true` events, it decrements.
+Events are sorted by `(topological_ordering, stream_ordering)` where
+`topological_ordering` is just `depth`. Normally, `stream_ordering` is an auto
+incrementing integer but for `backfilled=true` events, it decrements.
 
 `depth` is not re-calculated when messages are inserted into the DAG.
 
@@ -33,8 +35,8 @@ A backwards extremity is a place where the oldest-in-time events of the DAG
 
 This is an event where we haven't fetched all of the `prev_events` for.
 
-Once we have fetched all of it's `prev_events`, it's unmarked as backwards extremity
-and those `prev_events` become the new backwards extremities.
+Once we have fetched all of it's `prev_events`, it's unmarked as backwards
+extremity and those `prev_events` become the new backwards extremities.
 
 
 ## Outliers
@@ -42,18 +44,20 @@ and those `prev_events` become the new backwards extremities.
 We mark an event as an `outlier` when we haven't figured out the state for the
 room at that point in the DAG yet.
 
-We won't *necessarily* have the `prev_events` of an `outlier` in the database, but it's entirely possible that we *might*. The status of whether we have all of the `prev_events` is marked as
-a [backwards extremity](#backwards-extremity).
+We won't *necessarily* have the `prev_events` of an `outlier` in the database,
+but it's entirely possible that we *might*. The status of whether we have all of
+the `prev_events` is marked as a [backwards extremity](#backwards-extremity).
 
-For example, when we fetch the event auth chain or state for a given event, we mark all of those
-claimed auth events as outliers because we haven't done the state calculation ourself.
+For example, when we fetch the event auth chain or state for a given event, we
+mark all of those claimed auth events as outliers because we haven't done the
+state calculation ourself.
 
 
 ### Floating outlier
 
-A floating `outlier` is an arbitrary floating event in the DAG (as opposed to being
-inline with the current DAG). This happens when it the event doesn't have any `prev_events`
-or fake `prev_events` that don't exist.
+A floating `outlier` is an arbitrary floating event in the DAG (as opposed to
+being inline with the current DAG). This happens when it the event doesn't have
+any `prev_events` or fake `prev_events` that don't exist.
 
 
 ## State groups

From e79502c234bf256fbbb6810863f43f9686701a39 Mon Sep 17 00:00:00 2001
From: Eric Eastwood
Date: Wed, 28 Jul 2021 17:22:09 -0500
Subject: [PATCH 09/14] Fix floating outlier grammar

---
 docs/development/room-dag-concepts.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/development/room-dag-concepts.md b/docs/development/room-dag-concepts.md
index 3c39c19fea77..0386fe51b361 100644
--- a/docs/development/room-dag-concepts.md
+++ b/docs/development/room-dag-concepts.md
@@ -56,8 +56,8 @@ state calculation ourself.
 ### Floating outlier
 
 A floating `outlier` is an arbitrary floating event in the DAG (as opposed to
-being inline with the current DAG). This happens when it the event doesn't have
-any `prev_events` or fake `prev_events` that don't exist.
+being inline with the current DAG). This happens when an event doesn't have
+any `prev_events` or has `prev_events` that don't exist.
 
 
 ## State groups

From 85e66d63d0b8d11e978fe81d6becb6fef0749f39 Mon Sep 17 00:00:00 2001
From: Eric Eastwood
Date: Wed, 28 Jul 2021 17:33:10 -0500
Subject: [PATCH 10/14] Add some more context around depth/stream_ordering with API endpoint usage examples

---
 docs/development/room-dag-concepts.md | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/docs/development/room-dag-concepts.md b/docs/development/room-dag-concepts.md
index 0386fe51b361..668fee8879ca 100644
--- a/docs/development/room-dag-concepts.md
+++ b/docs/development/room-dag-concepts.md
@@ -14,10 +14,19 @@ A (oldest) <---- B <---- C (most recent)
 ## Depth and stream ordering
 
 Events are sorted by `(topological_ordering, stream_ordering)` where
-`topological_ordering` is just `depth`. Normally, `stream_ordering` is an auto
+`topological_ordering` is just `depth`. In other words, we first sort by `depth`
+and then tie-break based on `stream_ordering`. `depth` is incremented as new
+messages are added to the DAG. Normally, `stream_ordering` is an auto
 incrementing integer but for `backfilled=true` events, it decrements.
 
-`depth` is not re-calculated when messages are inserted into the DAG.
+`depth` is not re-calculated if a message is inserted into the middle of the DAG.
+
+---
+
+ - `/sync` returns things in the order they arrive at the server (`stream_ordering`).
+ - `/backfill` returns them in the order determined by the event graph `(topological_ordering, stream_ordering)`.
+
+The general idea is that, if you're following a room in real-time (i.e. `/sync`), you probably want to see the messages as they arrive at your server, rather than skipping any that arrived late; whereas if you're looking at a historical section of timeline (i.e. `/messages`), you want to see the best representation of the state of the room as others were seeing it at the time.
 
 
 ## Forward extremity

From 997a4bede9d7d8cce141168b029d36cfa3c55d39 Mon Sep 17 00:00:00 2001
From: Eric Eastwood
Date: Thu, 29 Jul 2021 21:51:35 -0500
Subject: [PATCH 11/14] Grammar and better clarify a few statements

Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
---
 docs/development/room-dag-concepts.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/docs/development/room-dag-concepts.md b/docs/development/room-dag-concepts.md
index 668fee8879ca..10d291e2a8dd 100644
--- a/docs/development/room-dag-concepts.md
+++ b/docs/development/room-dag-concepts.md
@@ -13,25 +13,25 @@ A (oldest) <---- B <---- C (most recent)
 
 ## Depth and stream ordering
 
-Events are sorted by `(topological_ordering, stream_ordering)` where
+Events are normally sorted by `(topological_ordering, stream_ordering)` where
 `topological_ordering` is just `depth`. In other words, we first sort by `depth`
 and then tie-break based on `stream_ordering`. `depth` is incremented as new
 messages are added to the DAG. Normally, `stream_ordering` is an auto
-incrementing integer but for `backfilled=true` events, it decrements.
+incrementing integer, but backfilled events start with `stream_ordering=-1` and decrement.
 
 `depth` is not re-calculated if a message is inserted into the middle of the DAG.
 
 ---
 
  - `/sync` returns things in the order they arrive at the server (`stream_ordering`).
- - `/backfill` returns them in the order determined by the event graph `(topological_ordering, stream_ordering)`.
+ - `/messages` (and `/backfill in the federation API) return them in the order determined by the event graph `(topological_ordering, stream_ordering)`.
 
 The general idea is that, if you're following a room in real-time (i.e. `/sync`), you probably want to see the messages as they arrive at your server, rather than skipping any that arrived late; whereas if you're looking at a historical section of timeline (i.e. `/messages`), you want to see the best representation of the state of the room as others were seeing it at the time.
 
 
 ## Forward extremity
 
-Most-recent-in-time events in the DAG which are not referenced by any `prev_events` yet.
+Most-recent-in-time events in the DAG which are not referenced by any other events' `prev_events` yet.
 
 The forward extremities of a room are used as the `prev_events` when the next event is sent.
 
@@ -44,7 +44,7 @@ A backwards extremity is a place where the oldest-in-time events of the DAG
 
 This is an event where we haven't fetched all of the `prev_events` for.
 
-Once we have fetched all of it's `prev_events`, it's unmarked as backwards
+Once we have fetched all of its `prev_events`, it's unmarked as a backwards
 extremity and those `prev_events` become the new backwards extremities.
 
 
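Note on the ordering touched by this patch: the sort it documents, `(topological_ordering, stream_ordering)` with `topological_ordering` equal to `depth`, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Event` tuple, its field names, the sample event IDs, and both helper functions are hypothetical and are not Synapse's actual classes or storage queries.

```python
from typing import List, NamedTuple


class Event(NamedTuple):
    """Hypothetical stand-in for a persisted event (not Synapse's real class)."""

    event_id: str
    depth: int  # used as topological_ordering
    stream_ordering: int  # positive for live events, negative for backfilled ones


def by_topological_then_stream(events: List[Event]) -> List[Event]:
    # Sort by depth first, then tie-break on stream_ordering.
    return sorted(events, key=lambda e: (e.depth, e.stream_ordering))


def by_stream_ordering(events: List[Event]) -> List[Event]:
    # Order in which this server persisted the events.
    return sorted(events, key=lambda e: e.stream_ordering)


events = [
    Event("$a", depth=5, stream_ordering=1),
    Event("$b", depth=6, stream_ordering=2),
    # Arrived after $b, but sits at the same depth as $a.
    Event("$late", depth=5, stream_ordering=3),
    # Backfilled events start at stream_ordering=-1 and decrement.
    Event("$old", depth=3, stream_ordering=-1),
    Event("$older", depth=2, stream_ordering=-2),
]

print([e.event_id for e in by_topological_then_stream(events)])
print([e.event_id for e in by_stream_ordering(events)])
```

With this sample data, `$late` sorts next to its same-depth sibling `$a` in the topological order but comes last in stream order, which is the `/messages` versus `/sync` difference the doc text describes.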

From 64ebf4215216140c35d3a7367a2b597a99cad355 Mon Sep 17 00:00:00 2001
From: Eric Eastwood
Date: Thu, 29 Jul 2021 22:45:28 -0500
Subject: [PATCH 12/14] Wrap and address some review clarifications

---
 docs/development/room-dag-concepts.md | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/docs/development/room-dag-concepts.md b/docs/development/room-dag-concepts.md
index 10d291e2a8dd..918d1596d466 100644
--- a/docs/development/room-dag-concepts.md
+++ b/docs/development/room-dag-concepts.md
@@ -19,14 +19,16 @@ and then tie-break based on `stream_ordering`. `depth` is incremented as new
 messages are added to the DAG. Normally, `stream_ordering` is an auto
 incrementing integer, but backfilled events start with `stream_ordering=-1` and decrement.
 
-`depth` is not re-calculated if a message is inserted into the middle of the DAG.
-
 ---
 
  - `/sync` returns things in the order they arrive at the server (`stream_ordering`).
  - `/messages` (and `/backfill in the federation API) return them in the order determined by the event graph `(topological_ordering, stream_ordering)`.
 
-The general idea is that, if you're following a room in real-time (i.e. `/sync`), you probably want to see the messages as they arrive at your server, rather than skipping any that arrived late; whereas if you're looking at a historical section of timeline (i.e. `/messages`), you want to see the best representation of the state of the room as others were seeing it at the time.
+The general idea is that, if you're following a room in real-time (i.e.
+`/sync`), you probably want to see the messages as they arrive at your server,
+rather than skipping any that arrived late; whereas if you're looking at a
+historical section of timeline (i.e. `/messages`), you want to see the best
+representation of the state of the room as others were seeing it at the time.
 
 
 ## Forward extremity
@@ -38,14 +40,16 @@ The forward extremities of a room are used as the `prev_events` when the next ev
 
 ## Backwards extremity
 
-The current marker of where we have backfilled up to.
-
-A backwards extremity is a place where the oldest-in-time events of the DAG
+The current marker of where we have backfilled up to and will generally be the
+oldest-in-time events we know of in the DAG.
 
 This is an event where we haven't fetched all of the `prev_events` for.
 
 Once we have fetched all of its `prev_events`, it's unmarked as a backwards
-extremity and those `prev_events` become the new backwards extremities.
+extremity and those `prev_events` become the new backwards extremities (unless
+we have already persisted them). Also in reality, we backfill in batches of 20
+events or so, so only the `prev_events` of the last oldest-in-time event will
+become the backwards extremeties.
 
 
 ## Outliers

From 9e1e92c39112b3c9043ba9ed77dd9a8ca104630c Mon Sep 17 00:00:00 2001
From: Eric Eastwood
Date: Thu, 29 Jul 2021 22:57:48 -0500
Subject: [PATCH 13/14] No distinction for floating outliers

---
 docs/development/room-dag-concepts.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/docs/development/room-dag-concepts.md b/docs/development/room-dag-concepts.md
index 918d1596d466..76a8cc73c63a 100644
--- a/docs/development/room-dag-concepts.md
+++ b/docs/development/room-dag-concepts.md
@@ -65,12 +65,11 @@ For example, when we fetch the event auth chain or state for a given event, we
 mark all of those claimed auth events as outliers because we haven't done the
 state calculation ourself.
 
+Outliers are sometimes referred to as floating outliers but there is no
+distinction between a normal and floating outlier. The floating descriptor just
+comes from the fact that all outliers are an arbitrary floating event in the DAG
+as opposed to being inline with the current DAG.
 
-### Floating outlier
-
-A floating `outlier` is an arbitrary floating event in the DAG (as opposed to
-being inline with the current DAG). This happens when an event doesn't have
-any `prev_events` or has `prev_events` that don't exist.
 
 
 ## State groups
@@ -85,3 +84,4 @@ mappings of `event_id -> state_group` and `state_group -> state`.
 ### Stage group edges
 
 TODO: `state_group_edges` is a further optimization...
+ notes from @Azrenbeth, https://pastebin.com/seUGVGeT

From e9f58dfdec681dccaede26987bb062d59881659e Mon Sep 17 00:00:00 2001
From: Eric Eastwood
Date: Tue, 3 Aug 2021 00:57:40 -0500
Subject: [PATCH 14/14] Simplify some extremity language and review touches

---
 docs/SUMMARY.md | 2 +-
 docs/development/room-dag-concepts.md | 14 +++-----------
 2 files changed, 4 insertions(+), 12 deletions(-)

diff --git a/docs/SUMMARY.md b/docs/SUMMARY.md
index 104138d898cb..10be12d63865 100644
--- a/docs/SUMMARY.md
+++ b/docs/SUMMARY.md
@@ -79,8 +79,8 @@
 - [Single Sign-On]()
 - [SAML](development/saml.md)
 - [CAS](development/cas.md)
+ - [Room DAG concepts](development/room-dag-concepts.md)
 - [State Resolution]()
- - [Room DAG concepts](development/room-dag-concepts.md)
 - [The Auth Chain Difference Algorithm](auth_chain_difference_algorithm.md)
 - [Media Repository](media_repository.md)
 - [Room and User Statistics](room_and_user_statistics.md)
diff --git a/docs/development/room-dag-concepts.md b/docs/development/room-dag-concepts.md
index 76a8cc73c63a..5eed72bec662 100644
--- a/docs/development/room-dag-concepts.md
+++ b/docs/development/room-dag-concepts.md
@@ -22,7 +22,7 @@ incrementing integer, but backfilled events start with `stream_ordering=-1` and
 ---
 
  - `/sync` returns things in the order they arrive at the server (`stream_ordering`).
- - `/messages` (and `/backfill in the federation API) return them in the order determined by the event graph `(topological_ordering, stream_ordering)`.
+ - `/messages` (and `/backfill` in the federation API) return them in the order determined by the event graph `(topological_ordering, stream_ordering)`.
 
 The general idea is that, if you're following a room in real-time (i.e.
 `/sync`), you probably want to see the messages as they arrive at your server,
@@ -46,10 +46,8 @@ oldest-in-time events we know of in the DAG.
 This is an event where we haven't fetched all of the `prev_events` for.
 
 Once we have fetched all of its `prev_events`, it's unmarked as a backwards
-extremity and those `prev_events` become the new backwards extremities (unless
-we have already persisted them). Also in reality, we backfill in batches of 20
-events or so, so only the `prev_events` of the last oldest-in-time event will
-become the backwards extremeties.
+extremity (although we may have formed new backwards extremities from the prev
+events during the backfilling process).
 
 
 ## Outliers
@@ -65,12 +63,6 @@ For example, when we fetch the event auth chain or state for a given event, we
 mark all of those claimed auth events as outliers because we haven't done the
 state calculation ourself.
 
-Outliers are sometimes referred to as floating outliers but there is no
-distinction between a normal and floating outlier. The floating descriptor just
-comes from the fact that all outliers are an arbitrary floating event in the DAG
-as opposed to being inline with the current DAG.
-
-
 
 ## State groups