[vParquet2] The next version of Tempo's parquet based block format #2244
This is surprisingly complete for a first pass. Nice work. I have some comments on cleaning up the encodings, but other than that this looks good.
As discussed, we will get the parent and nestedsetleft/right columns into this PR, but follow up with actually populating them later.
Do you have a benchmark on hand that shows the benefit from the new span duration column? It would be neat to see the improvement for a query like
No, I didn't benchmark it. Should I add one?
No, that's ok. I think this is tough to benchmark on synthetic data, but if you had some numbers offhand I was curious. I didn't see a benefit locally between vparquet and vparquet2, which is how I came across the predicate recommendation.
Make column names using timestamps more consistent and more similar to the names used by OTEL
Rename columns to match the latest OTEL standard
Increases performance of queries for span duration, as it is no longer required to search two columns to get the span duration
The columns are a prerequisite to improve structural TraceQL queries. The new columns are not populated yet.
This makes the schema more similar to the OTEL standard and makes it possible to add a numeric span ID later (see ParentID vs ParentSpanID)
This improves the compatibility with other tooling using the parquet format such as parquet-mr/parquet-cli
This prevents potential bugs with nil predicates where '== nil' is false because the value pointer is nil but the type pointer is not
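For readers unfamiliar with the nil pitfall mentioned here, a minimal Go sketch follows. It is not code from this PR: `Predicate`, `rangePredicate`, and both constructor functions are hypothetical names used only to show how an interface holding a nil pointer compares unequal to `nil`.

```go
package main

import "fmt"

// Predicate is a hypothetical stand-in for a column predicate interface.
type Predicate interface {
	Keep(v int64) bool
}

// rangePredicate is a hypothetical concrete predicate type.
type rangePredicate struct{ min, max int64 }

func (p *rangePredicate) Keep(v int64) bool { return v >= p.min && v <= p.max }

// newTypedNil returns a nil *rangePredicate wrapped in the interface.
// The interface value carries a type (*rangePredicate) and a nil pointer,
// so it is NOT equal to nil.
func newTypedNil() Predicate {
	var p *rangePredicate
	return p
}

// newUntypedNil returns a true nil interface: no type and no value.
func newUntypedNil() Predicate {
	return nil
}

func main() {
	fmt.Println(newTypedNil() == nil)   // false: type pointer is set
	fmt.Println(newUntypedNil() == nil) // true: both type and value are nil
}
```

Returning the interface type directly with an explicit `return nil`, rather than a nil concrete pointer, is one common way to keep `== nil` checks reliable.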
Since there is now a dedicated duration column, queries for duration can be treated more like any other integer column
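To illustrate the idea behind the dedicated duration column, here is a rough Go sketch. The struct and field names are hypothetical and do not reflect the actual vParquet2 schema or Tempo's predicate API. It only shows that with a stored duration, a generic integer predicate can be applied to one column instead of deriving the value from two.

```go
package main

import "fmt"

// spanTwoColumns models a row that stores only start and end (hypothetical names).
type spanTwoColumns struct {
	StartNanos uint64
	EndNanos   uint64
}

// spanWithDuration models a row with a precomputed duration (hypothetical names).
type spanWithDuration struct {
	StartNanos    uint64
	DurationNanos uint64
}

// intAtLeast builds a generic predicate usable on any unsigned integer column.
func intAtLeast(min uint64) func(uint64) bool {
	return func(v uint64) bool { return v >= min }
}

// durationTwoColumns derives the duration from two columns, guarding against
// an end timestamp that precedes the start.
func durationTwoColumns(s spanTwoColumns) uint64 {
	if s.EndNanos < s.StartNanos {
		return 0
	}
	return s.EndNanos - s.StartNanos
}

func main() {
	pred := intAtLeast(500)

	old := spanTwoColumns{StartNanos: 1000, EndNanos: 1750}
	cur := spanWithDuration{StartNanos: 1000, DurationNanos: 750}

	// Two columns must be read and combined before the predicate applies.
	fmt.Println(pred(durationTwoColumns(old)))
	// One column is read and the predicate applies directly.
	fmt.Println(pred(cur.DurationNanos))
}
```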
LGTM. Last changes for duration look good, seeing a benchmark improvement for those queries as expected.
What this PR does:
Creates a new tempodb encoding named `vParquet2`. The new encoding implements parquet schema changes that are not compatible with `vParquet`:
Review hints
`vParquet` which makes it difficult to spot the differences between `vParquet` and `vParquet2`. I created a second PR, "[vParquet2] all commits for vParquet2 as changes of the existing vParquet" #2243, that implements all changes for `vParquet2` in the `vParquet` tree
`vParquet2` schema changes
Which issue(s) this PR fixes:
Contributes to grafana/tempo-squad#179 and #2226
Checklist
`CHANGELOG.md` updated - the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`