-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: fix bug in query result marshaling for invalid utf8 characters #14585
Conversation
…d utf8 characters are present Signed-off-by: Callum Styan <[email protected]>
Signed-off-by: Callum Styan <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
approving with small improvements to apply
pkg/util/marshal/query.go
Outdated
} | ||
return r | ||
} | ||
stream.Labels = string(bytes.Map(removeInvalidUtf, []byte(stream.Labels))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's better to use bytes.Map()
only if []byte(stream.Labels)
contains utf8.RuneError
.
more details here
pkg/util/marshal/query.go
Outdated
@@ -77,6 +79,14 @@ func NewStreams(s logqlmodel.Streams) (loghttp.Streams, error) { | |||
ret := make([]loghttp.Stream, len(s)) | |||
|
|||
for i, stream := range s { | |||
// The rune error replacement is rejected by Prometheus hence replacing them with space. | |||
removeInvalidUtf := func(r rune) rune { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can be moved to global vars to initialize once
Labels: "{asdf=\"�\"}", | ||
}, | ||
}) | ||
require.NoError(t, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be useful to assert the result as well
Signed-off-by: Callum Styan <[email protected]>
Potentially we want to fix this by actually sanitizing data for OTLP ingestion/structured metadata, but I think we should include a fix to allow users with data that is already stored in Loki to query it out.
The bug is present for streams which have structured metadata which contains invalid UTF-8 characters. When you write a query like
{label="x"}
there's no problem, but as soon as you dolabel="x"} |= "some filter"
the bug kicks in because we pull in the structured metadata.When we then go to marshal the labelset on the query result we run into an error here because the underlying call to
NewLabelSet
calls Prometheus'ParseMetric
which enforces "any valid Unicode character" in the label value. This fails if there's an invalid character. I'm not entirely sure what the invalid character is or where it's coming from, but we have precedent for checking for it already asRuneError
(here) in other places. For this PR I've simply copied what we do for the JSONParser from here.