Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Search query TraceQL is truncated to 1024 characters #3513

Closed
zohnannor opened this issue Mar 22, 2024 · 10 comments · Fixed by #3571
Closed

Search query TraceQL is truncated to 1024 characters #3513

zohnannor opened this issue Mar 22, 2024 · 10 comments · Fixed by #3571
Labels
help wanted Extra attention is needed keepalive Label to exempt Issues / PRs from stale workflow traceql type/bug Something isn't working

Comments

@zohnannor
Copy link

For a query longer than 1024 characters, Tempo will return an error:

Error (invalid TraceQL query: parse error at line 1, col <N>: syntax error: unexpected $end). Please check the server logs for more details.

(where <N> is the length of query >1024)

To Reproduce
Steps to reproduce the behavior:

  1. Start Tempo
  2. Send a request with query longer than 1024 characters
  3. The error is returned

Expected behavior
Any query works.

Environment:

  • Infrastructure: docker

Additional Context

image

level=error ts=2024-03-22T10:55:05.623288905Z caller=search_handlers.go:86 msg="search: parse search request failed" err="invalid TraceQL query: parse error at line 1, col 1025: syntax error: unexpected $end"
level=info ts=2024-03-22T10:55:05.623317336Z caller=handler.go:142 tenant=single-tenant method=GET traceID=09c18d3793c5ae7c url="/api/search?q=<my_1025_characters_long_percent_encoded_query>&limit=20000&spss=3&start=1711100639&end=1711104239" duration=239.462µs response_size=0 status=400

Whole query is printed in the logs, so that means the parsing step truncates it to 1024 characters (or just stops after that number).

searchReq, err := api.ParseSearchRequest(req)

_, err := traceql.Parse(query)

e := l.parser.Parse(&l)

func (yyrcvr *yyParserImpl) Parse(yylex yyLexer) int {

As I can see, the goyacc package is used. I'm not an expert on this, but I think there should be a way to configure the parsing limits, so that they can be exposed in Tempo's config.

@joe-elliott joe-elliott added type/bug Something isn't working good first issue Good for newcomers help wanted Extra attention is needed keepalive Label to exempt Issues / PRs from stale workflow traceql and removed good first issue Good for newcomers labels Mar 24, 2024
@joe-elliott
Copy link
Member

It's pretty fantastic that you are trying to execute a 1024 char+ query. If anyone wants to tackle this it should be fairly straight forward to repro.

@zohnannor
Copy link
Author

zohnannor commented Mar 25, 2024

The query is built dynamically by using ${myvar:regex} variable syntax. myvar contains a lot of values, hence the query length.

@yzmp
Copy link

yzmp commented Apr 10, 2024

+1
Had the same issue running queries longer than 1024 characters.
Tried to filter by multiple span attributes, each one being a UUID so I easily reached the limit.

Error (invalid TraceQL query: parse error at line 1, col 1025: syntax error: unexpected $end). Please check the server logs for more details.

@yzmp
Copy link

yzmp commented Apr 10, 2024

Not sure if it is helpful, but it looks like the limit exists only for non-intrinsic fields.
If I send a big regex to rootServiceName or status, I get no errors:

{ rootServiceName =~ "7e9e2177-8a1d-40e7-a184-f91795910e4e|8f2c63fe-7d1f-4281-947f-ef7305ecc3e8|..." }

If I send the same to some span.attribute, I get a similar error as above (not the same, though, as I'm using =~ in this example and used = before):

{ span.attribute =~ "7e9e2177-8a1d-40e7-a184-f91795910e4e|8f2c63fe-7d1f-4281-947f-ef7305ecc3e8|..." }

@babsonmatt
Copy link

babsonmatt commented Apr 11, 2024

@joe-elliott

A co-worker and I spent some time last night investigating this further and determined that if you remove the call to tryScopeAttribute found in lexer.go it appears any sort of limitation around 1024 chars is lifted.

We injected some simple code to just Scan() through the scanner in Lex and the full query data > 1024 chars is present if we put the code above the calls to tryScopeAttribute but anything after that call appears to have the 1024 issue.

We tested via the lexer_test.go test and noticed when that tryScopeAttribute is called it appears to potentially consume from the scanner thats passed in and that perhaps the copy at line 274 may not actually be copying properly... commenting out the below from tryScopeAttribute:

for s.Peek() != scanner.EOF {
	if s.Peek() == '.' {
		str += string(s.Next())
		break
	}
	str += string(s.Next())
}

will alleviate the issue as well, leading us to believe the shallow copy is somehow interfering/not a real copy of the the passed in scanner, l.

Could we get any direction/thoughts from you on how this sounds and where to proceed from here? Thank you!

@joe-elliott
Copy link
Member

Thanks for digging in! That scanner copy trick is used in a number of places and was originally stolen from Loki (that doesn't mean it's not an issue).

It looks like the scanner has an internal buffer limited to 1024 characters which is likely the cause of our issue.

Can you post a test which exposes the problem? With a little debugging I could probably get a better understanding.

@babsonmatt
Copy link

babsonmatt commented Apr 11, 2024

Not a test, but hopefully this rough proof of concept demonstrates what we're thinking... forgive me if this isn't an accurate reproduction of what I think is happening in the tryScopeAttribute function, last night was the most amount of time I've spent in a Go codebase to date haha


import (
	"fmt"
	"strings"
	"text/scanner"
)

func tryScopeAttribute(l *scanner.Scanner) {
	s := *l
	str := ""

	for s.Peek() != scanner.EOF {
		str += string(s.Next())
	}
}

func main() {
	const src = "{hello world!}"

	var s scanner.Scanner
	s.Init(strings.NewReader(src))

	tryScopeAttribute(&s)

	for tok := s.Scan(); tok != scanner.EOF; tok = s.Scan() {
		fmt.Printf("%s: %s\n", s.Position, s.TokenText())
	}

}

If you run the above, and the s := *l was making an actual copy, you would expect the for loop to actually iterate over the entire provided string but instead, nothing happens because I believe it's running into scanner.EOF because the tryScopeAttribute has messed with the actual scanner and not the copy. You can try commenting the tryScopeAttribute call and note that the for loop iterates as expected.

Let me know what you think, thanks!

@joe-elliott
Copy link
Member

Welcome to go :)

So I can confirm by debugging that the copy works correctly and the original scanner is not impacted by the forward seek looking for a matching scope.

See linked fix. I've added code that causes the scope search to quit as soon as it comes across any non attribute rune and this passes all tests including a new one that exposed the issue. However, I still haven't found the exact mechanic that was causing the issue.

@joe-elliott
Copy link
Member

joe-elliott commented Apr 12, 2024

This also occurs when the attribute name is > 1024 chars. Fixed with aa5bf9f

@joe-elliott
Copy link
Member

Fix merged! Thx for all the help 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed keepalive Label to exempt Issues / PRs from stale workflow traceql type/bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants