This repository has been archived by the owner on Feb 1, 2024. It is now read-only.

Replace INSIDE_QS with QS #4

Open
dacort opened this issue Jan 2, 2019 · 0 comments
Labels
enhancement New feature or request

Comments

@dacort
Contributor

dacort commented Jan 2, 2019

There are several places where INSIDE_QS is used instead of QS, particularly in the ALB regex. As noted in #3, this can cause issues in certain cases.

However, we have to be careful when replacing it. Some fields, when empty, are written as either a bare hyphen (-) or a quoted hyphen ("-"), depending on the log entry. This comes up specifically with S3 access logs, where a given field can be a quoted string, a quoted hyphen, or a non-quoted hyphen. If the grok expression doesn't match an entry, no data is returned for it, so that data could silently go missing.
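To make the three cases concrete, here is a minimal Python sketch. The two patterns are hypothetical, simplified stand-ins written for illustration; they are not the actual grok definitions of QS or INSIDE_QS, and the sample values are made up. The sketch only shows how a quoted-only pattern misses the bare hyphen, while an alternation that accepts both forms does not.

```python
import re

# Hypothetical stand-in: only accepts a double-quoted value.
QUOTED_ONLY = re.compile(r'"(?P<value>[^"]*)"')

# Hypothetical stand-in: accepts a double-quoted value OR a bare hyphen.
QUOTED_OR_HYPHEN = re.compile(r'(?:"(?P<quoted>[^"]*)"|(?P<bare>-))')


def parse_field(raw):
    """Return the field's value, or None when the field is empty.

    Both the bare hyphen (-) and the quoted hyphen ("-") are treated as empty.
    Raises ValueError if the raw text matches neither form.
    """
    match = QUOTED_OR_HYPHEN.fullmatch(raw)
    if match is None:
        raise ValueError(f"unrecognized field: {raw!r}")
    value = match.group("quoted")
    # value is None when the bare-hyphen branch matched.
    if value is None or value == "-":
        return None
    return value


# Made-up sample values covering the three forms described above.
samples = ['"Mozilla/5.0"', '"-"', '-']

# Which samples does the quoted-only pattern accept?
quoted_only_hits = [bool(QUOTED_ONLY.fullmatch(s)) for s in samples]

# What does the more permissive pattern yield?
parsed = [parse_field(s) for s in samples]

print(quoted_only_hits)
print(parsed)
```

The same idea would carry over to a grok expression as an alternation between the quoted pattern and a literal hyphen, but the exact pattern names to use depend on the definitions in this repository.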

Further, these situations can be difficult to detect. In a simple test, I replaced INSIDE_QS with QS and ran a COUNT(*) against the two tables in Athena. The same number was returned. However, when I ran a COUNT(*) with a WHERE clause filtering by request_id prefix, different results were returned. I'm guessing this is because a bare COUNT(*) doesn't deserialize everything(?).
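The effect can be imitated outside Athena with a small Python simulation. Everything here is an assumption made for illustration: the log lines, the two patterns, and in particular the model in which a line that fails to match still produces a row, just with every column null. That model is consistent with the guess above but is not confirmed behavior.

```python
import re

# Made-up, simplified log lines: a request_id followed by one optional field.
LINES = [
    'ABC123 "http://example.com/"',
    'ABC456 "-"',
    'ABD789 -',
]

# Hypothetical patterns: STRICT requires quotes, LENIENT also allows a bare hyphen.
STRICT = re.compile(r'(?P<request_id>\S+) "(?P<referrer>[^"]*)"')
LENIENT = re.compile(r'(?P<request_id>\S+) (?:"(?P<referrer>[^"]*)"|-)')


def deserialize(pattern, line):
    """Assumed model: a non-matching line becomes a row of nulls."""
    match = pattern.fullmatch(line)
    if match is None:
        return {"request_id": None, "referrer": None}
    return match.groupdict()


def count_all(rows):
    """Analogue of COUNT(*): counts rows without looking at any column."""
    return len(rows)


def count_with_prefix(rows, prefix):
    """Analogue of COUNT(*) with a WHERE filter on the request_id prefix."""
    return sum(
        1
        for row in rows
        if row["request_id"] is not None and row["request_id"].startswith(prefix)
    )


strict_rows = [deserialize(STRICT, line) for line in LINES]
lenient_rows = [deserialize(LENIENT, line) for line in LINES]

strict_total = count_all(strict_rows)
lenient_total = count_all(lenient_rows)
strict_prefix = count_with_prefix(strict_rows, "AB")
lenient_prefix = count_with_prefix(lenient_rows, "AB")

print(strict_total, lenient_total)
print(strict_prefix, lenient_prefix)
```

Under this model the unfiltered totals agree while the filtered counts diverge, so a check that touches a parsed column, such as counting rows where request_id is null, is a more reliable way to compare two pattern variants than a plain row count.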

@dacort dacort changed the title Replace INSIDE_QS with QS Replace INSIDE_QS with QS Jan 2, 2019
@dacort dacort added the enhancement New feature or request label Jan 2, 2019