This repository has been archived by the owner on Feb 1, 2024. It is now read-only.
There are several places where INSIDE_QS is used instead of QS, particularly in the ALB regex. As noted in #3, this can cause issues in certain cases.
However, we have to be careful in replacing this - some fields, when empty, are written as a simple hyphen (-) or a quoted hyphen ("-") depending on the log entry. This pops up specifically with S3 access logs, where the same field can be a quoted string, a quoted hyphen, or a non-quoted hyphen. If the grok expression doesn't match an entry, no data is returned for it, so that data could go missing.
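To make the three cases concrete, here is a minimal Python sketch of a field matcher that accepts a quoted string, a quoted hyphen, or a bare hyphen. The names `QUOTED`, `FIELD`, and `parse_field` are hypothetical, and the regex is a simplified stand-in, not the actual QS or INSIDE_QS grok definition.

```python
import re

# Hypothetical, simplified stand-in for a quoted-string pattern.
# It allows backslash-escaped characters inside the quotes.
QUOTED = r'"(?:[^"\\]|\\.)*"'

# Accept either a quoted string (including "-") or a bare hyphen.
FIELD = re.compile(rf'(?:{QUOTED}|-)')


def parse_field(raw):
    """Return the field value, or None when the field is an empty marker.

    Raises ValueError when the raw text matches neither form, so a
    non-matching entry is surfaced instead of being dropped silently.
    """
    if FIELD.fullmatch(raw) is None:
        raise ValueError(f"unparseable field: {raw!r}")
    if raw in ('-', '"-"'):
        return None
    # Strip the surrounding quotes; escapes are left as-is.
    return raw[1:-1]
```

Raising on a non-match is a design choice for this sketch only: it makes the "data could go missing" case visible instead of returning nothing.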
Further, these situations can be difficult to detect. In a simple test, I replaced INSIDE_QS with QS and ran a COUNT(*) against each of the two tables with Athena. Both returned the same number. However, when I ran a COUNT(*) with a WHERE clause filtering by request_id prefix, the results differed. I'm guessing this is because a plain COUNT(*) doesn't deserialize everything(?).
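One way to catch this kind of difference outside Athena is to run both patterns over sample lines and compare the parsed fields rather than the row counts. The sketch below assumes a made-up two-field line format and hypothetical patterns `OLD` and `NEW`; it is not the project's actual grok expressions.

```python
import re

# Hypothetical two-field line format: "<request_id> <user_agent>".
# OLD accepts a quoted string or a bare hyphen; NEW accepts quoted strings only.
OLD = re.compile(r'(?P<request_id>\S+) (?P<user_agent>"[^"]*"|-)')
NEW = re.compile(r'(?P<request_id>\S+) (?P<user_agent>"[^"]*")')


def parse(pattern, line):
    """Return the named fields as a dict, or None when the line doesn't match."""
    match = pattern.fullmatch(line)
    return match.groupdict() if match else None


def diff_parsers(old, new, lines):
    """Return the lines whose parsed fields differ between the two patterns."""
    return [line for line in lines if parse(old, line) != parse(new, line)]


sample = ['ABC123 "curl/7.0"', 'ABC124 -', 'DEF125 "-"']
mismatches = diff_parsers(OLD, NEW, sample)
```

Here the number of input lines is the same for both patterns, but `mismatches` contains the one line with a bare hyphen, which only the first pattern parses.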
dacort changed the title from "Replace INSIDE_QS with QS" to "Replace INSIDE_QS with QS" on Jan 2, 2019