-
Notifications
You must be signed in to change notification settings - Fork 510
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add argument to get_column_values allow for alternate sorting of column values #289
Conversation
@clausherther thanks for implementing! was looking for this exact feature just now! |
macros/sql/get_column_values.sql
Outdated
from {{ target_relation }} | ||
group by 1 | ||
order by 2 {{ sort_direction if sort_direction else "desc" }} | ||
{% endif %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fyi - I created a local copy of this and hit an error on this endif
in DBT v18:
Encountered an error:
Compilation Error in macro (macros/sorted_get_column_values.sql)
Got an unexpected control flow end tag, got endif but never saw a preceeding if (@ 63:8)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for highlighting that! Just pushed a fix to remove that extra endif
@clrcrl anything else we need to do here to get this merged? |
macros/sql/get_column_values.sql
Outdated
{% else %} | ||
{# We take the max sort value for each value to make sure | ||
there are no duplicate rows for each value #} | ||
max({{ sort_column }}) as sort_column |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
huh, this is interesting. I thought this PR would primarily for the use-case of it alpha-sorting, i.e. "order by {{ column }}" (which I typically achieve through a Jinja sort)
What kind of ways do you typically want to sort these results?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is intended to let you sort by frequency in ascending (or descending) order, which I think was was not possible previously and is not possible in Jinja with just the results since the counts aren't returned.
(This is based on an internal version of this we had to implement at Quibi as a result.)
macros/sql/get_column_values.sql
Outdated
{% endif %} | ||
from {{ target_relation }} | ||
group by 1 | ||
order by 2 {{ sort_direction if sort_direction else "desc" }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we go with the above comment, it may be easier to do something like this:
order by 2 {{ sort_direction if sort_direction else "desc" }} | |
order by {{ order_by }} {{ sort_direction or "desc" }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still trying to figure out how to make this work with non-aggregate columns. E.g. you pass in column_2
instead count(*)
we need to wrap in an aggregation, like max(column_2)
, or group by it as well.
@clrcrl lmk what you think of the way I implemented the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I played around with this locally:
Here's my testing file:
{% set list_default_order = dbt_utils.get_column_values(
table=ref('stg_payments'),
column='payment_method')
%}
{% do log('list_default_order: ' ~ list_default_order, info=True) %}
{% set list_alpha_order = dbt_utils.get_column_values(
table=ref('stg_payments'),
column='payment_method',
order_by='payment_method')
%}
{% do log('list_alpha_order: ' ~ list_alpha_order, info=True) %}
{% set list_recency_order = dbt_utils.get_column_values(
table=ref('stg_payments'),
column='payment_method',
order_by='max(created_at)',
sort_direction='desc')
%}
{% do log('list_recency_order: ' ~ list_recency_order, info=True) %}
It all worked as expected!
list_default_order: ['credit_card', 'bank_transfer', 'coupon', 'gift_card']
list_alpha_order: ['bank_transfer', 'coupon', 'credit_card', 'gift_card']
list_recency_order: ['credit_card', 'bank_transfer', 'gift_card', 'coupon']
I did have a slightly different implementation in mind, which I think makes the sort order a little less confusing for the two cases — lmk what you think:
{%- set order_by = order_by or 'count(*) desc' -%}
...
with sorted_column_values as (
select
{{ column }} as value -- note: no second column declared
from {{ target_relation }}
group by 1
order by {{ order_by }}
{% if max_records is not none %}
limit {{ max_records }}
{% endif %}
)
select value from sorted_column_values
By removing the second column in the query, we don't have to get tricky with the order_by
column. I also just ended up
removing the sort_direction
input as I figured it could be pretty easily packaged up in the order_by
definition.
The above implementations would end up looking like so (i.e. no "sort_direction" parameter)
{% set list_default_order = dbt_utils.get_column_values(
table=ref('stg_payments'),
column='payment_method')
%}
{% do log('list_default_order: ' ~ list_default_order, info=True) %}
{% set list_alpha_order = dbt_utils.get_column_values(
table=ref('stg_payments'),
column='payment_method',
order_by='payment_method')
%}
{% do log('list_alpha_order: ' ~ list_alpha_order, info=True) %}
{% set list_recency_order = dbt_utils.get_column_values(
table=ref('stg_payments'),
column='payment_method',
order_by='max(created_at) desc'
%}
{% do log('list_recency_order: ' ~ list_recency_order, info=True) %}
This does mean there's fewer built-in defaults, but may be a little easier to document and reason about?
Closing in favor of #349 — I let this get out of date and it was easier for me to open a new PR to get the code in a mergeable state. |
This is a:
main
)Description & motivation
This PR adds a
sort_column
andsort_direction
argument toget_column_values
to allow for alternate sorting of column values. This based on work from @joshpeng-quibiCloses #288
Checklist