not_blank test #632
@dbeatty10 and/or @joellabes, does this seem like a valuable addition to utils? I need it for a client project I'm working on, but I've had several other projects in the past where I would have benefited from it as well.
Yes, I'm into it! Would this also trim whitespace before evaluating? I.e. a totally empty string would fail, but what about a string that contains a single space? I'm asking because of a similar method from my C# days, IsNullOrWhiteSpace(), which I preferred over its sibling IsNullOrEmpty(). I don't have strooong feelings about whether it's necessary or not. Perhaps it needs an extra argument that says whether to trim or not? Something like this, using {{ dbt.length() }} instead of a hardcoded length function, natch (https://docs.getdbt.com/reference/dbt-jinja-functions/cross-database-macros#length):

{% test not_blank(model, column_name, ignore_whitespace) %}
select * from {{ model }} where
{% if ignore_whitespace %}
{{ dbt.length(column_name) }} = 0
{% else %}
{{ dbt.length("ltrim(rtrim(" ~ column_name ~ "))") }} = 0
{% endif %}
{% endtest %}
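To see how the two predicates in that sketch differ (length of the raw value vs. length after ltrim(rtrim(...))), here's a minimal Python sketch that runs both against a few values using the standard-library sqlite3 module. It assumes SQLite's length(), ltrim() and rtrim() are a close enough stand-in for the warehouse functions; failing_values and trim_whitespace are hypothetical names, not part of dbt or dbt-utils.

```python
import sqlite3

# Illustrative values only: empty, one space, two spaces, a tab, real text, NULL.
SAMPLE_VALUES = ["", " ", "  ", "\t", "abc", None]


def failing_values(trim_whitespace: bool) -> list:
    """Return the values a not_blank-style test would flag as failures.

    SQLite's length(), ltrim() and rtrim() stand in for the warehouse
    functions; in a dbt project you would go through dbt.length() instead.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("create table sample (val text)")
    conn.executemany(
        "insert into sample (val) values (?)", [(v,) for v in SAMPLE_VALUES]
    )
    if trim_whitespace:
        # Strip spaces from both ends before measuring, like the ltrim(rtrim(...)) branch.
        predicate = "length(ltrim(rtrim(val))) = 0"
    else:
        # Only a truly zero-length string fails.
        predicate = "length(val) = 0"
    rows = conn.execute(
        f"select val from sample where {predicate} order by rowid"
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(failing_values(trim_whitespace=False))  # ['']
    print(failing_values(trim_whitespace=True))   # ['', ' ', '  ']
```

Two things fall out of this. NULL never fails either variant, because length(NULL) = 0 evaluates to NULL rather than true, so nulls stay the job of not_null. And SQLite's trim functions remove only spaces by default, so the tab-only value passes even the trimmed variant; other databases may treat other whitespace characters differently.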
Some food for thought on this. In Java:
Source: https://www.baeldung.com/java-blank-empty-strings#empty-blank
Also, a popular validation framework in PHP says this:
Source: https://symfony.com/doc/current/reference/constraints/NotBlank.html
Which raises the question of how the rule should behave when it encounters a blank array, for databases that support them. My suggestion would be to be explicit in the rule names and avoid grouping different kinds of tests into a single one using parameter flags; that would cause confusion. Meaning, in a first iteration, introducing:
Following the "Java" meaning but not testing And then think about how to support
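To make the Java distinction concrete, here's a minimal Python sketch (Python is used purely for illustration; these are not dbt macros). is_empty mirrors Java's String.isEmpty(), and is_blank approximates Java 11's String.isBlank(). not_empty_failures and not_blank_failures are hypothetical names for two separately named checks, each with a single meaning and no flags, along the lines suggested above.

```python
from typing import Iterable, List, Optional


def is_empty(value: str) -> bool:
    """Like Java's String.isEmpty(): true only for a zero-length string."""
    return len(value) == 0


def is_blank(value: str) -> bool:
    """Approximates Java 11's String.isBlank(): empty or whitespace-only.

    Python's str.strip() and Java's Character.isWhitespace() do not agree on
    every Unicode character (e.g. a no-break space), so this is not an exact port.
    """
    return value.strip() == ""


def not_empty_failures(values: Iterable[Optional[str]]) -> List[str]:
    """Hypothetical 'not_empty' check: flags zero-length strings only."""
    # NULLs are skipped, leaving them to a separate not_null-style check.
    return [v for v in values if v is not None and is_empty(v)]


def not_blank_failures(values: Iterable[Optional[str]]) -> List[str]:
    """Hypothetical 'not_blank' check: flags empty and whitespace-only strings."""
    return [v for v in values if v is not None and is_blank(v)]


if __name__ == "__main__":
    sample = ["", " ", "\t\n", "abc", None]
    print(not_empty_failures(sample))  # ['']
    print(not_blank_failures(sample))  # ['', ' ', '\t\n']
```

Under these definitions every empty string is also blank, so the not_blank failures are always a superset of the not_empty failures.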
@tigitz thanks for the feedback and the links to prior art!
I would be surprised if we could validate both strings and arrays in the same test without knowing their type in advance (which I think is your point in recommending
Can you say more about the confusion element? I'm pretty sure I disagree with you on this one, at least on tests of the same column type (e.g. empty string vs blank string). I consider it to be analogous to the (Accidental proving of my point: I wrote a lot of this response with blank and empty the wrong way around, and only discovered my mistake when I scrolled back up to your original comment before posting). What shall we name the
Yes exactly
Yes, if I'm looking to avoid blank strings in my data, I'll look for the related rule. Having to go through the whole documentation to understand that a non-corresponding Having explicit and straightforward Also, mixing both tests into a single Say there are 3 scenarios you would like to test, "not_empty", "not_blank", "not_empty and not_blank":
Thanks for all the incredible thought poured into this @joellabes and @tigitz. A quick first pass at the Is there a preference on using I can add an array counterpart + tests to round out the PR in the next day or two :)
Describe the feature
dbt has the not_null test to monitor for null values, but in certain datasets blank values are also problematic. However, I found no not_blank test when searching utils or dbt-expectations.
Describe alternatives you've considered
Creating a custom test that errors if there are blanks.
Additional context
Is this feature database-specific? Which database(s) is/are relevant? Please include any other relevant context here.
I think blank entries are a possibility on all databases.
Who will this benefit?
What kind of use case will this feature be useful for? Please be specific and provide examples, this will help us prioritize properly.
Anyone needing to check data integrity to ensure there are no blank values in their dataset.
Are you interested in contributing this feature?
Yes plz 🙏🏻