Short-circuiting AND for @subset #310

Closed
kescobo opened this issue Nov 19, 2021 · 4 comments

kescobo commented Nov 19, 2021

Currently, every condition passed to @subset seems to be evaluated. It would be nice (and possibly more efficient?) to use short-circuiting evaluation:

julia> df = DataFrame(a = ["xy", "yz", missing, "za"], b=rand(4));

julia> @rsubset(df, !ismissing(:a), !startswith(:a, "x"))
ERROR: MethodError: no method matching startswith(::Missing, ::String)

As @bkamins said on Slack, one can currently get this behavior by using && explicitly:

julia> @rsubset(df, !ismissing(:a) && !startswith(:a, "x"))
2×2 DataFrame
 Row │ a        b
     │ String?  Float64
─────┼───────────────────
   1 │ yz       0.701172
   2 │ za       0.757161
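
The difference between the two calls above can be sketched in plain Julia, without DataFrames. This is only an illustration of independent versus short-circuit evaluation over a vector; it is not a description of how `DataFrames.subset` is implemented, and the variable names are made up for the example.

```julia
a = ["xy", "yz", missing, "za"]

# Independent evaluation: each condition is applied to every element,
# so the second one reaches the `missing` entry and throws.
independent_ok = try
    c1 = map(x -> !ismissing(x), a)
    c2 = map(x -> !startswith(x, "x"), a)   # MethodError on `missing`
    c1 .& c2
    true
catch err
    err isa MethodError || rethrow()
    false
end

# Short-circuit evaluation: `startswith` is never called
# for an element that already failed the `ismissing` check.
keep = map(x -> !ismissing(x) && !startswith(x, "x"), a)
kept = a[keep]
```

With `&&`, the right-hand side only runs when the left-hand side is `true`, which is why the explicit `&&` form avoids the error.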
@pdeffebach
Collaborator

The solution here is to use @passmissing. Hopefully, in the near future, @passmissing will also be allowed for column-wise operations. Closing this.
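
The general idea of propagating `missing` instead of erroring can be sketched in plain Julia. The helper `propagate_missing` below is hypothetical and written only for this example; it is not DataFramesMeta's `@passmissing`, and treating a `missing` flag as "drop the row" via `coalesce` is an assumption made for illustration.

```julia
a = ["xy", "yz", missing, "za"]

# Hypothetical helper: return `missing` instead of calling `f`
# when any argument is `missing`.
propagate_missing(f) = (args...) -> any(ismissing, args) ? missing : f(args...)

starts_x = propagate_missing(startswith)

# `!missing` is `missing`, so the third flag stays `missing`.
flags = map(x -> !starts_x(x, "x"), a)

# Assumption for this sketch: a `missing` flag means the row is dropped.
keep = coalesce.(flags, false)
kept = a[keep]
```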

kescobo commented Mar 1, 2024

The point here was meant to be broader than just missing handling, though. I frequently use lots of AND conditionals. For example, we could change this to

@rsubset(df, startswith(:a, "x"), endswith(:a, "y"))

@pdeffebach
Collaborator

Ah. Thanks for the clarification.

Regardless, I don't think this is particularly actionable. @subset is a thin wrapper around DataFrames.subset, and I don't think the benefits are large enough to merit the large refactor this would require.

kescobo commented Mar 1, 2024

Makes sense, no sweat! I opened an issue on DataFrames directly, though I'm guessing it won't make sense there either, since you can get the short-circuiting behavior if you really need it and it's only a modest speed-up, at least in my testing.
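
One way to see where a speed-up could come from is to count how often the second predicate runs under each strategy. This is a plain-Julia sketch on a small made-up vector, not a benchmark of `@rsubset` or `DataFrames.subset`; `ends_y` and the counter are names invented for the example.

```julia
a = ["xy", "yz", "xz", "za"]

calls = Ref(0)   # counts evaluations of the second predicate
ends_y(s) = (calls[] += 1; endswith(s, "y"))

# Independent evaluation: both predicates run on every element.
calls[] = 0
mask_independent = map(s -> startswith(s, "x"), a) .& map(ends_y, a)
calls_independent = calls[]

# Short-circuit evaluation: the second predicate runs only
# for elements where the first predicate returned `true`.
calls[] = 0
mask_short_circuit = map(s -> startswith(s, "x") && ends_y(s), a)
calls_short_circuit = calls[]
```

Both masks select the same elements; only the number of calls to the second predicate differs, so any saving depends on how selective the first condition is and how expensive the second one is.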
