Short-circuiting AND for @subset #310

kescobo · 2021-11-19T19:45:51Z

Currently, @subset seems to evaluate every conditional for each row. It would be nice (and possibly more efficient?) to use short-circuiting evaluation:

julia> df = DataFrame(a = ["xy", "yz", missing, "za"], b=rand(4));

julia> @rsubset(df, !ismissing(:a), !startswith(:a, "x"))
ERROR: MethodError: no method matching startswith(::Missing, ::String)

As @bkamins said on Slack, one can currently get this behavior by using && explicitly:

julia> @rsubset(df, !ismissing(:a) && !startswith(:a, "x"))
2×2 DataFrame
 Row │ a        b
     │ String?  Float64
─────┼───────────────────
   1 │ yz       0.701172
   2 │ za       0.757161
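
For illustration only, the difference can be reproduced in plain Julia without DataFrames. This is a minimal sketch, not how @rsubset is implemented: the helper names `separate_conditions` and `short_circuit` are hypothetical, and the vector `a` simply mirrors column :a from the example above.

```julia
# Values mirroring column :a from the example above.
a = ["xy", "yz", missing, "za"]

# Hypothetical helper: evaluates both conditions for a row regardless of the
# first result, which is how the comma-separated form appears to behave.
function separate_conditions(x)
    c1 = !ismissing(x)
    c2 = !startswith(x, "x")   # throws MethodError when x is missing
    return c1 & c2
end

# Hypothetical helper: && skips the second condition when the first is false,
# so startswith is never called on a missing value.
short_circuit(x) = !ismissing(x) && !startswith(x, "x")

# The short-circuiting version yields a plain Bool mask.
keep = map(short_circuit, a)

# The non-short-circuiting version fails on the missing entry.
failed = try
    map(separate_conditions, a)
    false
catch err
    err isa MethodError
end
```

Indexing with the mask, `a[keep]`, keeps the same two values as the @rsubset call with && above.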

pdeffebach · 2024-03-01T14:43:18Z

The solution here is to use @passmissing. Hopefully, in the near future, @passmissing will also be allowed for column-wise operations. Closing this.

kescobo · 2024-03-01T16:09:34Z

The point here was meant to be broader than just missing-value operations, though. There are lots of AND conditionals that I use frequently. We could change this to

@rsubset(df, startswith(:a, "x"), endswith(:a, "y"))

pdeffebach · 2024-03-01T16:23:19Z

Ah. Thanks for the clarification.

Regardless, I don't think this is particularly actionable. @subset is a thin wrapper around DataFrames.subset, and I don't think the benefits are large enough to merit the large refactor this would require.

kescobo · 2024-03-01T19:49:08Z

Makes sense, no sweat! I opened an issue on DataFrames directly, though I'm guessing it won't make sense there either, since you can get the short-circuiting behavior if you really need it, and it's only a modest speed-up, at least in my testing.

pdeffebach closed this as completed Mar 1, 2024

kescobo mentioned this issue Mar 1, 2024

Short circuit && on subset? JuliaData/DataFrames.jl#3427

Closed
