Merge column-wise and row-wise macros #29
Comments
I agree something like this would really help simplify the API (no need for two versions of each macro).
The "change to colwise scope" is a very nice solution as I don't have to double the notation. `@where iris :SepalLength > $(mean(:SepalLength))` makes it clear that I'm using one or more columns to compute a scalar. The only thing that this is not covering is the so-called "window functions" (functions that take a column of length `@where_vec iris :SepalLength .- lag(:SepalLength) .> 1` ? The only thing that comes to mind is that if the expression inside the dollar evaluates to a vector, then I should iterate on it, but I'm not fully convinced. Do you have some suggestions for this as well? I may have to check how dplyr handles this.
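A minimal sketch of the "iterate if it is a vector" rule, in plain Julia with no table package. `lag` and `along_rows` are hypothetical helpers written for this illustration, and the values are made up; this is not how any existing macro is implemented.

```julia
using Statistics: mean

# Hypothetical helper, not taken from any package: shift a vector down by one,
# filling the first slot with `missing`.
lag(v::AbstractVector) = vcat(missing, v[1:end-1])

# Hypothetical rule for the value of a `$(...)` expression:
# a vector is walked in step with the rows, anything else is repeated per row.
function along_rows(x::AbstractVector, n)
    length(x) == n || throw(DimensionMismatch("expected $n elements"))
    return x
end
along_rows(x, n) = Iterators.repeated(x, n)

sepal = [5.0, 5.2, 6.5, 6.6]   # made-up values standing in for :SepalLength
n = length(sepal)

# Scalar case: mean(...) is computed once and compared against every row.
keep_scalar = [s > m for (s, m) in zip(sepal, along_rows(mean(sepal), n))]

# Window case: lag(...) has one entry per row, so entry i goes with row i.
# The first comparison is `missing`; `coalesce` turns it into `false`.
keep_window = [coalesce(s - l > 1, false) for (s, l) in zip(sepal, along_rows(lag(sepal), n))]
```

With this rule the same row-wise predicate covers both the scalar and the window case; the open question in the comment (what to do with non-vector iterables or custom structs) is decided here by dispatch on `AbstractVector` only.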
Good point. Maybe iterating is a good rule. More precisely, you could broadcast operations, as if the full expression was wrapped in
dplyr's window functions are documented here: https://dplyr.tidyverse.org/articles/window-functions.html AFAICT, everything is vectorized there, but in R there's no difference between
The similarity with
julia> t = table((a = 1:10, b = rand(10)))
Table with 10 rows, 2 columns:
a b
────────────
1 0.515873
2 0.930648
3 0.402888
4 0.801836
5 0.600595
6 0.801115
7 0.774909
8 0.731416
9 0.572505
10 0.371466
julia> f(row) = row.a*row.b
f (generic function with 2 methods)
julia> f.(t)
10-element Array{Float64,1}:
0.5158734752863601
1.8612967088054502
1.2086637555023616
3.2073426685165565
3.0029751634912216
4.80669118912654
5.424360649636144
5.851328616870312
5.1525439684036645
3.714664066202098
In terms of implementation, I'm still a bit confused. I almost want broadcast but there are two impediments:
Is there a simple way to get the iterator that would result from broadcasting without collecting it? In terms of meaning, however, I'm not sure that I would want to iterate over something other than a vector (and I definitely do not want to get errors from the broadcasting machinery if things return a custom struct, for example), so I'm still not sure whether the broadcasting API is a better rule than "iterate if
julia> df = DataFrame(x = 1:10);
julia> df.y = 3;
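On the question of getting the broadcast iterator without collecting it, one possible route is the lazy `Broadcasted` object from `Base.Broadcast`. The sketch below assumes a Julia 1.x release in which an instantiated `Broadcasted` can be iterated directly; the inputs are made up for illustration.

```julia
using Base.Broadcast: broadcasted, instantiate

a = 1:3
b = [0.5, 1.0, 1.5]

# `broadcasted` builds a lazy description of `a .* b`; nothing is computed yet.
# `instantiate` checks the shapes and fills in the axes so it can be traversed.
bc = instantiate(broadcasted(*, a, b))

# Elements are computed one at a time as the loop asks for them.
products = Float64[]
for v in bc
    push!(products, v)
end

# A scalar argument is repeated, as in ordinary broadcasting.
doubled = Int[]
for v in instantiate(broadcasted(*, a, 2))
    push!(doubled, v)
end
```

This does not address the second concern in the comment: values that are neither scalars nor arrays still go through the usual broadcasting rules and may raise errors.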
I don't really mean
I think that's the point of Currently
@nalimilan had a beautiful suggestion here: JuliaData/DataFrames.jl#1514 (comment).
There may actually be very little need to have separate row-wise and column-wise macros. The row-wise macro could simply also accept columns (as regular vectors) with a different syntax.
For example, now if we need to filter values for which `:SepalLength` is greater than `5` in the dataset `iris` we'd do:
Whereas if we need to compare with something that requires the whole column, we'd need to switch to `@where_vec` and add a `.` for broadcasting:
The idea would be to find a syntax so that we'd only use the row-wise macro but find a way to refer to columns (at macro expand time the symbol is replaced with the corresponding column):
This would be mostly non-breaking but at the same time would make column-wise macros redundant.
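To make the difference concrete without relying on any particular macro, here is a rough sketch in plain Julia. The `iris` NamedTuple and its values are a hypothetical stand-in for the real dataset; the last step shows what a merged macro could plausibly expand to, not what any package actually generates.

```julia
using Statistics: mean

# Hypothetical stand-in for a table: a NamedTuple of equal-length column vectors.
iris = (SepalLength = [5.1, 4.9, 6.3, 5.8], SepalWidth = [3.5, 3.0, 3.3, 2.7])

# Rows as NamedTuples; `map` over a NamedTuple keeps the field names.
rows = [map(col -> col[i], iris) for i in eachindex(iris.SepalLength)]

# Row-wise filter: the predicate sees one row at a time.
long = filter(row -> row.SepalLength > 5, rows)

# Column-wise filter: the predicate sees whole columns and needs broadcasting.
mask = iris.SepalLength .> mean(iris.SepalLength)
above_colwise = rows[mask]

# Merged form: the column-level part is evaluated once, outside the row loop,
# and the row-wise predicate closes over the result.
threshold = mean(iris.SepalLength)
above_rowwise = filter(row -> row.SepalLength > threshold, rows)
```

Both forms select the same rows; the merged one needs no broadcasting dot because the column reference has already been reduced to a scalar before the row loop runs.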
I like the idea a lot but am unsure about the syntax. As of now in row-wise macros `_` refers to the row, symbols refer to fields and `cols(c)` can be used to instruct the macro that `c` is a variable that evaluates to a symbol, so should be replaced with the field (consistent with DataFramesMeta and StatPlots). In column-wise macros `_` refers to the table and symbols correspond to columns, and `cols(c)` has the corresponding role.
What would be an extra syntax to use in row macros?
Candidates:
- `$SepalLength`
- `col(:SepalLength)`, but that could be too confusing given `cols`
- `_I_.SepalLength`, where `_I_` would be replaced by a table-like object with dot overloading to extract columns? It does look a bit ugly though.

In the first two cases I'm also a bit confused about what one would do if the column is passed programmatically (by, say, `c = :SepalLength`).
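For the `_I_` candidate, a table-like object with dot overloading could look roughly like the sketch below. `ColumnView` is a hypothetical type invented for this illustration, and the data are made up; the point is only that the same object handles both literal and programmatic column names.

```julia
# Hypothetical wrapper: `obj.name` returns the column called `name`.
struct ColumnView{T<:NamedTuple}
    columns::T
end

# `getfield` is used internally because `getproperty` is overloaded.
Base.getproperty(v::ColumnView, name::Symbol) = getfield(getfield(v, :columns), name)
Base.propertynames(v::ColumnView) = keys(getfield(v, :columns))

I = ColumnView((SepalLength = [5.1, 4.9, 6.3], SepalWidth = [3.5, 3.0, 3.3]))

# Literal column name, as in `_I_.SepalLength`.
literal = I.SepalLength

# Column passed programmatically: the symbol lives in a variable.
c = :SepalLength
programmatic = getproperty(I, c)
```

The dot syntax covers the programmatic case through `getproperty`, which the `$SepalLength` and `col(:SepalLength)` candidates would need a separate convention for.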