Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename @linq macro? #177

Closed
CameronBieganek opened this issue Sep 27, 2020 · 43 comments
Closed

Rename @linq macro? #177

CameronBieganek opened this issue Sep 27, 2020 · 43 comments
Milestone

Comments

@CameronBieganek
Copy link

I'm not sure if the current plan is to keep @linq or not, but if it is kept, I think a different name would make more sense. @linq only makes sense if you're a former C# programmer. Perhaps @query instead?

Also, I'd like to argue in favor of keeping @linq, or rather @query. Here's a query from the README that uses Pipe.jl (except I've changed :a to a, etc).

using Pipe
out = @pipe df |>
    @transform(_, y = 10 * x) |>
    @where(_, a .> 2) |>
    @by(_, b, meanX = mean(x), meanY = mean(y)) |>
    @orderby(_, meanX) |>
    @select(_, meanX, meanY, var = b)

What I would prefer is a begin block syntax like this:

using Pipe
out = @query df begin
    transform(y = 10 * x)
    where(a .> 2)
    by(b, meanX = mean(x), meanY = mean(y))
    orderby(meanX)
    select(meanX, meanY, var = b)
end

I think the begin-block syntax is much easier to read. The second example avoids the noise from all the @, _, and |> symbols.

Actually, I already have an open issue for begin-block syntax for the @linq macro: #136

@xiaodaigh
Copy link
Contributor

I actually want to separate it out #173

@CameronBieganek
Copy link
Author

That seems a little excessive. Then we would have DataFramesMetaMeta.jl. 😄

@xiaodaigh
Copy link
Contributor

That seems a little excessive. Then we would have DataFramesMetaMeta.jl.

To me the linq macros is just a distraction. There are too many things to maintain anyway.

@CameronBieganek
Copy link
Author

On the contrary, I think a begin block form for a @query (or @linq) macro should be a pretty fundamental component of a macro package for DataFrames.jl, since it's the easiest to read and the closest to a SQL statement.

@xiaodaigh
Copy link
Contributor

think a begin block form for a

using Lazy: @> # or using DataConvenience

@> df begin
  groupby(:grp)
  combine(:col1 => mean)
end

@CameronBieganek
Copy link
Author

It's true that @> gets us most of the way there. But it has two downsides:

  1. You have to introduce a dependency on Lazy.jl just for a pipe. Most of Lazy.jl is code unrelated to piping.

  2. When combined with the macros from DataFramesMeta.jl, you have to write this:

@> df begin
    @transform(c = a + b)
    @groupby(g)
    @combine(c_mean = mean(c))
end

But I'd rather write this:

@query df begin
    transform(c = a + b)
    groupby(g)
    combine(c_mean = mean(c))
end

The second form would require a custom macro in DataFramesMeta.jl.

@xiaodaigh
Copy link
Contributor

@query df begin
    transform(c = a + b)
    groupby(g)
    combine(c_mean = mean(c))
end

That doesn't compose well with other functions. Also, it's an extra burden on dev which is already stretching very thin. I vote for deprecating @linq. Interested author can revive it in another package.

Also, what's stopping DataFramesMeta.jl from just re-exporting @> or just stripping out the @> code and putting just that into DataFramesMeta.jl?

@CameronBieganek
Copy link
Author

CameronBieganek commented Sep 29, 2020

That doesn't compose well with other functions.

It composes just as well as @>. The semantics I have in mind is that it behaves the same as @>, except it makes the following replacements:

Input Replaced With
transform @transform
select @select
where @where
groupby @groupby
combine @combine

I'm willing to contribute a @query macro like this to DataFramesMeta.jl.

@xiaodaigh
Copy link
Contributor

xiaodaigh commented Sep 29, 2020

I'm willing to contribute a @query macro like this to DataFramesMeta.jl.

I see. That sounds like a good idea.

I also support making the @query macro in another package and DataFramesMeta can just re-export it

@nalimilan
Copy link
Member

Maybe instead of special-casing transform, select, etc., the macro could just apply the same transformations to all functions? I.e. m = mean(x) would become :x => (x -> mean(x)) => :m and so on. Then any function would be supported, and custom functions supporting the pair syntax from packages or defined by users would benefit from the DataFramesMeta syntax.

@pdeffebach latest PR has come quite close to this already AFAICT.

@pdeffebach
Copy link
Collaborator

Maybe instead of special-casing transform, select, etc., the macro could just apply the same transformations to all functions? I.e. m = mean(x) would become :x => (x -> mean(x)) => :m and so on. Then any function would be supported, and custom functions supporting the pair syntax from packages or defined by users would benefit from the DataFramesMeta syntax.

yes this is basically exactly what we do [here](yes, this is

function fun_to_vec(kw::Expr; nolhs = false)

). The current implementation throws an error if an input is not a Symbol or an expression of the form y = f(:x). If we simply uncomment that error, then we will fully support any expression that could be passed to DataFrames.transform.

With regards to re-naming. I would prefer @chain to @query because "query" already has a strong meaning in the data ecosystem. However I am hesitant to simply re-name as the only motivation for a breaking change.

That said, one feature I would like to implement for @linq is an optional underscore currying that would override the default first-argument parsing. I frequently want to use do blocks in pipes and that is currently not possible with @linq.

@linq df |>
    transform(y = f(:x)) |>
   groupby(:a) |>
   combine(_) do sdf
       # complicated stuff
   end |> 
   select(...)

@CameronBieganek I don't think we need a @groupby macro, since there is no special behavior this would have. One of the benefits of @linq currently is that you don't have awkward inconsistencies with which expressions are macros and which are not macros.

An important conflict that as not come up yet here is the tension between x and :x for variable naming. If we decide on x then presumably

@select(df, Between(a, z), y = f(x))

would work. But then, we would likely need some way for groupby(df, g) to work where g is parsed as :g. One method is as @nalimilan descibed where we search through all expressions in the pipe and replace all literals with Symbols. This is possible but would cause lots of headaches and bugs when we mess with expressions we don't "own", so to speak. I don't want to see

@linq df |>
    select(x, y, z = f(x))
    groupby(:g)
    custom_fun(:x, :y)

The mix of :x and x is confusing.

@nalimilan
Copy link
Member

Sorry, I wrote m = mean(x), but that could as well have been m = mean(:x). The debate between x and :x is mostly orthogonal to the question of having @linq (or its new name) apply its transformations to any function AFAICT.

The idea of allowing optional underscores to pipe into an argument other than the first one is interesting.

@pdeffebach
Copy link
Collaborator

I think it's relevant because the main benefit of @linq is consistency and ease of reading, so how we mix :x and x matters in that respect.

As you know, we want to emulate dplyr's ..

@CameronBieganek
Copy link
Author

CameronBieganek commented Sep 29, 2020

I agree that @chain (or @pipe, which is already taken) is a better name for this than @query.


Maybe instead of special-casing transform, select, etc., the macro could just apply the same transformations to all functions? I.e. m = mean(x) would become :x => (x -> mean(x)) => :m and so on.

That would work fairly well, except then we wouldn't be able to add other arbitrary functions to the chain/pipe. For instance, I'd like to be able to write the following:

@chain df begin
    transform(c = a + b)
    where(a < 100)
    tail(30)
    repeat(outer = 3)
end

If we implemented your proposal, then repeat(outer = 3) would be turned into

repeat(..., [] => (() -> 3) => :outer) 

(or something like that), which would cause an error.


Personally, my main desire for a @chain macro is to be able to write something that looks as similar to a SQL query as possible, and thus avoids all the noise from @, _, and |> symbols. However, I do want to be able to chain other functions, so it shouldn't be strictly limited to SQL keywords (i.e. DataFramesMeta macros).


I don't think we need a @groupby macro, since there is no special behavior this would have.

Well, we would need a macro if we want to allow groupby(a) instead of groupby(:a). 🙂


an optional underscore currying that would override the default first-argument parsing

I like this idea.


@linq df |>
   select(x, y, z = f(x))
   groupby(:g)
   custom_fun(:x, :y)

To ameliorate this situation, perhaps @chain could allow DataFramesMeta.jl macros to be written either with or without the @ symbol. Thus you could disambiguate the above like this:

@chain df begin
    @select(x, y, z = f(x))
    @groupby(g)
    custom_fun(:x, :y)
end

@CameronBieganek
Copy link
Author

The new PipelessPipes.jl seems to provide almost everything I want. The only thing it doesn't let me do is write transform instead of @transform. But maybe it's better to be explicit about when you are using a DataFramesMeta macro rather than a DataFrames function. I suppose some people might want to write a pipe that, for instance, uses both DataFramesMeta.@transform and DataFrames.transform in the same pipe.

@pdeffebach
Copy link
Collaborator

I was thinking the same thing.

I really do like writing transform instead of @transform. Ideally there would be no difference between transform and @transform post 1.0 since all arguments taht don't look like y = f(:x) would get passed as-is to transform. So the ambiguity wouldn't matter.

Additionally, I think any @linq replacement should provide both @chain for |> and @chainblock for sequential calls without |>.

But PipelessPipes is promising and maybe they will be able to accept PRs when this task moves higher up on the development priorities list.

@nalimilan
Copy link
Member

Ideally there would be no difference between transform and @transform post 1.0 since all arguments taht don't look like y = f(:x) would get passed as-is to transform. So the ambiguity wouldn't matter.

Things like transform(df, x2 = 2 * :x, :y) can't work outside of a macro so I don't think that's possible.

Additionally, I think any @linq replacement should provide both @chain for |> and @chainblock for sequential calls without |>.

What's the advantage of using |> over what PipelessPipes provides?

@pdeffebach
Copy link
Collaborator

Things like transform(df, x2 = 2 * :x, :y) can't work outside of a macro so I don't think that's possible.

Sorry, I meant that the ambiguity wouldn't be that important inside the chaining block. Not in general. Though in theory a user could, post 1.0, replace every call to transform with a call to @transform in their code and be fine, aside from maybe a @fun macro for a few edge cases.

What's the advantage of using |> over what PipelessPipes provides?

Copying and pasting into the REPL. With |> you can copy and paste half a chain directly to the REPL. Without it, you have to add an end block, which is kind of annoying imo.

@pdeffebach
Copy link
Collaborator

I guess, if we allow users to omit the @, because it requires more context-dependent reasoning, DataFramesMeta should probably own whatever piping code we end up with rather than simply re-export an existing package.

@CameronBieganek
Copy link
Author

CameronBieganek commented Nov 20, 2020

The other reason to support both block piping and |> piping is that some people prefer |> piping, I guess. It's mostly a matter of preference. If they've just come from R they probably want to reach for an explicit pipe operator.

It would be nice if there were a way to overload PipelessPipes.@_ for DataFramesMeta to rewrite transform to @transform, but that's challenging since it's a macro. Would it be possible for DataFramesMeta to shadow an internal PipelessPipes function like PipelessPipes.helper_func? (I just made that function name up.)

@pdeffebach
Copy link
Collaborator

I don't know! maybe. I hardly understand how @linq works now so it would certainly require some effort.

@CameronBieganek
Copy link
Author

If shadowing is not possible, DataFramesMeta could probably monkey patch the PipelessPipes module. 🙈 😂

@CameronBieganek
Copy link
Author

CameronBieganek commented Nov 20, 2020

Perhaps the most reasonable approach would be for PipelessPipes to have a separate macro like @df_ which is the same as @_ except that it also rewrites transform to @transform, where to @where, etc.

@jkrumbiegel
Copy link
Contributor

Why would transform need to be rewritten? Doesn't that also take df as the first arg like all the other functions?

@jkrumbiegel
Copy link
Contributor

Ah I see, because of the additional symbol transformations, I didn't read the previous posts carefully enough

@CameronBieganek
Copy link
Author

@jkrumbiegel To clarify, we would like

@df_ mydata begin
    transform(y = 2 * :x)
end

to be equivalent to

@_ mydata begin
    @transform(y = 2 * :x)
end

@jkrumbiegel
Copy link
Contributor

Aha, well that shouldn't be difficult to do, I can make a mockup macro with that behavior. The functions would be all the DataFramesMeta macros one can use within @linq currently?

@nalimilan
Copy link
Member

What's tricky is that while DataFramesMeta macros should be added the @, one should still be able to call plain functions. So it's probably simpler to define such a macro in DataFrameMeta if we want it.

@jkrumbiegel
Copy link
Contributor

jkrumbiegel commented Nov 22, 2020

Yes I wouldn't build this into @chain, but it would be easy enough to replace only those couple of dataframesmeta functions with their macros in a special variant of the macro. I'm honestly already quite happy with the way the normal functions work with @chain

@pdeffebach
Copy link
Collaborator

Yeah I'm actually a bit bummed @CameronBieganek suggested the renaming for @chain since it means we can't use the name in our own piping function which, as Milan noted, should probably live in DataFramesMeta

@nalimilan
Copy link
Member

OTOH it would make sense for a DataFramesMeta macro to use a more specific name (like @linq currently does).

@CameronBieganek
Copy link
Author

CameronBieganek commented Nov 22, 2020

It seems to me like it would be fine to have the DataFramesMeta-specific macro live in Chain.jl, rather than DataFramesMeta. For the sake of argument, let's call it @dfchain. Then @dfchain would be the same as @chain, except that functions corresponding to DataFramesMeta macros would be rewritten as follows:

Input Output
transform @transform
select @select
combine @combine
orderby @orderby
where @where
by @by
with @with

To be clear, in @dfchain, all other function calls would work the same as in @chain. In other words, foo would not turn into @foo.

I think it's reasonable for this to live in Chain.jl. It does not require introducing a DataFramesMeta.jl dependency for Chain.jl. And then @xiaodaigh would get his wish of separating @linq from DataFramesMeta.

Other piping packages could do something similar if they wished. For example, Pipe.jl could add a macro called @dfpipe.

Overall it seems like a benefit to remove all piping functionality from DataFramesMeta.jl and rely solely on piping packages like Chain.jl and Pipe.jl.

@nalimilan
Copy link
Member

What would be the advantage? That macro wouldn't make any sense without DataFramesMeta, and versions would have to be kept in sync with DataFramesMeta if we add new macros. Moving things to other packages doesn't reduce the maintainance burden -- it increases it as then you need to handle interactions between packages and their version bounds.

@jkrumbiegel
Copy link
Contributor

I'd say given that the Chain.jl code is like 70 lines, the easiest thing would be to just copy it and make your edits in DataFramesMeta. It's not like there are going to be big changes you'll have to merge in later. The package is pretty much done and I can't see what should change about it

@CameronBieganek
Copy link
Author

CameronBieganek commented Nov 22, 2020

Ok @nalimilan, if you really don't want to have @dfchain live in Chain.jl, how about a pattern like the following?

module Chain
    function_to_macro(ex) = ex
    # other stuff
end
module DataFramesMeta
    using Chain
    Chain.function_to_macro(ex) = # rewrite `transform` to `@transform`, etc
    # other stuff
    export @chain # re-export the overridden @chain 
end

I don't know if that sort of pattern would be considered admissible, but I think it would work.

@pdeffebach
Copy link
Collaborator

I don't even think it needs to be in a separate module. The current implementation of @linq is fine, it just lacks a few features and could maybe be re-written in a more clear way, and maybe have the name changed. that's all.

@CameronBieganek
Copy link
Author

Well, I didn't mean anything particular about using modules. I just meant that Chain.jl could define an internal function called function_to_macro which is the identity function within Chain.jl. But then DataFramesMeta could override that function to transform function calls to macro calls (just for transform, select, etc).

But I guess that would be a monkey patch, since it would be overwriting an existing method rather than adding a new method.

🤷‍♂️

@CameronBieganek
Copy link
Author

FWIW, if you do want to keep the DataFramesMeta piping macro as a completely independent implementation, I think it would be ok to call it @chain.

@pdeffebach
Copy link
Collaborator

@jkrumbiegel I didnt realize until now that you were the owner of Chain.jl!

I think the best course of action is to wait a little bit and see how Chain.jl matures. In the mean-time, there other priorities. I encourage people to dog-food and test Chain.jl and after x vs :x, keyword arguments, pair inputs are sorted we can see if everyone still agrees on the implementation.

I seem to be the only person who likes |> so I am okay to make the default chainer block-based.

@jkrumbiegel
Copy link
Contributor

jkrumbiegel commented Jan 18, 2021

I just played a little with this again and made a @dfchain macro which rewrites @transform => transform etc. It's very simple:

using DataFrames, Chain, DataFramesMeta

macro dfchain(first, block)
    dataframesmeta_symbols = [:transform, :combine, :select, :where, :orderby]
    b = copy(block)
    last_linenumbernode = LineNumberNode(0)
    for line in b.args
        if line isa LineNumberNode
            last_linenumbernode = line
        elseif line isa Expr && line.head == :call
            symbol = line.args[1]
            symbol isa Symbol && symbol in dataframesmeta_symbols || continue
            macrosymbol = Symbol("@" * String(symbol))
            line.head = :macrocall
            line.args[1] = macrosymbol
            insert!(line.args, 2, last_linenumbernode)
        end
    end
    esc(quote
        @chain $first $b
    end)
end

You can use it like this:

df = DataFrame(id = rand(1:100, 1000), weight = randn(1000), name = [String(rand('a':'z', 5)) for _ in 1:1000])

@dfchain df begin
    where(:id .> 50, .!startswith.(:name, "a"))
    groupby(:id)
    combine(x = sum(:weight))
    orderby(-:id)
end

Just thought this was a good place to save it while we're thinking about the options for DataFramesMeta.

Interestingly, it might point again to :col being a good choice for DataFramesMeta because it mixes more naturally with non-DataFramesMeta expressions.

@pdeffebach
Copy link
Collaborator

Okay I think we should move forward and deprecate @linq.

Now that there is DataFrames.transform and DataFramesMeta.@transform, but with different behaviors, we can't "pub" transform inside a @linq block.

DataFrames.transform is too flexible and powerful compared to DataFrames.@transform to make it second class, requiring DataFrames.transform qualifier.

I propose to add Chain.jl as a dependency and deprecate @linq before 1.0. (while maintaining the deprecation for a long time).

@pdeffebach pdeffebach added this to the Decision 1.0 milestone Mar 7, 2021
@bkamins
Copy link
Member

bkamins commented Mar 7, 2021

I am OK, we would just need to resolve jkrumbiegel/Chain.jl#19 (but I hope it will be resolved soon)

@pdeffebach
Copy link
Collaborator

Closing, since @chain is preferred to @linq now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants