Two regex-related items on a wish list #1849
Comments
Currently the way to write this is
I think I get the idea of what you want, but to be sure, could you give an example of the proposed Julia target code (as you did in point 1) and the expected output? Thank you!
Actually regarding 1, if we decided to introduce
Also note the subtlety that several selectors can sometimes match the same column names, and we would have to decide what to do in this case (throw an error or deduplicate).
Why not allow tuples/vectors mixing regexes and other selectors, if that doesn't make the code too complex? Only trying an implementation will tell...
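To make the two questions above concrete (mixing selector kinds, and what to do when two selectors hit the same column), here is a minimal sketch in plain Julia. The helper `resolve_columns` and its `on_duplicate` keyword are hypothetical names invented for this illustration; they are not part of DataFrames.jl and do not reflect how the package resolves columns internally.

```julia
# Hypothetical helper (NOT a DataFrames.jl function): expand a mixed collection
# of selectors (Symbol, String, or Regex) into concrete column names.
function resolve_columns(colnames::Vector{Symbol}, selectors;
                         on_duplicate::Symbol = :error)
    on_duplicate in (:error, :unique) ||
        throw(ArgumentError("on_duplicate must be :error or :unique"))
    resolved = Symbol[]
    for sel in selectors
        matched = if sel isa Regex
            # A regex selects every column whose name contains a match.
            filter(n -> occursin(sel, String(n)), colnames)
        else
            name = Symbol(sel)
            name in colnames || throw(ArgumentError("column $name not found"))
            [name]
        end
        for name in matched
            if name in resolved
                # Two selectors hit the same column: apply the chosen policy.
                on_duplicate === :error &&
                    throw(ArgumentError("column $name selected more than once"))
                continue  # :unique keeps only the first occurrence
            end
            push!(resolved, name)
        end
    end
    return resolved
end

cols = [:id, :x1, :x2, :y]

# A tuple mixing a symbol and a regex, with no overlap between them.
mixed = resolve_columns(cols, (:id, r"^x"))

# A vector whose selectors overlap on :x1, resolved by deduplicating.
deduped = resolve_columns(cols, [r"^x", :x1]; on_duplicate = :unique)
```

With the default `on_duplicate = :error`, the second call would throw instead, which is the stricter of the two policies mentioned above. Keeping all of this in one function is the point of the sketch: every public function that accepts columns could delegate to a single resolver.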
The initial reason was the following:
It is doable, but it will add tons of technical debt. Therefore, if we move in this direction, a careful redesign of the internal API is required. I have been thinking about it for some time now. The initial idea is that all public API functions that take columns should have three versions only the following types of
Then any such change as proposed here can be implemented only in one place - namely This brings us to the question I have been raising for some time now - do we want to expose If what I propose here is OK with you then in #1847 I will redesign this following what I propose above. Please let me know.
And then when #1847 is done, adding the support you ask for here should be a small PR.
Makes sense, but it is probably better not to spend too much time on internal redesign before 1.0. Features like this issue can wait for 1.x.
For the second point, I am thinking of the following syntax, which admittedly doesn't work because of broadcasting.
In this case
I think this is a great idea. Every time we make a new feature, I am likely to say "well, it would be great if this worked with an array of tons of different kinds of inputs". I think an infrastructure for taking
in scalar, tuple, or vector form, and then having DataFrames take them and put them into a nice array would be a good investment.
Yeah, that's a more general problem we have with column selectors (see comments around #1256 (comment)). We should probably add a wrapper type to be used like
In the long term this would be solved by the
@pdeffebach - could you please summarize what you think is left from your proposal to be implemented given the current API we have (note that
I think the
you have to specify
OK - so I am closing this, as we have a separate issue for broadcasting of
I really like the new regular expressions indexing! We are really getting feature parity with R because of this.
However, I want to look beyond R and focus on two Stata features where regex is super useful.
1. Mixing regex and symbols in select.
Right now you can't do
or
This is a pattern that is really useful in Stata when you want, for example, a bunch of nicely titled variables you just made and an ID variable that you use for joining.
2. Collapse based on regular expressions. Inspired by this tweet.
In Stata you can do the following super nicely:
Obviously, with #1620 (Add more splatting and appending of arguments in by) you can make the vectors of symbols first and use that. But you could do that before with getindex too, and we still added a convenience method for regular expressions there. It's worth thinking about adding them here. Let me know if this is better suited for DataFramesMeta as well.
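For readers who want to see the "build the vector of columns first" workaround spelled out, here is a small sketch. It assumes a recent DataFrames.jl release, whose API differs from the one available when this issue was opened, and it uses made-up column names (`id`, `x_a`, `x_b`, `other`). It is only meant to show the shape of a collapse-style aggregation over regex-matched columns, not the syntax this issue proposes.

```julia
using DataFrames, Statistics

# Toy data with made-up column names.
df = DataFrame(id = [1, 1, 2, 2],
               x_a = [1.0, 3.0, 5.0, 7.0],
               x_b = [2.0, 4.0, 6.0, 8.0],
               other = ["p", "q", "r", "s"])

# Resolve the regex to concrete column names first ...
xcols = names(df, r"^x_")

# ... then broadcast `=> mean` over them to get one aggregation per matched
# column, grouped by the ID variable (in the spirit of Stata's collapse).
collapsed = combine(groupby(df, :id), xcols .=> mean)

# Mixing a plain column with the regex matches by concatenating names.
subset_df = select(df, vcat("id", xcols))
```

The regex is expanded eagerly with `names`, so the two later calls only ever see ordinary column names. That is exactly the extra step a built-in convenience method would hide.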