Tables that are tuples #276
How do you propose to resolve it? In general, under current rules both are valid. Some data structure, conceptually, can be both a table and a collection of tables. What I mean is that you can construct an object that declares itself to be a table, but at the same time it can store tables inside - there is no contradiction. In general e.g. a vector of
In this particular case the root cause of the problem is this definition I think:
which special cases So, essentially - what you propose is to make One more comment about
That is. Even if
@bkamins Thanks indeed for your thoughts. I didn't quite understand this comment:
The object you mention is not a tuple. And this example does not cause problems in the cases I mentioned.

I don't have a concrete proposal, unless declaring all unnamed tuples to be non-tables is a realistic option.

In the first issue, the case is made that tuples in Julia have a very special meaning, as they are the grouping type for the return value of "multi-valued" functions. There was a feeling that implementing Tables for Base tuple types is therefore a kind of piracy, I guess. They are chaining multiple transformers and want the option of processing multiple data containers at once. Here "multiple" is detected using tuple. An alternative is the use of varargs, but this breaks the nice (and natural) pipeline syntax they are using.

It would be nice to know if there are (unnamed) tuples that are also tables in common use.
That is a point of my comment. Examples:
In summary - the type of the container currently does not matter for Tables.jl. Only the type of the elements stored in it matters. Given your comments maybe you want to propose another rule:
This could be acceptable, as it is not very likely that someone stores tables in tuples. Is introducing such a rule something you would propose then?
@bkamins Thanks for clarifying the comment!
I believe this would indeed resolve both the issues mentioned. Thanks for considering it! @darsnack Do you agree this is a worthwhile proposal from the point-of-view of JuliaML/MLUtils.jl#61 ?
It seems problematic to decide that tuples should be special-cased and never be considered as tables. What would be the rationale for this in Tables.jl? AFAICT this is a problem at JuliaAI/ScientificTypes.jl#182 because ScientificTypes treats tuples in a special way, but another package could have a similar request about vectors (and indeed it's not uncommon to have ambiguities between tables and other types such as vectors). Why not just add that restriction in ScientificTypes only?
@nalimilan Thanks for looking into this.
The rationale is that I don't see how either issue can be resolved outside of a Tables.jl restriction, as unnatural as that might seem from the point-of-view of the package considered in isolation. How do you mean "add restriction" in ScientificTypes, exactly? We can add something to documentation, but any user trying to get ScientificTypes to play with tuple-tables will get unexpected behaviour. In ScientificTypes a basic and very natural requirement is that the scitype of a tuple is the What would be your proposal to resolve the issue in MLUtils.jl?
I just mean that MLUtils could disallow using tuples as tables, i.e. throwing an error when getting one or treating them as non-table objects. This is basically the same thing as you propose doing in Tables.jl, but rather than disallowing this everywhere, only do so in MLUtils (where they are problematic).
@nalimilan Thanks. I will suggest, then, that those two packages treat all tuples as non-tabular objects, and document the fact.
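For what it's worth, a package-local version of that rule could be as small as the sketch below. The helper name `istable_excluding_tuples` is hypothetical (it is not part of Tables.jl, ScientificTypes.jl or MLUtils.jl), and the data is made up for illustration; only `Tables.istable` is an existing call.

```julia
using Tables

# Hypothetical helper, for illustration only: treat every unnamed tuple as a
# non-table and defer to Tables.jl for everything else.
# Note that a NamedTuple is not a `Tuple`, so column tables are unaffected.
istable_excluding_tuples(x) = !(x isa Tuple) && Tables.istable(x)

# Made-up example data.
column_table = (a = [1, 2], b = ["u", "v"])          # NamedTuple of vectors
row_table = [(a = 1, b = "u"), (a = 2, b = "v")]     # Vector of NamedTuples
tuple_of_tables = (column_table, column_table)       # unnamed tuple

@show istable_excluding_tuples(column_table)
@show istable_excluding_tuples(row_table)
@show istable_excluding_tuples(tuple_of_tables)
```

Because the `isa Tuple` check short-circuits, the last call never consults Tables.jl at all, so whatever Tables.jl decides about tuples stays untouched for other packages.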
@ablaom - I think (I am not sure - just guessing) that what @nalimilan wants to say is that in some contexts
Then if you have such a table and want to perform some manipulations on it the output is also
Of course this is probably not a realistic scenario for ML applications but it is plausible for some transactional system that needs to respond to queries in near real time (and wants to e.g. avoid GC hiccups). Having said that, maybe there is a work-around for this use case that does not require
@quinnj - how do you see this issue?
I have encountered two thorny issues which have arisen because Tables.jl natively implements its interface for certain kinds of tuples. The conflicts arising are not quite the same, which suggests that even if workarounds are found, the underlying problem may reappear elsewhere, so I thought it worth inviting comment here. One issue is in a package I maintain but the other is not. The issues are here and here.
In Tables.jl the following is both a table and a tuple of tables:
This leads to ambiguity in the issues mentioned.
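To make the ambiguity concrete, here is a small sketch with made-up data (not necessarily the example originally posted). It assumes, as described in this thread, that Tables.jl accepts a homogeneous tuple whose elements are `NamedTuple`s; the variable names are illustrative only.

```julia
using Tables

# Made-up data: two column tables grouped in an unnamed tuple.
t1 = (x = [1, 2], y = [3.0, 4.0])
t2 = (x = [5, 6], y = [7.0, 8.0])
t = (t1, t2)

# Each element is a column table (a NamedTuple of equal-length vectors).
@show Tables.istable(t1)
@show Tables.istable(t2)

# The container is a plain tuple whose element type is a NamedTuple, so it can
# also be read as a sequence of rows, each row having vector-valued fields.
@show t isa Tuple
@show eltype(t) <: NamedTuple

# Whether the tuple itself is reported as a table is the point of this issue.
@show Tables.istable(t)
```

A package that sees `t` cannot tell from the traits alone whether the caller meant "two tables" or "one table with two rows".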
@darsnack @OkonSamuel