Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #431: Make drop_redundant_dims safe for data.table #434

Merged
merged 3 commits into from
May 13, 2020

Conversation

danschrage
Copy link
Contributor

Modify drop_redundant_dims to avoid an error when data is a data.table, as reported in #431. This happens because drop_redundant_dims indexes into a data.frame using a vector of logicals, but data.table expects an unquoted name to be a column name. To fix this, my code skips the dimension-reduction step if data is a data.table.

This shouldn't cause problems with data.table: In general, dimension reduction like this isn't necessary for a data.table, because data.table makes it really difficult to add a matrix as a column. The only way to do this is to explicitly coerce an existing data.frame that has matrices as columns using setDT(), and that will warn the user against doing this (see Rdatatable/data.table#3851). For any other case, data.table will automatically coerce matrices into columns. In other words, data.table does this dimension reduction automatically, so it can be safely skipped.

@mespe
Copy link

mespe commented May 4, 2020

I am not sure this is the best solution for this issue. Specifically, I worry it is not robust because it relies on implicit behavior of the data.table package. It would be more robust to either issue a more informative error and let the user coerce the input data, or force coercion to a data.frame.

Additionally, it would be nice if this PR included an associated test so if/when data.table changes its behavior, it will flag this as needing attention.

danschrage added 2 commits May 4, 2020 09:49
Instead of checking for data.table in `drop_redundant_dims`, follow the behavior of `validate_newdata` and coerce everything to a data.frame in `validate_data` before calling `drop_redundant_dims`.
Make drop_redundant_dims safe for data.table
@danschrage
Copy link
Contributor Author

Thanks for the comments! I updated the pull request, and now validate_data behaves identically to validate_newdata and simply coerces anything (data.table, tibble, etc.) into a data.frame before calling drop_redundant_dims. So it should no longer depend on or be disrupted by any future changes in data.table. That should eliminate the need for an associated test, too.

I tested it on your MWE from #431 as well as on my own data.

@jgabry
Copy link
Member

jgabry commented May 13, 2020

@danschrage and @mespe, thanks for sorting this out!

@jgabry jgabry merged commit 2ab134f into stan-dev:master May 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants