Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

should variables() behave differently for draws_rvars format? #208

Closed
jgabry opened this issue Nov 1, 2021 · 4 comments · Fixed by #342
Closed

should variables() behave differently for draws_rvars format? #208

jgabry opened this issue Nov 1, 2021 · 4 comments · Fixed by #342
Labels
documentation Improvements or additions to documentation

Comments

@jgabry
Copy link
Member

jgabry commented Nov 1, 2021

When working on stan-dev/cmdstanr#582 I was reminded that variables() and nvariables() behave differently for the draws_rvars format compared to all other formats. I don't think this difference (treating vector/matrix variables as a single variable vs. treating elements as separate variables) is documented anywhere (or maybe I missed it). Should I add this information to the help("draws-index") page where these functions are documented?

@jgabry jgabry added the documentation Improvements or additions to documentation label Nov 1, 2021
@mjskay
Copy link
Collaborator

mjskay commented Nov 1, 2021

Yeah, thanks that would be helpful! I think it's mentioned in the vignette but probably not elsewhere.

Alternatively, we could discuss if we want to keep that difference? It may be that we want to be able to get variable names and counts counted either way (by the base variable name or by the combination of variable and indices) when using any format, in which case we might want an additional function (or option) to do that. In hindsight while the idiosyncratic version of variables() makes sense for draws_rvars, it could be problematic in some cases, e.g. if someone was using the variables listed by variables() with extract_variable(). Kinda breaks the abstraction a bit...

@jgabry jgabry changed the title document that variables() behaves differently for draws_rvars format should variables() behave differently for draws_rvars format? Nov 2, 2021
@jgabry
Copy link
Member Author

jgabry commented Nov 2, 2021

I think it's mentioned in the vignette but probably not elsewhere.

I forgot about the vignette. You're right, it does show this behavior there. I can also add something to the roxygen doc ( I guess the doc will depend on what we decide about adding a new function or argument, like you mentioned).

Alternatively, we could discuss if we want to keep that difference? It may be that we want to be able to get variable names and counts counted either way (by the base variable name or by the combination of variable and indices) when using any format, in which case we might want an additional function (or option) to do that

I like the idea of being able to get this info for any format. I just changed the name of this issue from just about the doc to discussing this question. I'd be ok with either a separate function or just an argument to variables() that toggles returning the names of all elements (although I guess this would need a different default depending on the format in order to not break backwards compatibility).

it could be problematic in some cases

Yeah I just ran into something in CmdStanR where we have some code that mistakenly assumes that variables() will return the same thing regardless of the draws format. variables() and nvariables() being different does make it trickier to work with draws objects because it requires special code just for draws_rvars.

@mjskay
Copy link
Collaborator

mjskay commented Nov 2, 2021

Yeah I just ran into something in CmdStanR where we have some code that mistakenly assumes that variables() will return the same thing regardless of the draws format. variables() and nvariables() being different does make it trickier to work with draws objects because it requires special code just for draws_rvars.

Makes sense. I think a default of giving the variable names with indices is probably sensible for compatibility with other parts of the interface.

If we had an option for giving only the "base" name, what is a sensible name? variables(x, base = TRUE)? stem = TRUE? array = TRUE? dimensions = FALSE? dims = FALSE? indices = FALSE? with_dims = FALSE?

One potential complication is that variables<-() for draws_rvars in the default case will be a bit hairy. It would maybe have to only allow renaming scalars, or possibly try to detect if someone is doing a valid rename of an array (e.g. if they rename all of c("x[1]", "x[2]", "x[3]") to c("y[1]", "y[2]", "y[3]") simultaneously). So there may still be some inconsistencies in the interface, but we might be able to keep them to a minimum.

I'd be curious what @paul-buerkner thinks.

@paul-buerkner
Copy link
Collaborator

I agree that, ideally, variables() should work the same way for all formats so changing the default behavior for rvars and adding a "base" argument sounds sensible to me. I don't have a good intuition about the name of this additional argument but "base" feels good in my ears.

I don't have a good intiution about variables<-() but we could also add the base argument there to do variables(x, base = TRUE) <- (). We could make base=TRUE the default here (slight inconsistency that I think is ok) and for base = FALSE do the input checks as you suggested.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants