Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rstan fails to read cmdstanr output .csv files #1133

Open
santikka opened this issue Jul 10, 2024 · 16 comments
Open

rstan fails to read cmdstanr output .csv files #1133

santikka opened this issue Jul 10, 2024 · 16 comments

Comments

@santikka
Copy link

Summary:

rstan::read_stan_csv() no longer works for for .csv files produced by cmdstanr

Description:

It seems that rstan expects the save_warmup field to be an integer value, but cmdstanr instead writes this as a logical value (e.g., false) to the conversion to integer fails in rstan:::parse_stancsv_comments() resulting in an NA value for save_warmup. This used to work with earlier versions of cmdstanr/CmdStan.

Reproducible Steps:

library("cmdstanr")
#> This is cmdstanr version 0.8.1
#> - CmdStanR documentation and vignettes: mc-stan.org/cmdstanr
#> - CmdStan path: C:/Users/Santtu/.cmdstan/cmdstan-2.35.0
#> - CmdStan version: 2.35.0
file <- file.path(cmdstan_path(), "examples", "bernoulli", "bernoulli.stan")
mod <- cmdstan_model(file)
data_list <- list(N = 10, y = c(0, 1, 0, 0, 0, 0, 0, 0, 0, 1))
samples <- mod$sample(
  data = data_list,
  refresh = 0,
  chains = 1,
  iter_sampling = 10,
  iter_warmup = 10
)
#> Running MCMC with 1 chain...
#> 
#> Chain 1 WARNING: No variance estimation is 
#> Chain 1          performed for num_warmup < 20 
#> Chain 1 finished in 0.0 seconds.
rstan::read_stan_csv(samples$output_files())
#> Warning in parse_stancsv_comments(comments): NAs introduced by coercion
#> Error in if (max(save_warmup) == 0L) {: missing value where TRUE/FALSE needed

Created on 2024-07-10 with reprex v2.1.1

RStan Version:

2.32.6

R Version:

R version 4.4.0 (2024-04-24 ucrt)

Operating System:

Windows 11 Pro 23H2

@martinmodrak
Copy link
Contributor

martinmodrak commented Jul 30, 2024

I can reproduce the issue - outputs from CmdStan 2.34.1 are read fine, outputs from 2.35.0 cause problems.

Although I get a different error message (maybe because of using R 4.3.2)

Error in rstan::read_stan_csv(samples$output_files()) : 
  object 'n_kept' not found

I'll also note that the problem is made worse by the code at:

if (max(save_warmup) == 0L) { # all equal to 0L
where if save_warmup is not all 0 or all 1, n_kept never gets assigned.

@jgabry
Copy link
Member

jgabry commented Jul 30, 2024

Thanks. I think this might be fixed by #1131. Or at least the save_warmup issue, not sure about the n_kept issue @martinmodrak ran into.

@martinmodrak
Copy link
Contributor

@jgabry Yes, this doesn't resolve the problamatic logic around n_kept. Also #1131 seems likely to break the old .CSVs, though. I didn't do a full checkout, but notice:

as.integer(as.logical("0"))
# [1] NA
as.integer(as.logical("false"))
# [1] 0

@jgabry
Copy link
Member

jgabry commented Jul 31, 2024

Also #1131 seems likely to break the old .CSVs, though.

Ah good point. Perhaps we should revert #1131 and figure out a more general approach? @bgoodri @andrjohns we probably need to decide whether we want to keep supporting this functionality in RStan. If there continue to be changes in the CSV files created by CmdStan I don't think RStan can keep up with releases in order to make sure read_stan_csv keeps working. Do we even need it anymore given that CmdStanR exists?

@andrjohns
Copy link
Contributor

The approach I took for cmdstanr for compatibility with both was to convert any true/false to 0/1, so that the parsing logic for the rest stayed the same

@martinmodrak
Copy link
Contributor

martinmodrak commented Jul 31, 2024 via email

@andrjohns
Copy link
Contributor

Note that using cmdstanr with brms relies on read_stan_csv.

It doesn't look like that's the case anymore (which makes sense, otherwise there would have been a ton of bug reports in brms once 2.35 was released)

@jgabry
Copy link
Member

jgabry commented Jul 31, 2024

otherwise there would have been a ton of bug reports in brms once 2.35 was released

Yeah that would have been a mess!

@katrinabrock
Copy link

I was getting the if (max(save_warmup) == 0L) error and installed the latest version of rstan (from r-universe) and I'm now getting object 'n_kept' not found error.

Is there a magical version I can use where I won't get either of these errors? Or is there a way I need to modify the input data to make it work?

I got the code+data combination I'm attempting from someone else and it was working at one point. I'm not sure what package version were used then. :-/

@jgabry
Copy link
Member

jgabry commented Nov 8, 2024

Sorry for the hassle! I'm honestly not sure. Originally rstan::read_stan_csv only existed because there was no such thing as cmdstanr. So users needed a way to import CmdStan output into R. But now that cmdstanr has been around for years, it hasn't been a priority to keep the CmdStan csv files (which change sometimes) backwards compatible with rstan::read_stan_csv nor has there been enough bandwidth among developers to keep rstan::read_stan_csv up to date with changes to CmdStan.

Is this for a model fit with cmdstanr? Here's some code from @andrjohns to produce a named list of arrays of posterior samples, similar to what rstan would do:

https://discourse.mc-stan.org/t/rstan-read-stan-csv-throwing-error-with-cmdstan-models-versions-2-35/35665/7

That will hopefully allow you to avoid rstan::read_stan_csv entirely.

@jgabry
Copy link
Member

jgabry commented Nov 8, 2024

@bgoodri In the next RStan release read_stan_csv should probably be formally deprecated unless the plan is to keep updating it when CmdStan csv files change, but I don't think that's been the plan so far.

@jgabry
Copy link
Member

jgabry commented Nov 8, 2024

@katrinabrock in addition to my previous comment, another option could be to run an older version of cmdstan. I think it might just be as of 2.35.0 that this is an issue, so you could try installing e.g. 2.34.0. cmdstanr::install_cmdstan() will let you specify version = "2.34.0".

@katrinabrock
Copy link

@jgabry

Thanks for the quick response! switching cmdstanr to version 2.34.0 did not appear to resolve the issue.
I'm a bit puzzled because the code I'm working with does not appear to be using cmdstanr. It appears to be using rstan only and passing an r object in as data. I will dig a bit more an post back if I discover anything useful.

@jgabry
Copy link
Member

jgabry commented Nov 9, 2024

Yeah changing the cmdstanr version would only work if the csv files were being produced by cmdstanr. It’s weird if this is only using rstan because read_stan_csv() shouldn’t be necessary to use at all in that case. So yeah, quite strange!

@katrinabrock
Copy link

katrinabrock commented Nov 10, 2024

ok! I did some more digging and discovered a couple mistakes I made in my initial debugging: 1) I initially thought the script I was working on was calling rstan::stan when it was in fact calling rethinking::stan (https://github.com/rmcelreath/rethinking/blob/master/R/cmdstan_support.r#L104). 2) I switched my cmdstan versions, but forgot that cmdstan version mismatch will not automatically trigger recompilation of the model. Therefore, I was still running my code with models produced with 2.35.0 so of course this did not do anything to resolve the issue.

With this in mind, here is my advice to others (@santikka ) to workaround this issue:

Workarounds

Workarounds specific to rethinking package

For those hitting this issue from using the rethinking package (traceback includes this line https://github.com/rmcelreath/rethinking/blob/master/R/cmdstan_support.r#L97). This is happening because rethinking is trying to convert the cmdstanr output into a rstan output which is no longer supporter (per @jgabry above).

To resolve this, either

  1. add rstan_out=FALSE to your rethinking::stan call which will result in the call returning a cmdstanr output instead of attempting to convert

OR

switch to set_ulam_cmdstan(FALSE) which will take cmdstanr out of the picture, use pure rstan and thus the result format will be unchanged

OR

use the general solution below

General workaround

Use cmdstan 3.34.0 or earlier.

Steps:
Step 1) (one time per machine) Install earlier version of cmdstanr. cmdstanr::install_cmdstan(version = "2.34.0").

Step 2) (every time run your legacy code) Activate the older version of cmdstan like cmdstanr::set_cmdstan_path(file.path(dirname(cmdstanr::cmdstan_path()), 'cmdstan-2.34.0')). (Should work cross platform, but only tested on Linux.). You can check that this worked by running cmdstanr::cmdstan_version(). Result should be "2.34.0". It would make sense to put this cmdstanr::set_cmdstan_path(file... at the top of any script where you're having this issue.

Step 3) (one time per model) Recompile any models that you compiled with an incompatible version of cmdstanr. There are a few ways you can do this:

  • Option A) Delete the exe file. This file will usually be in the same directory as your stan code. e.g. if you have a my-model.stan file, you will see another file called my-model in the same directory. When you delete this and re-run your model, it will automatically be re-created with the version of cmdstan that is currently activated.
  • Option B) If you are calling cmdstanr directly, add force_recompile = TRUE to either your cmdstanr::cmdstan_model(...) call or your $compile call.
  • Option C) Run options("cmdstanr_force_recompile" = TRUE) before running your model. This works even if you are not calling cmdstanr directly, but via some other library such as rethinking.

Note: With Option B or C, once you've done all this recompiling once for each model with version 2.34.0, you probably don't need to recompile every single time you run your code. To go back to not recompiling by default, remove force_recompile = TRUE from your arguments or run options("cmdstanr_force_recompile" = FALSE).

Real Solution

Re-write your code to use only cmdstanr since it's better supported than rstan and much better supported than a hybrid.

@jgabry
Copy link
Member

jgabry commented Nov 11, 2024

Thanks so much for the detailed report! Hopefully this is useful to anyone else running into the same problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants