Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skip set.seed() if NA_integer_ is given. #53

Closed
wants to merge 4 commits into from

Conversation

heavywatal
Copy link

@heavywatal heavywatal commented Jun 10, 2024

This should address #52 but currently it does not work because NA_integer_ is implemented as the maximum negative value, and a negative seed causes an error "negative length vectors are not allowed". base::set.seed() accepts negative seeds without any problem, which indicates ClusterR is doing something wrong with negative seeds.

This should address mlampros#52 but currently it does not work because
`NA_integer_` is implemented as the maximum negative value, and
a negative seed causes an error "negative length vectors are not allowed".
`base::set.seed()` accepts negative seeds without any problem,
which indicates ClusterR is doing something wrong with negative seeds.
@heavywatal
Copy link
Author

Sorry, my bad. I was doing wrong on initialization of Rcpp::IntegerVector. Now it accepts NA_integer_ as expected.

@heavywatal
Copy link
Author

With this modification, it becomes possible to do this:

set.seed(1L)
KMeans_arma(iris[,-5], 3L, seed = NA_integer_)

which generates exactly the same result as this:

KMeans_arma(iris[,-5], 3L, seed = 1L)

It is also possible to do the same thing in Rcpp level.

@mlampros
Copy link
Owner

@heavywatal

Thanks for the pull request and the additional functionality. A few requests from my side:

  • Could you please add tests for the functions that are affected from the modifications (all functions that use the 'seed' parameter). Besides adding the tests that do not throw an error please also add tests that show that the output of the current default value is not affectd
  • I don't think that your adjustments on the RcppArmadillo side will have any ASAN or UBSAN issues but I have to test it before I submit to CRAN because it is something that takes me quite long to test. Are you familiar with ASAN, UBSAN testing of Rcpp code? There is a new version of rhub but I normally use my docker image . Feel free to test it and include the results because I don't have any workflow for ASAN, UBSAN as a Github action

@mlampros mlampros self-assigned this Jun 11, 2024
@heavywatal
Copy link
Author

OK, for now, I have added the minimal set of tests to validate the modification.

  • Do you really think it is necessary to repeat this for every function with seed= parameter?
  • Sorry I have never heard of ASAN or UBSAN.

@mlampros
Copy link
Owner

Do you really think it is necessary to repeat this for every function with seed= parameter?

We have to make sure the previous version of the ClusterR and the updated version using your adjustments in the code do not give different results. Therefore, we have to test the functions which have as an argument the "seed"

Sorry I have never heard of ASAN or UBSAN

If you use compiled code in an R package then the code is tested for "Undefined Behaviour". Your adjustment which is the following line

if (Rcpp::all(Rcpp::is_na(Rcpp::IntegerVector(seed)))) return;

Will take an NA (missing value) as input and this might lead to undefined behaviour (or might not - I don't know that, because I don't know the internals of the Rcpp package).

I'll tell you from personal experience that I spent many hours in the past to fix errors from CRAN related to ASAN, UBSAN and avoid the exclusion of the package from CRAN.

@heavywatal
Copy link
Author

We have to make sure the previous version of the ClusterR and the updated version using your adjustments in the code do not give different results.

I appreciate the purpose, but partially disagree on the conclusion:

Therefore, we have to test the functions which have as an argument the "seed"

because it should have been logically proven already by the combination of the existing tests and my additional tests. Is there any ClusterR function that uses seed in a different way from KMeans_arma()? Now that KMeans_arma() passes the tests, is it possible for KMeans_rcpp(), for example, to fail in terms of set_seed() modification? It seems to me that repetitive tests will just increase the code complexity uneconomically without much benefit.

But, OK, you are the author of this package, and I will add some tests for the other functions in the near future.

I'll tell you from personal experience that I spent many hours in the past to fix errors from CRAN related to ASAN, UBSAN and avoid the exclusion of the package from CRAN.

Thank you for the effort. You saved my day(s). Is it possible that we skip our local ASAN/UBSAN tests and give it a shot to let CRAN do it (hopefully automated)?

@mlampros
Copy link
Owner

mlampros commented Jun 11, 2024

Thank you for the effort. You saved my day(s)

By "spent time" I meant to debug previous error cases related to ASAN, UBSAN and not related to your current additions in the .cpp files. This is something you have to test/check using the existing packages such as rhub or any packages that you think it can serve the purpose.

Is it possible that we skip our local ASAN/UBSAN tests and give it a shot to let CRAN do it (hopefully automated)?

I'm sorry I can not do that. If you are not currently in place to perform the test/check then feel free to close the PR and re-open or create a new one once you are ready. thanks

@mlampros
Copy link
Owner

I'll close the PR for now. Feel free to re-open or open a new one.

@mlampros mlampros closed this Jun 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants