Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New / remaining sanitizer issues #56

Closed
eddelbuettel opened this issue Feb 27, 2020 · 17 comments
Closed

New / remaining sanitizer issues #56

eddelbuettel opened this issue Feb 27, 2020 · 17 comments

Comments

@eddelbuettel
Copy link
Owner

Per email from Uwe:

Brian found clang-SAN  still gives

/data/gannet/ripley/R/test-clang/Rcpp/include/Rcpp/internal/caster.h:30:25: 
runtime error: -1 is outside the range of representable values of type 
'unsigned long'
     #0 0x7f33c92f9c53 in unsigned long Rcpp::internal::caster<double, 
unsigned long>(double) 
/data/gannet/ripley/R/test-clang/Rcpp/include/Rcpp/internal/caster.h:30:25
     #1 0x7f33c92f9c53 in unsigned long 
Rcpp::internal::primitive_as<unsigned long>(SEXPREC*) 
/data/gannet/ripley/R/test-clang/Rcpp/include/Rcpp/as.h:39:21
     #2 0x7f33c9331a47 in unsigned long Rcpp::internal::as<unsigned 
long>(SEXPREC*, Rcpp::traits::r_type_primitive_tag) 
/data/gannet/ripley/R/test-clang/Rcpp/include/Rcpp/as.h:44:20
     #3 0x7f33c9331a47 in unsigned long Rcpp::as<unsigned 
long>(SEXPREC*) 
/data/gannet/ripley/R/test-clang/Rcpp/include/Rcpp/as.h:152:16
     #4 0x7f33c9331a47 in Rcpp::InputParameter<unsigned long>::operator 
unsigned long() 
/data/gannet/ripley/R/test-clang/Rcpp/include/Rcpp/InputParameter.h:34:38
     #5 0x7f33c9331a47 in Rcpp::CppMethod4<Annoy<int, float, Euclidean, 
Kiss64Random>, Rcpp::Vector<19, Rcpp::PreserveStorage>, int, unsigned 
long, unsigned long, bool>::operator()(Annoy<int, float, Euclidean, 
Kiss64Random>*, SEXPREC**) 
/data/gannet/ripley/R/test-clang/Rcpp/include/Rcpp/module/Module_generated_CppMethod.h:375:78
     #6 0x7f33c930c52a in Rcpp::class_<Annoy<int, float, Euclidean, 
Kiss64Random> >::invoke_notvoid(SEXPREC*, SEXPREC*, SEXPREC**, int) 
/data/gannet/ripley/R/test-clang/Rcpp/include/Rcpp/module/class.h:234:23
     #7 0x7f33c95b8489 in CppMethod__invoke_notvoid(SEXPREC*) 
/tmp/RtmpjzdNeY/R.INSTALL8afa7606c1f3/Rcpp/src/module.cpp:220:19
@erikbern
Copy link

odd. do you know what line in annoy this is caused on

@jlmelville
Copy link
Contributor

FWIW, this is the same UBSAN pattern that was triggered by the last version's clang-UBSAN checks
(currently at https://www.stats.ox.ac.uk/pub/bdr/memtests/clang-UBSAN/RcppAnnoy/RcppAnnoy-Ex.Rout) and is triggered by:

a$getNNsByItemList(0, 5, -1, TRUE)

which in turn calls:

Rcpp::List getNNsByItemList(S item, size_t n,
                                size_t search_k, bool include_distances) {
// ... //
    std::vector<S> result;
    std::vector<T> distances;
    ptr->get_nns_by_item(item, n, search_k, &result, &distances);
}

but from the look of the message it may not be complaining about anything in Annoy here: I think it might be that search_k = -1, but is being cast to a size_t (aka an unsigned long on this platform)?

@erikbern
Copy link

ok doesn't look like anything in core annoy then

@erikbern
Copy link

it probably is a bug – it sort of works because -1 ends up being interpreted as "search exhaustively", but i'm not sure if that's a bug or a feature honestly.

@LTLA
Copy link
Contributor

LTLA commented Feb 29, 2020

runtime error: -1 is outside the range of representable values of type 'unsigned long'

Woah. Seriously? I thought assigning -1 to an unsigned int was a standard way of getting the maximum representable value. Maybe UBSAN wants us to be more explicit about it with a static_cast?

@eddelbuettel
Copy link
Owner Author

The code is in the examples section of the manual page. So we could avoid. So somewhat naive question for @erikbern: are there better ways to do this?

# Retrieve 5 nearest neighbors to item 0
# search_k = -1 will invoke default search_k value of n_trees * n
# Return results as list with an element for distance
a$getNNsByItemList(0, 5, -1, TRUE)

# Retrieve 5 nearest neighbors to item 0
# search_k = -1 will invoke default search_k value of n_trees * n
# Return results as list without an element for distance
a$getNNsByItemList(0, 5, -1, FALSE)

# [....]

# Retrieve 5 nearest neighbors to vector v
# search_k = -1 will invoke default search_k value of n_trees * n
# Return results as list with an element for distance
a$getNNsByVectorList(v, 5, -1, TRUE)

# Retrieve 5 nearest neighbors to vector v
# search_k = -1 will invoke default search_k value of n_trees * n
# Return results as list with an element for distance
a$getNNsByVectorList(v, 5, -1, TRUE)

Should we / could use NA, then pick up the NA and replace it with, say, std::numeric_limits<uint64_t>::max() ?

@jlmelville
Copy link
Contributor

What about adding an overload to the likes of getNNsByItemList which doesn't have the search_k parameter and which means "always do exhaustive search"? The overload can then forward to the original with a static_cast<size_t>(-1), and the examples can be updated to use the new method, e.g.:

a$getNNsByItemList(0, 5, FALSE)
# instead of
a$getNNsByItemList(0, 5, -1, FALSE)

The downsides I see are extra documentation and having to write out the exact method pointer types to disambiguate the overload in the RCPP_MODULE, which leads to horrible stuff like:

Rcpp::List (AnnoyAngular::*getNNsByItemList_1)(int32_t, size_t, size_t, bool) = &AnnoyAngular::getNNsByItemList;
Rcpp::List (AnnoyAngular::*getNNsByItemList_2)(int32_t, size_t, bool) = &AnnoyAngular::getNNsByItemList;

This compiled and seemed to work when I tested it, so if there is interest in this approach I can make a PR. But I should note that I haven't actually tested that it fixes the UBSAN issues.

@LTLA
Copy link
Contributor

LTLA commented Feb 29, 2020

Before embarking on that, perhaps the most expedient solution would be to replace size_t with int for the R-facing interface of the affected method:

rcppannoy/src/annoy.cpp

Lines 77 to 78 in 68fe172

Rcpp::List getNNsByItemList(S item, size_t n,
size_t search_k, bool include_distances) {

and then handling the coercion inside getNNsByItemList. Either by a simple static_cast<size_t>(search_k) or by setting it to the appropriate numeric_limits if search_k is negative.

I mean, it doesn't seem that the arguments to getNNsByItemList need to be size_t.

@eddelbuettel
Copy link
Owner Author

Yep. And the could carry an NA as an int representation which we could the map to a special value (UINT_MAX?) and move on. The key really is not loose functionality, but to loose the SAN error.

@eddelbuettel
Copy link
Owner Author

Still in Copenhagen at the conference, but yes, the key probably is to replace size_t (aka uint64_r) with int(or maybedouble` if we are concerned about domain range).

@erikbern Code below. Basically size_t n etc in the signatures is what I would need to change (to something R support, ie int or double). Any reason not to just use int ?

    std::vector<S> getNNsByItem(S item, size_t n) {
        std::vector<S> result;
        ptr->get_nns_by_item(item, n, -1, &result, NULL);
        return result;
    }

    Rcpp::List getNNsByItemList(S item, size_t n,
                                size_t search_k, bool include_distances) {
        if (include_distances) {
            std::vector<S> result;
            std::vector<T> distances;
            ptr->get_nns_by_item(item, n, search_k, &result, &distances);
            return Rcpp::List::create(Rcpp::Named("item")     = result,
                                      Rcpp::Named("distance") = distances);
        } else {
            std::vector<S> result;
            ptr->get_nns_by_item(item, n, search_k, &result, NULL);
            return Rcpp::List::create(Rcpp::Named("item") = result);
        }
    }

    std::vector<S> getNNsByVector(std::vector<double> dv, size_t n) {
        std::vector<T> fv(dv.size());
        std::copy(dv.begin(), dv.end(), fv.begin());
        std::vector<S> result;
        ptr->get_nns_by_vector(&fv[0], n, -1, &result, NULL);
        return result;
    }

    Rcpp::List getNNsByVectorList(std::vector<T> fv, size_t n,
                                  size_t search_k, bool include_distances) {
        if (fv.size() != vectorsz) {
            Rcpp::stop("fv.size() != vector_size");
        }
        if (include_distances) {
            std::vector<S> result;
            std::vector<T> distances;
            ptr->get_nns_by_vector(&fv[0], n, search_k, &result, &distances);
            return Rcpp::List::create(
                Rcpp::Named("item") = result,
                Rcpp::Named("distance") = distances);
        } else {
            std::vector<S> result;
            ptr->get_nns_by_vector(&fv[0], n, search_k, &result, NULL);
            return Rcpp::List::create(Rcpp::Named("item") = result);
        }
    }

@erikbern
Copy link

i agree it should just be int if we're using -1 as a special value. i can fix

@eddelbuettel
Copy link
Owner Author

Yeah, I probably just followed your interface and was not thinking hard enough about the limitations I would get from a size_t in the interface.

Just tried the Rocker container for r-devel-ubsan-clang but I could not yet reproduce the issue CRAN saw so the ubsan config I encoded in the Rocker (== Docker for R) container must be slightly different. But if we just switch to int it should be better. But I can't really test it pre-upload.

@eddelbuettel
Copy link
Owner Author

(This is now taken care of in master thanks to very nice flurry of activity by @erikbern @jlmelville @LTLA and myself. We should have 0.0.16 on CRAN by end of week; I will close it then. If anybody needs a tarball of 0.0.15.2 in a drat-able repo let me know, else just fetch master.)

@eddelbuettel
Copy link
Owner Author

Version 0.0.16 is in incoming at CRAN and polishes all this. It is still on hold because the remaining UBSAN issue is addresses. Thanks again to everybody for pitching in, this was pretty fun and effective.

@eddelbuettel
Copy link
Owner Author

Took a few days for all builders to catch up but we do have the anticipated clean bill of health. (The remaining NOTE is something we cannot do anything, and that similarly nobody cares too much about.) Thanks again to everybody for pitching in.

image

@jlmelville
Copy link
Contributor

It's a thing of beauty. I may have stared at that clean check page with a sense of well-being for longer than I care to admit.

@eddelbuettel
Copy link
Owner Author

(There is also the foghorn package that does that for all your packages. I find it a little heavy, so leaning on a few some hints and code by @brodieG I cooked up a helper function in my rag-tag bag of small functions. Credits for this really mostly to @brodieG though.)

R> dang::checkCRANStatus("[email protected]")
               Package ERROR WARN NOTE OK
1              anytime     2           11
2          AsioHeaders                 13
3                   BH               8  5
4                 binb                 13
5                 dang                 13
6               digest                 13
7                 drat                 13
8           gaussfacts                 13
9                 gcbd          3    6   
10               gettz               2 11
11            gunsales          2      11
12              inline                 13
13                linl                 13
14             littler                  6
15            nanotime                 13
16                pinp                 13
17           pkgKitten                 13
18                prrd                 13
19              random                 13
20        RApiDatetime                 13
21       RApiSerialize                 13
22             Rblpapi              12   
23                Rcpp     1         8  4
24           RcppAnnoy               3 10
25             RcppAPT     3            7
26       RcppArmadillo               8  5
27             RcppBDT               4  9
28            RcppCCTZ                 13
29         RcppClassic          1    1 11
30 RcppClassicExamples                 13
31            RcppCNPy          1    2 10
32              RcppDE                 13
33           RcppEigen               8  5
34        RcppExamples                 13
35         RcppGetconf               2  7
36             RcppGSL                 13
37         RcppMsgPack               8  5
38    RcppNLoptExample                 13
39      RcppQuantuccia                 13
40           RcppRedis                 12
41        RcppSimdJson     1         3  5
42             RcppSMC              11  2
43         RcppStreams               2 11
44            RcppTOML               2 11
45             RcppXts              13   
46        RcppZiggurat               2 11
47          RDieHarder          6       6
48              rfoaas                 13
49             RInside          2   11   
50             rmsfact                 13
51           RProtoBuf          1    6  5
52         RPushbullet                 13
53           RQuantLib               6  4
54       RVowpalWabbit     3         5   
55          sanitizers     1         2 10
56                tint          1    2 10
57                ttdo                 13
58           x13binary              12  1
Errors/Warnings Present
https://cran.r-project.org/web/checks/check_results_edd_at_debian.org.html

R> 

And yes, I have too many packages, and yes, CRAN still throws too many stooopid errors.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants