-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New / remaining sanitizer issues #56
Comments
odd. do you know what line in annoy this is caused on |
FWIW, this is the same UBSAN pattern that was triggered by the last version's clang-UBSAN checks a$getNNsByItemList(0, 5, -1, TRUE) which in turn calls: Rcpp::List getNNsByItemList(S item, size_t n,
size_t search_k, bool include_distances) {
// ... //
std::vector<S> result;
std::vector<T> distances;
ptr->get_nns_by_item(item, n, search_k, &result, &distances);
} but from the look of the message it may not be complaining about anything in Annoy here: I think it might be that |
ok doesn't look like anything in core annoy then |
it probably is a bug – it sort of works because -1 ends up being interpreted as "search exhaustively", but i'm not sure if that's a bug or a feature honestly. |
Woah. Seriously? I thought assigning -1 to an unsigned int was a standard way of getting the maximum representable value. Maybe UBSAN wants us to be more explicit about it with a |
The code is in the examples section of the manual page. So we could avoid. So somewhat naive question for @erikbern: are there better ways to do this? # Retrieve 5 nearest neighbors to item 0
# search_k = -1 will invoke default search_k value of n_trees * n
# Return results as list with an element for distance
a$getNNsByItemList(0, 5, -1, TRUE)
# Retrieve 5 nearest neighbors to item 0
# search_k = -1 will invoke default search_k value of n_trees * n
# Return results as list without an element for distance
a$getNNsByItemList(0, 5, -1, FALSE)
# [....]
# Retrieve 5 nearest neighbors to vector v
# search_k = -1 will invoke default search_k value of n_trees * n
# Return results as list with an element for distance
a$getNNsByVectorList(v, 5, -1, TRUE)
# Retrieve 5 nearest neighbors to vector v
# search_k = -1 will invoke default search_k value of n_trees * n
# Return results as list with an element for distance
a$getNNsByVectorList(v, 5, -1, TRUE) Should we / could use |
What about adding an overload to the likes of a$getNNsByItemList(0, 5, FALSE)
# instead of
a$getNNsByItemList(0, 5, -1, FALSE) The downsides I see are extra documentation and having to write out the exact method pointer types to disambiguate the overload in the Rcpp::List (AnnoyAngular::*getNNsByItemList_1)(int32_t, size_t, size_t, bool) = &AnnoyAngular::getNNsByItemList;
Rcpp::List (AnnoyAngular::*getNNsByItemList_2)(int32_t, size_t, bool) = &AnnoyAngular::getNNsByItemList; This compiled and seemed to work when I tested it, so if there is interest in this approach I can make a PR. But I should note that I haven't actually tested that it fixes the UBSAN issues. |
Before embarking on that, perhaps the most expedient solution would be to replace Lines 77 to 78 in 68fe172
and then handling the coercion inside I mean, it doesn't seem that the arguments to |
Yep. And the could carry an NA as an int representation which we could the map to a special value (UINT_MAX?) and move on. The key really is not loose functionality, but to loose the SAN error. |
Still in Copenhagen at the conference, but yes, the key probably is to replace @erikbern Code below. Basically std::vector<S> getNNsByItem(S item, size_t n) {
std::vector<S> result;
ptr->get_nns_by_item(item, n, -1, &result, NULL);
return result;
}
Rcpp::List getNNsByItemList(S item, size_t n,
size_t search_k, bool include_distances) {
if (include_distances) {
std::vector<S> result;
std::vector<T> distances;
ptr->get_nns_by_item(item, n, search_k, &result, &distances);
return Rcpp::List::create(Rcpp::Named("item") = result,
Rcpp::Named("distance") = distances);
} else {
std::vector<S> result;
ptr->get_nns_by_item(item, n, search_k, &result, NULL);
return Rcpp::List::create(Rcpp::Named("item") = result);
}
}
std::vector<S> getNNsByVector(std::vector<double> dv, size_t n) {
std::vector<T> fv(dv.size());
std::copy(dv.begin(), dv.end(), fv.begin());
std::vector<S> result;
ptr->get_nns_by_vector(&fv[0], n, -1, &result, NULL);
return result;
}
Rcpp::List getNNsByVectorList(std::vector<T> fv, size_t n,
size_t search_k, bool include_distances) {
if (fv.size() != vectorsz) {
Rcpp::stop("fv.size() != vector_size");
}
if (include_distances) {
std::vector<S> result;
std::vector<T> distances;
ptr->get_nns_by_vector(&fv[0], n, search_k, &result, &distances);
return Rcpp::List::create(
Rcpp::Named("item") = result,
Rcpp::Named("distance") = distances);
} else {
std::vector<S> result;
ptr->get_nns_by_vector(&fv[0], n, search_k, &result, NULL);
return Rcpp::List::create(Rcpp::Named("item") = result);
}
} |
i agree it should just be int if we're using -1 as a special value. i can fix |
Yeah, I probably just followed your interface and was not thinking hard enough about the limitations I would get from a Just tried the Rocker container for |
(This is now taken care of in master thanks to very nice flurry of activity by @erikbern @jlmelville @LTLA and myself. We should have 0.0.16 on CRAN by end of week; I will close it then. If anybody needs a tarball of 0.0.15.2 in a drat-able repo let me know, else just fetch master.) |
Version 0.0.16 is in incoming at CRAN and polishes all this. It is still on hold because the remaining UBSAN issue is addresses. Thanks again to everybody for pitching in, this was pretty fun and effective. |
It's a thing of beauty. I may have stared at that clean check page with a sense of well-being for longer than I care to admit. |
(There is also the R> dang::checkCRANStatus("[email protected]")
Package ERROR WARN NOTE OK
1 anytime 2 11
2 AsioHeaders 13
3 BH 8 5
4 binb 13
5 dang 13
6 digest 13
7 drat 13
8 gaussfacts 13
9 gcbd 3 6
10 gettz 2 11
11 gunsales 2 11
12 inline 13
13 linl 13
14 littler 6
15 nanotime 13
16 pinp 13
17 pkgKitten 13
18 prrd 13
19 random 13
20 RApiDatetime 13
21 RApiSerialize 13
22 Rblpapi 12
23 Rcpp 1 8 4
24 RcppAnnoy 3 10
25 RcppAPT 3 7
26 RcppArmadillo 8 5
27 RcppBDT 4 9
28 RcppCCTZ 13
29 RcppClassic 1 1 11
30 RcppClassicExamples 13
31 RcppCNPy 1 2 10
32 RcppDE 13
33 RcppEigen 8 5
34 RcppExamples 13
35 RcppGetconf 2 7
36 RcppGSL 13
37 RcppMsgPack 8 5
38 RcppNLoptExample 13
39 RcppQuantuccia 13
40 RcppRedis 12
41 RcppSimdJson 1 3 5
42 RcppSMC 11 2
43 RcppStreams 2 11
44 RcppTOML 2 11
45 RcppXts 13
46 RcppZiggurat 2 11
47 RDieHarder 6 6
48 rfoaas 13
49 RInside 2 11
50 rmsfact 13
51 RProtoBuf 1 6 5
52 RPushbullet 13
53 RQuantLib 6 4
54 RVowpalWabbit 3 5
55 sanitizers 1 2 10
56 tint 1 2 10
57 ttdo 13
58 x13binary 12 1
Errors/Warnings Present
https://cran.r-project.org/web/checks/check_results_edd_at_debian.org.html
R> And yes, I have too many packages, and yes, CRAN still throws too many stooopid errors. |
Per email from Uwe:
The text was updated successfully, but these errors were encountered: