-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Work around Annoy UBSAN issues #50
Comments
Crap. BiocNeighbors will also be affected if RcppAnnoy goes down.
Perhaps we can change that? I can't imagine @eddelbuettel would be too happy about this either. |
Dang. I do of course get the UBSAN warnings too for @erikbern Any chance we could have an opt-out / slower alternative? See the two links (in red) off this page: https://cloud.r-project.org/web/checks/check_results_RcppAnnoy.html @LTLA @jlmelville Very worst case: could it become a |
Unfortunately, that wouldn't be possible for me; I compile against the headers in RcppAnnoy, so it's a hard dependency from the perspective of BiocNeighbors. On the plus side, Bioconductor doesn't do UBSAN checks, so I could get away with just stuffing the headers into my |
Just thinking out loud: could we do a poor man's version that doesn't assign an array of size one and knowingly writes outside of boundaries, but a |
Previously, saying uwot inherits the UBSAN from RcppAnnoy was enough to get the submission approved, so I did ask for some clarification on what this all means for RcppAnnoy (in the no-doubt vain hope they will say "really? oh, ok, go on then"). In my experience "please fix and resubmit" is the end of the line for the current submission, so I don't know if I will get a reply. If this has resulted in the Eye of Sauron being turned upon RcppAnnoy prematurely, I apologise. @eddelbuettel: in terms of replacing the array with a |
"It's complicated." I knew where CRAN is coming from, yet it is tired. I have several packages that 'stretch the limit' somewhat. Rcpp, RInside, RcppAnnoy, ... all get warnings this way, but they always have and it isn't always possible to rewrite. Here, it just occured to me a possibly 'simple enough yet safer' approach. I'd be game for all of trying to build that (as a derived version of Annoy, maybe?) and write a quick paper about the tradeoff and speed impact, assuming we get it done. Obviously with full credit to all things Annoy that get re-used. @erikbern Interested in helping / getting involved? |
I'm not following. When I look at the compiler output, it looks like the error is that So maybe there's just some header file that needs to get included or something? |
@erikbern, the undefined behavior is the issue, which is accessed via the https://www.stats.ox.ac.uk/pub/bdr/memtests/gcc-UBSAN/RcppAnnoy/RcppAnnoy-Ex.Rout (the lack of a Solaris build is a separate issue). |
Sorry for being somewhat vague. UBSAN issue with clangLink https://www.stats.ox.ac.uk/pub/bdr/memtests/clang-UBSAN/RcppAnnoy/RcppAnnoy-Ex.Rout
UBSAN issue with gccLink https://www.stats.ox.ac.uk/pub/bdr/memtests/gcc-UBSAN/RcppAnnoy/RcppAnnoy-Ex.Rout
I had forgotten that each directory contained several files, sorry about that. But it is otherwise a clear case: UBSAN just finds what it designed to find ... we are out of bounds here by design. Sigh. Can't win that unless we get a whitelist. Which we won't ... |
Got it, I see the issue. It's probably not super hard to fix though, although it's a bit janky. Two options off the top of my head
I don't quite know what's going on here and how to repro it but let me know what you think of the above |
Spot on! It shouldn't be hard to repro, possibly outside of R too, as we have some instrumentation. Eg https://builder.r-hub.io/ is build service that has an UBSAN variant, and there are some Docker containers. I only ever looked at this briefly (though repeatedly :-). The array is for the number of child nodes, correct? What is a 'safe enough' upper bound, how much would we loose in performance and/or memory use, and is picking |
I put together a quick PR that maybe fixes it in a dumb way? Idk take a look :) |
The good news: @erikbern's fix in spotify/annoy#455 does remove one set of UBSAN issues. The less good news: there's still an UBSAN issue with misalignment of 8 byte |
Just to confirm this explicitly @eddelbuettel: I commented out the Hamming tests in my local checkout of For |
Well this is weird. I have just tried to repeatedly re-trigger the memory alignment complaint and I can't. I even did a fresh checkout of RcppAnnoy, confirmed the @eddelbuettel, can you confirm any of this either way? |
The memory alignment probably only triggers stochastically when the allocated block of memory is not a multiple of 8. My guess is most of the time it will be |
@jlmelville I haven't been able to confirm besides submitting to rhub, which I only did with the very first (incomplete) PR change. Edit: Also, that rhub builder is wonky. Sometimes I see error in the headline summary but OKs in the text. Anyway, big thanks already to Erik. This is a huge step forward. |
But looks good. Created a branch, updated upstream |
Quick follow-up. I realized that I had two Rocker containers with SAN/UBSAN, so I triggered a rebuild. One (using Good news: The old behaviour was seen on the older RcppAnnoy and is gone in the new one. Three cheers to @erikbern and a big thank you to @jlmelville for pushing this. Not so good news:
these pop up a few times for the different metrics. Can we 'unpack' this? (Goes digging for a bit...) I guess we could add a
Using the empty definition is fine, as you'd expect. So we can wait for CRAN to complain and turn off as needed... |
Quick update: Wheels of progress are moving slowly at CRAN, took a day for them yesterday to get back to me ... only to say that I am still naughty for using |
New RcppAnnoy on CRAN. Let's see how it does. One new/remaining clang UBSAN issue per email from CRAN. I opened an issue over at RcppAnnoy. |
Thank you for all the time you have spent on this @eddelbuettel. I have got a rocker image of |
Yes, triggered a refesh of that image a few days ago. Does that trigger the issue? And it is on the R/Rcpp side only so one once @erikbern can just say "you guys..." ? Greetings from CPH. |
Looks much better already at https://cloud.r-project.org/web/checks/check_results_RcppAnnoy.html |
@eddelbuettel the rocker image does not trigger any complaints for me at the moment, but I suspect I may not have correctly invoked R or |
Remember you must use |
Thank you for the pointer @eddelbuettel, it's been a while since I have run the rocker container and I was on the way out to my day job this morning. I got a bit further tonight but I have admitted defeat for the evening. For the record, I ran:
Unfortunately I get the error:
I wasn't able to satisfactorily resolve this with my googling as a lot of the results want R to be rebuilt with other flags. Could I have used a suitably clang-erized version of the per-package Rather than waste people's time and flap about ineffectually I will wait to see how the CRAN checks shake out. |
Apart from my inability to work with ASAN, the good news is that I ran uwot through The bad news: I had forgotten about the UBSAN problems I get from using RcppParallel (the ones uwot get are the same as those that can be seen at https://www.stats.ox.ac.uk/pub/bdr/memtests/gcc-UBSAN/RcppParallel/RcppParallel-Ex.Rout), so uwot's goose is still thoroughly cooked. But that's a problem for another issue. I will close this issue when RcppAnnoy's CRAN checks all get up to date (hopefully with the UBSAN links disappearing for good). I am also deeply appreciative of the time @eddelbuettel and @erikbern have spent on this. |
CRAN checks show no problems with RcppAnnoy 0.0.16 (and coincidentally an Annoy problem on Windows with writing indices to disk that can't be read back in has been fixed). Once again, great work by @eddelbuettel and @erikbern. A new release of uwot will be submitted to CRAN next Monday. Closing. |
Yes, been checking that too. They were driving me batty over not admitting RcppAnnoy 0.0.16 for 48 or so hours as it still needn't a manual inspection over the remaining-from-0.0.15 UBSAN issue. Should be better now as 0.0.16 is as expected clean as a whistle. Your |
a) Please do not hijack old issues. |
The latest submission of
uwot
to CRAN has been rejected due to the UBSAN issues inherited from RcppAnnoy (the UBSAN check is currently accessible via a link on https://cran.r-project.org/web/checks/check_results_uwot.html):If this decision isn't reconsidered, I imagine that this is likely to see
uwot
being removed from CRAN shortly.The UBSAN issue is also present in
RcppAnnoy
itself, notuwot
's specific use of the package (as far as I can tell anyway): https://cran.r-project.org/web/checks/check_results_RcppAnnoy.html and is due to how the underlying Annoy library is written. It's not going to get fixed because it's Annoy working as designed. It's not clear to me at the moment if this meansRcppAnnoy
will also be removed from CRAN or what has changed in policy since the last submission ofuwot
(or indeed of RcppAnnoy).At any rate, I grow weary of the ban-hammer lottery
uwot
enters every time I want to update the package on CRAN. The obvious solution is to stop using Annoy. The upside would be:uwot
.The obvious downsides are:
RcppHNSW is a possible alternative, but it supports fewer metrics than Annoy and is a lot slower.
I do want to get on with rnndescent, the upsides of which are:
A big downside is:
dataset.
Other downsides that emerge from the fact that I am writing the package, so inevitably:
The text was updated successfully, but these errors were encountered: