-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
use Rcpphnsw instead of Rcppannoy to accelerate nearest neighbor search #87
Comments
Hi @jianshu93, HNSW is faster than Annoy for queries, but the benchmarks don't usually include the cost of building the index, which is usually slower than for Annoy. This makes sense for a lot of use cases where the index is built once and then queried many times, but for UMAP, there is one build and one query. The last time I checked, the overall cost of building an HNSW index + querying was slower than for Annoy (for a given accuracy). You may also be interested in the discussion at: lmcinnes/umap#213 |
Hi Thanks for the response, I now understand it very well. I notice another hnsw implementation called granny (https://github.com/granne/granne), the index and query step are nearly done simultaneously and it is very fast (run the example to have a try), word indexing and embedding takes seconds (400,000 database vector, 300 dimension). Would that be relevant and worth some efforts to make a R wrapper? It was built using Rust, which is a memory efficient and high performance language. It was designed for a search engine. Thanks, Jianshu |
Thank you for the pointer to granne, which I hadn't seen. Unfortunately, I don't program in Rust currently, and don't know what the state of interfacing such code with R is, but I will keep it in mind. |
Hello uwot developer, There is one more discussion I want to start: for annoy, random projection forest was constructed first but you still need to run the query step for nearest neighbors for each element in the database right. For hnsw, as long as the small world graph is constructed, nearest neighbors for each datapoint in database is already there and just extract them (e.g., a small world graph to kgraph). So there is not need to query for hnsw. The time complexity for database building for hnsw is Nlog(N) while search is log(N), where N is the number of elements in database. For annoy, I think time complexity to build is Log(N) right (binary split of database elements) while you still need to search to find nearest neighbors for each database element. I am wondering whether Nlog(N) is better or bad than log(N) + time complexity of query for annoy, which is at least an order of magnitude slower than that of log(N) for hnsw. Thanks, Jianshu |
Does hnswlib provide an interface to convert the build small world network to a knn graph? |
As far as I know,No. We may need to explore a little bit or implement ourselves. I feel like it is at least as fast as annoy because searching nearest neighbors for all elements is expensive. I think it is easy because we only need the bottom layer and extract top k best neighbors for each element. Thanks |
graph structure is essentially a list recording links to other elements with distance. |
Whoever attempts this (and it's not likely to be me), this seems like something that should be pursued at the C++ level in hnswlib itself. Probably not something I would want to maintain in RcppHNSW and would not be appropriate for inclusion in uwot directly. But good luck if you intend to pursue this as I would certainly be happy to make use of it. |
Hello James, I noticed that for the search step, annoy always uses 4 threads though I assigned 24 to it on a server, there are no acceleration compare to running it on a laptop. Is the search well parallelized? Thanks, Jianshu |
@jianshu93 can you open a new issue for this? In that issue, please provide both the command you ran and the output (by adding |
Hello James, I will start a new issue. Another question to you: is annoy robust to high dimension dataset (or more precisely, high local intrinsic dimension dataset with LID>20, see the NN-descent paper, Dong et.al., 2010). From my perspective, random projection will also suffer from curse of dimensionality, especially for L2 distance. I am wondering the benefits of using annoy instead of N-descent (O(N^1.14), which is quite fast, even faster than hnsw, which is O(Nlog(N)), note that only larger than O (N^(1+1/e)) is slower than O(Nlog(N))). What is the time complexity for annoy both build+search, is there a precise analysis about the complexity? A very interesting paper: The Role of Local Intrinsic Dimensionality in Benchmarking Nearest Neighbor Search Check figure 2 in the paper, as LID increases (left to right), annoy performance decrease much faster than hnsw, this is very interesting. For glove (angular distance), the most right one, both annoy and hnsw have bad recall. Thanks, Jianshu |
Nearest Neighbor Descent seems like a good option in principle. Note that you can use Annoy and nearest neighbor descent, by using NND to "optimize" the result returned by Annoy. This is what largevis does. The problem is that the last time I looked, there is no package on CRAN for NND. If there was, to be a practical addition to uwot, it would be need to be sufficiently flexible to handle the metrics used by uwot and to deal with transforming new data. For the latter, you would need to either: store the original training data as part of the uwot "model" (which would obviously substantially increase the model size) or require the user to pass it separately in |
Hello James, I want to revisit this topic. As I said before, after building the hnsw, k best neighbors can be extracted from hnsw graph bottom layer (layer 0) in consistent time. We have implemented it recently in Rust. I analyze the NN descent time complexity N^1.14, which will be much slower than hnsw, which N*(log(N)) for N >> 10^5. For annoy, build the database is log(N), however to improve accuracy, we need to build the tree multiple times, like 50 trees. We then need to search each element in the database for those 50 trees, search a binary tree is Log(N), for N element will be N*(log(N)) and then also multiple by 50, so it will be 50Nlog(N), which is 50 times slower than HNSW. NN descent is very bad and slow for hubness dataset, which has been shown in a number of papers. I do not understand why in your benchmark hnsw is much slower than annoy, because build is Nlog(N), search is log(N) * N elements, it is only 2N*log(N), even if you search each element after building, which is not necessary at all. Note that hnswlib header files is not parallelized, but the python binding is paralleled, do you compare single thread hnsw with multiple thread annoy? Thanks, Jianshu |
Hi Jianshu,
Nice, I remember you mentioning this as an idea. I am glad you got it to work. I would be happy to make use of it in RcppHNSW if the original C++ library ever gets this functionality.
I am no computer scientist, but I don't think big-O analysis works like that.
My experience with pynndescent is that it is competitive with other methods even for high-dimensional datasets. The difference may be that NN Descent traditionally used a random initialization, and pynndescent uses a tree-based initialization. Have you compared pynndescent with HNSW?
I don't think so. But maybe I misremember. If you run your own benchmarks for using multi-threaded Annoy and HSNW (through R) across a sufficient number of datasets, I would be very happy to see them. I could certainly be persuaded that HNSW is superior for nearest neighbor search and concede this point. But there would still be the effort to integrate it into uwot and then maintain compatibility with both Annoy and HNSW which is a lot of work. I have slogged away on an nearest neighbor descent library to give better nearest neighbor performance for many years without being satisfied with it, and that's without the prospect of safely integrating it into uwot, which is not a task I particularly look forward to, so that's really the limiting factor to adding more nearest neighbor functionality. If the Annoy-based nearest neighbor search is holding you back, you can generate the nearest neighbors in HNSW and (once converted to a suitable form) pass that to the library(uwot)
library(RcppHNSW)
iris_nbrs <- hnsw_knn(as.matrix(iris[, -5]), k = 15)
iris_umap_hnsw <- umap(iris, nn_method = iris_nbrs) Absolutely no Annoy involved in that example. |
Hello James, I am not sure about the annoy big O. They refuse to provide one. I just analyze based on my understandings of binary search tree, which is very similar to the space partitioning in annoy. Parallelism was only supported after Aug 2020, I saw that you committed to the Rcpp: parallel_for() in Rcpphnsw page. I think your benchmark is based on the paralleled version right? Thanks for the R guidance, I will definitely test it using Mnist-fashion dataset. Jianshu |
Right but to clarify, I don't think this is a fruitful line of inquiry. You can't use big O notation to say: two algorithms have NlogN scaling, but you need to run algorithm 1 50 times so it's 50 times slower than algorithm 2. That's not how big O analysis works. It's saying what the asymptotic behavior of the algorithm is. If something is O(NlogN) that doesn't mean it scales exactly with NlogN. It means it scales by c * NlogN where c is some constant (well, the actual definition is a bit different but that's close enough). The point is that when N becomes very large, then c becomes unimportant. But both Annoy and HNSW have different values for c. If HNSW's c is more than 50 times bigger than the one Annoy has then it doesn't matter if you have to run Annoy 50 times, Annoy comes out ahead. The exact value of c is going to be different depending on any specific value of N and the hardware etc. That's why you do the big-O analysis, to avoid all of this and only talk about what happens as N tends to infinity. You would be better off benchmarking Annoy and HSNW on a large number of datasets with different values of k and parameters and finding the runtime that it takes to reach equivalent accuracies. Not an easy task. Also, because compiler settings are restricted with R packages, you also have to take that into account for the purposes of uwot: I can't use architecture-specific optimizations, which could make a big difference to any Annoy vs HNSW comparison. But I understand that isn't necessarily of value to anyone who doesn't want to use an R package.
Sure, but bear in mind that I am the maintainer of that package so I wrote that! I am very aware of when that package added multi-threading. It would have been a single-threaded comparison that I looked at. It is an interesting question about how well adding threads works with either method but not one I have looked at. |
A quick answer, with k=50, hnsw is about 2 times faster than annoy for now, for the fashion dataset, embedding results are nearly the same (small differences). I will attached the figure soon. Jianshu |
I am trying using the largest dataset I have even seen, the Higgs boson dataset generated by large hadron collider: https://archive.ics.uci.edu/ml/datasets/HIGGS You can download the data folder and the HIGGS.csv in it is the 11 million datapoint with 28 features (1st column is label). If my big O analysis is somewhat close the the truth, you will see huge difference in terms of the NN step between annoy and hnsw. My testing using a 128 thread cluster showed that it takes only 10 min for the NN step using hnsw. I believe NN descent cannot finish within several hours while annoy could be also very slow because the larger the number of database elements (time complexity). Thanks, Jianshu |
As usual, it's taken years, but RcppHNSW is now supported as an optional dependency (i.e. you must install it yourself). For now this functionality lives on the But if anyone is feeling brave, I would love to get as many issues ironed out before submission. install.packages("RcppHNSW")
# I admit I haven't tested this bit
devtools::install_github("jlmelville/uwot", ref = "hnsw") You can then use HNSW for nearest neighbor search and it can be controlled via the iris_umap_hnsw <- umap(iris, nn_method = "hnsw", nn_args = list(M = 4, ef = 128, ef_construction = 20)) The meaning of these parameters are given in the hnswlib documentation. |
This has been merged and tested so should appear in the next CRAN release. |
Dear uwot developer,
I think hnsw is about 10-100 time faster than annoy to find nearest neighbors especially for large scale dataset. The Rcpphnsw is a very good copy of original hnsw without SIMD acceleration. 3 distances are supported.
Thanks,
Jianshu
The text was updated successfully, but these errors were encountered: