
[question] Getting recommendations for 14M+ users quickly #29

Closed
msh-tf opened this issue May 9, 2017 · 7 comments · Fixed by #179


msh-tf commented May 9, 2017

Hi Ben, thanks for the package. I have a dataset of 14M+ users and 1M+ items. The model fitting takes around 70 minutes, but getting recommendations for all users would take north of 10 hours. Is there a way to expedite this, perhaps by parallelizing it? Any suggestions?


benfred commented May 9, 2017

The problem here is that for each user, this has to rank all 1 million items.

For getting similar items, I had a similar problem that I solved using annoy, by doing an approximate nearest neighbour search instead of an exact search (which is basically the same problem you have here). On the lastfm dataset it sped things up from about 1 hour to 30 seconds or so on my laptop: www.benfrederickson.com/matrix-factorization/

However, annoy is looking at cosine distance and the predictor here is the dot product. The dot product is a little trickier to generate fast approximate nearest neighbours for: it isn't a true distance since it doesn't satisfy the triangle inequality, which makes approximate methods harder to apply. There are approaches that could work here (like https://arxiv.org/pdf/1202.6101v1.pdf) but I haven't tried them out yet.

Parallelizing the computation should be pretty easy though, and would be a small win. If you want to try it out, it might be worth looking at the nearest neighbour code in this package. This code returns the K nearest neighbours by dot product of a sparse matrix: https://github.com/benfred/implicit/blob/master/implicit/_nearest_neighbours.pyx#L48 . Doing something similar on the dense factors is even easier, since you don't have to worry about a sparse matrix multiplication.
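The dense-factor version Ben describes can be sketched with plain numpy (the helper name is made up for this sketch, it is not part of implicit): score every item for every user with one matrix product, then keep only the top K per user with a partial sort.

```python
import numpy as np

def recommend_all(user_factors, item_factors, k=10):
    """Score all items for all users in one dense matmul, keep top-k per user."""
    scores = user_factors @ item_factors.T           # shape (n_users, n_items)
    # argpartition places the k largest scores (unordered) in the last k slots
    top_k = np.argpartition(scores, -k, axis=1)[:, -k:]
    # sort just those k candidates by score, descending
    rows = np.arange(scores.shape[0])[:, None]
    order = np.argsort(-scores[rows, top_k], axis=1)
    return top_k[rows, order]
```

This avoids a full O(n_items log n_items) sort per user; the argpartition step is roughly linear in the number of items.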

@dselivanov

Interesting paper about ANN for the dot product.
My 2 cents: you can find the top k elements of an array in O(n log k) instead of the O(n log n) of a naive full sort; see this stackoverflow thread.
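The O(n log k) selection is a one-liner with Python's heapq, which maintains a size-k heap while scanning the array once (the function name here is invented for illustration):

```python
import heapq

def top_k_indices(scores, k):
    # heapq.nlargest keeps only k candidates at a time, so the full
    # array is never sorted: O(n log k) rather than O(n log n)
    return heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)
```

The returned indices are ordered by score, descending.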

@chapleau

There are a few recent papers about the LSH MIPS (maximum inner product search) problem. See e.g. https://arxiv.org/abs/1410.5518
It involves augmenting your vectors with one extra dimension so that cosine similarity in the new space ranks items by their inner product in the original one.
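A sketch of that augmentation trick (hypothetical helper, assuming dense numpy factor matrices): each item vector gets an extra coordinate sqrt(m² - ‖x‖²), where m is the largest item norm, and each query gets a zero, so every item vector ends up with norm m and the dot products are unchanged.

```python
import numpy as np

def augment(item_factors, user_factors):
    """Add one dimension so cosine NN on the result matches inner-product NN."""
    norms = np.linalg.norm(item_factors, axis=1)
    m = norms.max()
    # items: extra coordinate sqrt(m^2 - ||x||^2) makes every norm equal to m
    extra = np.sqrt(m ** 2 - norms ** 2)
    items_aug = np.append(item_factors, extra[:, None], axis=1)
    # queries: a zero in the extra dimension leaves all dot products unchanged
    users_aug = np.append(user_factors,
                          np.zeros((user_factors.shape[0], 1)), axis=1)
    return items_aug, users_aug
```

Since all augmented item vectors share the same norm, ranking them by cosine similarity against an augmented query is equivalent to ranking the originals by inner product.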


benfred commented May 12, 2017

@dselivanov I'm using numpy's argpartition function to get the K largest items here, which uses the introselect algorithm, so we should already be at O(n log k) running time. I wrote an interactive visualization of using heaps to select the K largest items a while ago that's also worth checking out =)

@chapleau thanks for the link! I'll check it out


benfred commented May 15, 2017

I tried out the idea from the paper: transforming a cosine nearest neighbours search into an inner product nearest neighbours search by adding an extra dimension.

The code to do this and get annoy to do a nearest neighbour search for the recommended items based off the transformed vectors is here: https://github.com/benfred/implicit/blob/master/implicit/annoy_als.py

I tested it out quickly; it seems to run around 20 times faster on the last.fm dataset on my laptop, with the downside that the precision is only around 80%. Adding more trees to annoy improves precision, but also slows down retrieval.

Anyways - I thought I'd put this code out there, it might be worth experimenting with.

from implicit.annoy_als import AnnoyAlternatingLeastSquares

model = AnnoyAlternatingLeastSquares()
model.fit(data)


benfred commented Oct 11, 2017

I tested out Annoy/NMSLib/Faiss for quickly generating recommendations, http://benfrederickson.com/approximate-nearest-neighbours-for-recommender-systems/ and added support in the last commit.

It should be possible to do this relatively efficiently, let me know how it goes.


H0bbitBaron commented Dec 24, 2018

Hi @benfred, thank you for the excellent tool! I wonder, is there an easier way to get recommendations for a few million users than slowly looping als.AlternatingLeastSquares.recommend() over them (#72), or modifying https://github.com/benfred/implicit/blob/master/implicit/evaluation.pyx#L96 to return an array to speed up the computation? Thank you in advance for help or hints.
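Until batch support exists, one workaround (a sketch under the assumption that the fitted model exposes dense `user_factors` and `item_factors` arrays; the helper name is invented) is to score users in fixed-size blocks so the dense score matrix never exceeds chunk × n_items, while still using one BLAS matmul per block:

```python
import numpy as np

def recommend_in_chunks(user_factors, item_factors, k=10, chunk=10000):
    """Top-k recommendations for all users, computed block by block."""
    n_users = user_factors.shape[0]
    results = np.empty((n_users, k), dtype=np.int64)
    for start in range(0, n_users, chunk):
        # one dense matmul per block keeps memory bounded
        block = user_factors[start:start + chunk] @ item_factors.T
        top = np.argpartition(block, -k, axis=1)[:, -k:]
        rows = np.arange(block.shape[0])[:, None]
        order = np.argsort(-block[rows, top], axis=1)
        results[start:start + chunk] = top[rows, order]
    return results
```

Each block is independent, so the loop is also trivially parallelizable across processes if needed.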
