[question] Getting recommendations for 14M+ users quickly #29
Comments
The problem here is that for each user, this has to rank all 1 million items. For getting similar items, I had a similar problem that I solved using annoy: on the last.fm dataset it sped things up from about 1 hour to 30 seconds or so on my laptop (www.benfrederickson.com/matrix-factorization/) by doing an approximate nearest neighbour search instead of an exact search, which is basically the same problem you have here. However, annoy uses cosine distance while the predictor here is the dot product. The dot product is a little trickier to generate fast approximate nearest neighbours for: it isn't a true distance since it doesn't satisfy the triangle inequality, which makes approximate methods harder to apply. There are approaches that could work here (like https://arxiv.org/pdf/1202.6101v1.pdf) but I haven't tried them out yet. Parallelizing the computation should be pretty easy though, and would be a small win.

If you want to try it out, it might be worth looking at the nearest neighbour code in this package. This code returns the K nearest neighbours by dot product of a sparse matrix: https://github.com/benfred/implicit/blob/master/implicit/_nearest_neighbours.pyx#L48 . Doing something similar on the dense factors is even easier, since you don't have to worry about doing a sparse matrix multiplication.
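A minimal sketch of that dense-factor version (the helper name is mine, not code from the package): score every item for one user with a dense dot product, then pick the top K with numpy's argpartition.

```python
import numpy as np

def top_k_items(user_vector, item_factors, k=10):
    # Score every item for one user with a single dense dot product.
    scores = item_factors @ user_vector            # shape: (n_items,)
    # argpartition (introselect) finds the k largest in O(n_items) on
    # average; only those k scores then get fully sorted.
    ids = np.argpartition(scores, -k)[-k:]
    return ids[np.argsort(-scores[ids])]
```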
Interesting paper about ANN for dot products.
There are a few recent papers about the LSH MIPS problem. See e.g. https://arxiv.org/abs/1410.5518
@dselivanov I'm using numpy's argpartition function to get the K largest items here, which uses the introselect algorithm, so we should already be at O(n) for that selection step.

@chapleau thanks for the link! I'll check it out.
I tried out the idea from the paper: transforming a cosine nearest neighbours search into an inner product nearest neighbours search by adding an extra dimension. The code that does this, and gets annoy to run a nearest neighbour search for the recommended items on the transformed vectors, is here: https://github.com/benfred/implicit/blob/master/implicit/annoy_als.py

I tested it out quickly, and it seems to run around 20 times faster on the last.fm dataset on my laptop, with the downside that the precision is only around 80%. Adding more trees to annoy improves precision, but also slows down the retrieval. Anyway, I thought I'd put this code out there; it might be worth experimenting with.
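For reference, the transformation itself is easy to sketch. A hypothetical version (function names and defaults are mine, not the annoy_als.py code): each item vector gets an extra coordinate sqrt(max_norm^2 - ||v||^2), so all augmented items share the same norm, and queries get a 0 there; ranking by cosine on the augmented vectors then matches ranking by inner product on the originals.

```python
import numpy as np
from annoy import AnnoyIndex

def build_inner_product_index(item_factors, n_trees=50):
    # Pad each item vector so every augmented vector has norm == max_norm;
    # cosine similarity against a 0-padded query then ranks items by
    # their original inner product.
    norms = np.linalg.norm(item_factors, axis=1)
    max_norm = norms.max()
    extra = np.sqrt(max_norm ** 2 - norms ** 2)
    augmented = np.hstack([item_factors, extra[:, None]])

    index = AnnoyIndex(augmented.shape[1], "angular")
    for i, vector in enumerate(augmented):
        index.add_item(i, vector.tolist())
    index.build(n_trees)
    return index

def recommend(index, user_vector, k=10):
    # A 0 in the extra dimension leaves the dot product unchanged, and all
    # item norms are equal, so angular ranking == inner-product ranking.
    query = np.append(user_vector, 0).tolist()
    return index.get_nns_by_vector(query, k)
```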
I tested out Annoy/NMSLib/Faiss for quickly generating recommendations (http://benfrederickson.com/approximate-nearest-neighbours-for-recommender-systems/) and added support in the last commit. It should be possible to do this relatively efficiently now; let me know how it goes.
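A rough usage sketch, assuming the ANN-backed model classes live in implicit.approximate_als as in the linked commit; the class name, fit/recommend signatures, and matrix orientation follow implicit's API at the time of this thread, so check your version's docs:

```python
from implicit.approximate_als import AnnoyAlternatingLeastSquares

# item_users: scipy.sparse matrix of item -> user interactions,
# which is what implicit's fit() expected at the time
model = AnnoyAlternatingLeastSquares(factors=64)
model.fit(item_users)

# user_items is the transposed interaction matrix; recommend() ranks
# unseen items for one user via the approximate index
recs = model.recommend(userid=0, user_items=item_users.T.tocsr(), N=10)
```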
Hi @benfred, thank you for this excellent tool! I wonder if there is an easier way to get recommendations for a few million users than slowly looping als.AlternatingLeastSquares.recommend() over them (#72), or modifying https://github.com/benfred/implicit/blob/master/implicit/evaluation.pyx#L96 to return an array to speed up the computation? Thanks in advance for any help or hints.
Hi Ben, thanks for the package. I have a dataset of 14M+ users and 1M+ items. The model fitting takes around 70 minutes, but getting recommendations for all users would take north of 10 hours. Is there a way to expedite this, maybe by parallelizing it? Any suggestions?
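One way to attack both questions above is to batch the scoring instead of calling recommend() per user. A rough sketch under my own assumptions (the helper is hypothetical and ignores filtering out already-liked items): score users in chunks with one dense matrix multiply each, letting a multithreaded BLAS parallelize the multiply, then take the top K per row.

```python
import numpy as np

def recommend_all(user_factors, item_factors, k=10, chunk_size=4096):
    """Top-k item ids for every user, chunked so the score matrix fits in memory."""
    n_users = user_factors.shape[0]
    out = np.empty((n_users, k), dtype=np.int64)
    for start in range(0, n_users, chunk_size):
        stop = min(start + chunk_size, n_users)
        # (chunk, factors) @ (factors, n_items) -> (chunk, n_items)
        scores = user_factors[start:stop] @ item_factors.T
        ids = np.argpartition(scores, -k, axis=1)[:, -k:]
        top = np.take_along_axis(scores, ids, axis=1)
        order = np.argsort(-top, axis=1)
        out[start:stop] = np.take_along_axis(ids, order, axis=1)
    return out

# e.g. recommend_all(model.user_factors, model.item_factors, k=10)
```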