Nearest Neighbours Recommendations #14

Merged
merged 2 commits into from
Feb 12, 2017

benfred (Owner) commented Dec 27, 2016

This adds a fast and memory-efficient implementation of item-item KNN recommendation models.

The similarity matrix calculation is based on the algorithm described in the
paper 'Sparse Matrix Multiplication Package (SMMP)'
(www.i2m.univ-amu.fr/~bradji/multp_sparse.pdf), but modified so that only the
top K rows are selected using a heap. This means that we can calculate
the similarity matrix even when the full similarity matrix wouldn't fit in
available memory. This calculation is also parallelized, unlike the sparse matrix
multiply in SciPy.
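
As a rough illustration of the idea (not the code in this PR), the sketch below multiplies a sparse item-user matrix by its transpose one output row at a time and keeps only the K largest scores of each row in a bounded min-heap, so the full product is never materialized. It assumes that "top K" means at most K entries per output row; the names `to_csr` and `top_k_similarity` are hypothetical, and plain Python lists stand in for the Cython/C++ data structures. Each output row is computed independently, which is what makes this kind of loop straightforward to parallelize.

```python
import heapq


def to_csr(dense):
    """Convert a dense list-of-lists into CSR arrays (indptr, indices, data)."""
    indptr, indices, data = [0], [], []
    for row in dense:
        for col, value in enumerate(row):
            if value != 0:
                indices.append(col)
                data.append(float(value))
        indptr.append(len(indices))
    return indptr, indices, data


def top_k_similarity(item_users, k):
    """For each item, return up to k (other_item, score) pairs from A * A^T.

    Hypothetical helper for illustration only: scores are plain dot products
    between item rows, and only the k largest per row are retained.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    a_ptr, a_idx, a_val = to_csr(item_users)
    # CSR form of the transpose (users x items)
    transposed = [list(col) for col in zip(*item_users)]
    b_ptr, b_idx, b_val = to_csr(transposed)

    result = []
    for i in range(len(item_users)):
        # Accumulate the non-zero entries of output row i
        acc = {}
        for p in range(a_ptr[i], a_ptr[i + 1]):
            user, weight = a_idx[p], a_val[p]
            for q in range(b_ptr[user], b_ptr[user + 1]):
                j = b_idx[q]
                acc[j] = acc.get(j, 0.0) + weight * b_val[q]

        # Bounded min-heap: the smallest retained score sits at heap[0]
        heap = []
        for j, score in acc.items():
            if len(heap) < k:
                heapq.heappush(heap, (score, j))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, j))

        # Emit the retained entries from highest to lowest score
        result.append([(j, score) for score, j in sorted(heap, reverse=True)])
    return result


if __name__ == "__main__":
    # Rows are items, columns are users
    matrix = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1]]
    for item, neighbours in enumerate(top_k_similarity(matrix, 2)):
        print(item, neighbours)
```

Memory use per row is bounded by the number of items that share a user with that row plus the K heap slots, rather than by the size of the full similarity matrix.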

Also switch to using C++ instead of C for Cython, run flake8 on the Cython code,
add an isort check and a cpplint check, and fix some intermittent failures in the
ALS unit test.

benfred (Owner, Author) commented Dec 27, 2016

still todo:

  • parallelize calculation
  • add scorer class
  • example usage
  • add save/load to scorer


benfred force-pushed the nearest_neighbours branch from 2230bc6 to bc548a0 on February 6, 2017 04:47
benfred changed the title from "first draft nearest neighbours code" to "Nearest Neighbours Recommendations" on Feb 6, 2017
benfred merged commit f5a3cdc into master on Feb 12, 2017
benfred deleted the nearest_neighbours branch on February 12, 2017 17:51

chapleau commented

Thanks for providing this very neat package.
I was just wondering whether, from a performance point of view, moving from C to C++ for Cython makes a significant difference. Are the APIs and functions backward compatible?
Thanks!

benfred (Owner, Author) commented Feb 14, 2017

Performance should be identical between C++ and C.

The APIs and functions are also compatible from Python. I changed to C++ mainly to use the heap functions provided with the STL: https://github.com/benfred/implicit/blob/master/implicit/nearest_neighbours.h#L21
