Query top N recommended items #24

Closed
NumbaCruncha opened this issue Apr 5, 2017 · 2 comments

@NumbaCruncha

Hi Ben,

I'm using implicit to predict a top-7 list of recommendations from a sparse matrix of aggregated customer purchases, composed of 7101 customer purchases across 24 products.

I'm a little confused by the output of .recommend, which produces a list of N tuples:

[(845, 1.0136324354312989), (1150, 1.0028331824506354), (51, 1.0027650376439357), (2411, 1.0024685562873292), (1810, 1.0019960930254448), (1211, 1.0018685279069661), (775, 1.0018545578136604)]

I would have expected the first value in each tuple to be an index into the product list, but I suspect that I'm looking at indices for the latent factor vectors instead. If you could give me a steer on how to extract the product identities, it would be very much appreciated.

Kind regards,
Michael.

import pandas as pd
import scipy.sparse as sparse
import numpy as np
import implicit
# import data and assign column names
data = pd.read_csv(r'D:\santander\train_sample_small.csv', names=['cust_id', 'product', 'rating'])
# transform dataset to sum by activity
grouped_data = data.groupby(['cust_id', 'product']).sum().reset_index()
grouped_data.head()

# Only get customers where purchase totals were positive
grouped_purchased = grouped_data.query('rating > 0')
print(grouped_purchased.head())

# Get our unique customers
customers = list(np.sort(grouped_purchased.cust_id.unique()))

# Get our unique products that were purchased
products = list(grouped_purchased['product'].unique())

# All of our purchases
rating = list(grouped_purchased.rating)

# Get the associated row/column indices
rows = pd.Categorical(grouped_purchased['cust_id'], categories=customers).codes
cols = pd.Categorical(grouped_purchased['product'], categories=products).codes

# create sparse matrix from data
purchases_sparse = sparse.csr_matrix((rating, (rows, cols)), shape=(len(customers), len(products)), dtype=np.float64)

# Build, fit model and recommend top 7 products for first user
model = implicit.als.AlternatingLeastSquares(factors=50, regularization=0.1, iterations=50)
model.fit(item_users=purchases_sparse)
recom = model.recommend(userid=0, user_items=purchases_sparse.T, N=7)

@benfred
Owner

benfred commented May 3, 2017

It's the index into the matrix you passed into the 'fit' function. You'll need to map from the category id in your 'rows' back to the category. The example file shows how to do this here: https://github.com/benfred/implicit/blob/master/examples/lastfm.py#L122-L128

Also the userid in the 'recommend' method is the column id in the item_users matrix.
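
To illustrate the point above: in the question's code the rows of the matrix passed to 'fit' are customers, not products, which would explain returned indices far larger than 24. Below is a minimal sketch, not taken from the thread, of building the matrix with products as rows and customers as columns and then mapping row indices back to product labels. The toy `purchases` table and the `recs` list are made-up stand-ins; no implicit calls are made, so the recommendation output is only simulated.

```python
import pandas as pd
import scipy.sparse as sparse

# Hypothetical toy data standing in for the aggregated purchases in the question.
purchases = pd.DataFrame({
    "cust_id": [1001, 1001, 1002, 1003],
    "product": ["loan", "card", "card", "savings"],
    "rating": [1.0, 2.0, 1.0, 3.0],
})

# Categoricals give a stable label <-> integer code mapping (categories are sorted).
cust_cat = pd.Categorical(purchases["cust_id"])
prod_cat = pd.Categorical(purchases["product"])

# item_users layout: rows are products (items), columns are customers (users).
item_users = sparse.csr_matrix(
    (purchases["rating"].to_numpy(), (prod_cat.codes, cust_cat.codes)),
    shape=(len(prod_cat.categories), len(cust_cat.categories)),
)

# Made-up (row_index, score) pairs in the same shape as the output in the question.
recs = [(2, 0.9), (0, 0.4)]

# Each row index is a position in prod_cat.categories, so look the label up there.
named = [(prod_cat.categories[idx], score) for idx, score in recs]
print(named)
```

With this layout, the user index to pass is the customer's column position, i.e. its code in `cust_cat`.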

@igorkf

igorkf commented Nov 25, 2020

You can create a mapping like this:

user2idx = dict(zip(pivot_table['user_id'].cat.categories[pivot_table['user_id'].cat.codes].tolist(),
                    pivot_table['user_id'].cat.codes.tolist()))
idx2user = {x[1]: x[0] for x in user2idx.items()}

The first will create a dictionary where each key is a user_id, and each corresponding value is the user_index of the sparse matrix:

{user_id_0: user_index_0, user_id_1: user_index_1, ...}

The second is just the reverse mapping:

{user_index_0: user_id_0, user_index_1: user_id_1, ...}

So in the recommend() method you need to pass the user_index, not the user_id.
After this, the method will return the item_indexes, so you need a mapping idx2item too!
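
A sketch of the item side, under the assumption that `pivot_table` also has a categorical `item_id` column (that column name and the toy data are hypothetical, and `recs` is a made-up stand-in for what recommend() returns):

```python
import pandas as pd

# Hypothetical toy frame; the thread does not show the real pivot_table.
pivot_table = pd.DataFrame({
    "user_id": ["u3", "u1", "u3", "u2"],
    "item_id": ["b", "a", "c", "a"],
})
for col in ("user_id", "item_id"):
    pivot_table[col] = pivot_table[col].astype("category")

# The position of a label in .cat.categories is its code, so enumerate() is enough.
idx2user = dict(enumerate(pivot_table["user_id"].cat.categories))
user2idx = {user: idx for idx, user in idx2user.items()}
idx2item = dict(enumerate(pivot_table["item_id"].cat.categories))

# Translate the user_id to a user_index before asking for recommendations.
user_index = user2idx["u3"]

# Made-up (item_index, score) pairs standing in for the recommend() output.
recs = [(2, 0.8), (0, 0.3)]

# Translate the item indices back to item ids.
named = [(idx2item[item_index], score) for item_index, score in recs]
print(user_index, named)
```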
