
Removal of previously liked items #131

Closed
srcolinas opened this issue Jun 28, 2018 · 8 comments

Comments

@srcolinas

srcolinas commented Jun 28, 2018

Hi,
It would be nice to have a way to retrieve recommendations without ignoring previously liked items. I know in the end I would ignore those items, but it would be convenient to compare predictions with other libraries and evaluate performance on some metrics I already have working with those libraries.

My idea so far: use rank_items on the whole list of items and then keep the highest-ranked ones. It feels very inefficient though.
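Roughly what I have in mind, as a sketch (I'm assuming rank_items takes the user id, the user_items matrix and the list of item ids to rank; exact signatures may differ between versions):

```python
import numpy as np
import scipy.sparse as sp
import implicit

# toy item-user interaction matrix (items x users), as fit() expects here
item_user = sp.random(1000, 50, density=0.05, format="csr")

model = implicit.als.AlternatingLeastSquares(factors=32)
model.fit(item_user)

user_items = item_user.T.tocsr()            # users x items, for recommend()/rank_items()
all_items = np.arange(user_items.shape[1])  # rank every item, liked or not

ranked = model.rank_items(0, user_items, all_items)  # (itemid, score) pairs, best first
top_n = ranked[:10]
```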

By the way...great work! Thank you.

@Vslira

Vslira commented Jun 29, 2018

If I understand what you're asking, you can just call model.recommend(userid, Z) where Z is an empty sparse matrix with the same shape as user_items.
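Something like this, for example (a sketch; assuming the recommend(userid, user_items, N=...) signature, with an all-zero csr_matrix standing in for the "empty" matrix):

```python
import scipy.sparse as sp
import implicit

item_user = sp.random(1000, 50, density=0.05, format="csr")  # toy items x users data
model = implicit.als.AlternatingLeastSquares(factors=32)
model.fit(item_user)

user_items = item_user.T.tocsr()     # users x items
Z = sp.csr_matrix(user_items.shape)  # all zeros, same shape as user_items

# with no liked items to filter out, nothing gets removed from the results
recs = model.recommend(0, Z, N=10)
```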

@srcolinas
Author

srcolinas commented Jun 29, 2018

Do you mean a sparse matrix full of zeros? If I do that, I get a score of nan.

@ita9naiwa
Collaborator

@Vslira Passing an empty matrix as the user-item matrix only works for the ALS and BPR models.

@DollarAkshay

DollarAkshay commented Jul 13, 2018

I agree with @srcolinas on this issue. I am trying to implement the evaluation method from the paper (Hu, Koren, Volinsky), so getting all the items in sorted order, including the liked ones, would be very helpful.

Update

I just tried rank_items for a lot of users and it takes a really long time: about 8 seconds to evaluate 100 users (44,699 items), and I have 138k users. So rank_items seems to be really slow. Is there a better way to implement the evaluation method from the paper?
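One thing that might help is scoring users in batches directly from the learned factors instead of calling rank_items once per user. A rough, unbenchmarked sketch (assuming the model exposes user_factors and item_factors as users x factors and items x factors arrays, which is how the ALS model stores them):

```python
import numpy as np

# model: a fitted implicit.als.AlternatingLeastSquares
K = 10
batch = np.arange(100)                                      # first 100 user ids
scores = model.user_factors[batch] @ model.item_factors.T   # shape (100, n_items)

# unordered indices of the K highest scores per user...
top_k = np.argpartition(-scores, K, axis=1)[:, :K]
# ...then sort those K by score, descending
rows = np.arange(len(batch))[:, None]
top_k = top_k[rows, np.argsort(-scores[rows, top_k], axis=1)]
```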

@ita9naiwa
Collaborator

I thought this could be implemented easily, so I created a PR for it (#140).
Please check it out.

@benfred
Owner

benfred commented Jul 16, 2018

I don't think including liked items from the train set when evaluating is a good idea.

The problem here is that if you leave these items in, almost all the returned results will be liked items from the train set - and these will push down the liked items from the test set. This leads to erroneous conclusions: I've seen cases where 90+% of the top 100 results returned by the ALS model are liked items from the train set. This artificially lowers the score of the model and leads to false conclusions about which model is performing better.

@DollarAkshay - evaluation of these models will probably take longer than fitting. In your case it has to score and sort every item for every user.

I would focus on something like P@K or MAP@K instead. The ranking of items far down the list doesn't actually matter that much to the user: whether an item is at position 1K or 10K, it is very doubtful that the user will ever see it. There is also some evidence that metrics which measure early precision tend to lead to better user satisfaction (though I can't find the link at the moment). The nice thing about using P@K is that you can then use one of the approximate MF models to speed things up.
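Something along these lines, for example (a sketch using the evaluation helpers in the library; function names, parameters and matrix orientations may vary between versions):

```python
import scipy.sparse as sp
import implicit
from implicit.evaluation import train_test_split, precision_at_k, mean_average_precision_at_k

user_items = sp.random(500, 1000, density=0.05, format="csr")  # toy users x items data
train, test = train_test_split(user_items, train_percentage=0.8)

model = implicit.als.AlternatingLeastSquares(factors=32)
model.fit(train.T.tocsr())  # fit on items x users, if that's what your version expects

p10 = precision_at_k(model, train, test, K=10)
map10 = mean_average_precision_at_k(model, train, test, K=10)
```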

@srcolinas
Author

srcolinas commented Jul 16, 2018

@benfred I still think we should be able to choose whether or not to remove previously liked items. In particular, the fact that a user liked an item does not mean they have bought it, or that they would not buy it again, so an automated system may still want to recommend it. Moreover, the library is better if users are free to choose whichever evaluation approach they consider most appropriate. PR #140 is a step towards solving this.

@benfred
Owner

benfred commented Jul 25, 2018

Thanks everyone, I've merged #140 - so with the version on master you should be able to do this now.
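On master that looks roughly like this (sketch; the flag name shown, filter_already_liked_items, is the one later releases expose and may differ):

```python
# model: a fitted implicit model; user_items: users x items csr_matrix
recs = model.recommend(userid, user_items, N=10,
                       filter_already_liked_items=False)  # keep liked items in the results
```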

benfred closed this as completed Jul 25, 2018