Removal of previously liked items #131
Comments
If I understand what you're asking, you can just call model.recommend(userid, Z) where Z is an empty sparse matrix with the same shape as user_items.
Do you mean a sparse matrix full of zeros? Because I get a score of nan if I do that.
@Vslira taking an empty matrix as the user-item matrix only works for the ALS and BPR models.
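As a concrete sketch of the empty-matrix idea above, using only scipy (the `model` and `userid` in the comment are hypothetical, and the `recommend(userid, user_items)` call shape is assumed from the suggestion earlier in the thread):

```python
import scipy.sparse as sp

# Dimensions borrowed from the numbers mentioned later in this thread.
n_users, n_items = 138_000, 44_699

# An all-zero CSR matrix with the same shape as user_items: it stores no
# entries, so nothing gets filtered out as "already liked".
empty_user_items = sp.csr_matrix((n_users, n_items))
print(empty_user_items.nnz)  # → 0

# With a fitted model (hypothetical), the suggestion above would be:
# recommendations = model.recommend(userid, empty_user_items)
```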
I agree with @srcolinas on this issue. I am trying to implement the same evaluation method from the paper (Hu, Koren, Volinsky), so getting all the items in sorted order, including the liked ones, would be very helpful.
Update: I just tried rank_items for a lot of users and it takes a really long time: 8 seconds to evaluate 100 users (44699 items), and I have 138k users. So rank_items seems to be really slow. Is there a better way to implement the evaluation method from the paper?
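One faster alternative, if the model's latent factors are accessible (the ALS model in implicit exposes `user_factors` and `item_factors`): score all items for a batch of users with a single matrix product instead of per-user rank_items calls. A rough sketch with random stand-in factors:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 1_000, 44_699, 32   # toy sizes; k = factor dimension

# Stand-ins for model.user_factors / model.item_factors.
user_factors = rng.standard_normal((n_users, k)).astype(np.float32)
item_factors = rng.standard_normal((n_items, k)).astype(np.float32)

# Score all items for a batch of 100 users in one BLAS call,
# then sort each row once to get a full ranking per user.
batch = np.arange(100)
scores = user_factors[batch] @ item_factors.T        # shape (100, n_items)
ranking = np.argsort(-scores, axis=1)                # best item first
print(ranking.shape)  # → (100, 44699)
```

This moves the work into one dense matrix multiply per batch, which is typically much faster than looping over users in Python.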
I thought it could be easily implemented, so I created a PR for it (#140).
I don't think including liked items from the train set when evaluating is a good idea. If you leave these items in, almost all the returned results will be liked items from the train set, and these will push down the liked items from the test set. This leads to erroneous conclusions: I've seen cases where 90+% of the top 100 results returned by the ALS model were liked items from the train set. This artificially lowered the score of the model and led to false conclusions about which model was performing better.
@DollarAkshay - evaluating these models will probably take longer than fitting them. In your case it has to score and sort every item for every user. I would focus on something like P@K or MAP@K instead. The ranking of items far down the list doesn't actually matter that much to the user: if an item is at position 1K or 10K, it is very doubtful that the user will ever see it. There is also some evidence that metrics measuring early precision tend to lead to better user satisfaction (though I can't find the link at the moment). The nice thing about using P@K is that you can then use one of the approximate MF models to speed things up.
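For reference, P@K is cheap to compute once a ranking is available; a minimal sketch (the function name and numbers here are illustrative, not part of the library):

```python
import numpy as np

def precision_at_k(ranked_items, test_items, k=10):
    """Fraction of the top-k recommendations that appear in the test set."""
    hits = np.isin(ranked_items[:k], list(test_items)).sum()
    return hits / k

ranked = np.array([5, 9, 2, 7, 1, 0, 3, 8, 4, 6])  # model's ranking
test = {9, 7, 6}                                    # held-out liked items
print(precision_at_k(ranked, test, k=5))  # → 0.4 (items 9 and 7 are in the top 5)
```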
@benfred I still think we should be able to choose whether to remove previously liked items or not. Specially, because the user liked an item does not mean he has bought it or that he would not buy it again, so an automated system may still recommend it. Moreover, the library may be better if the user is free to choose the way to evaluate that he considers more appropriate. The PR #140 is a step towards solving this. |
Thanks everyone, I've merged #140 - so with the version on master you should be able to do this now.
Hi,
It would be nice to have a way to retrieve recommendations without ignoring previously liked items. I know in the end I would ignore those items, but it would be convenient to compare predictions with other libraries and evaluate performance on some metrics I already have working with those libraries.
My idea so far: use rank_items for the whole list of items and then retrieve the ones with the highest rank. It feels very inefficient though.
By the way...great work! Thank you.