[question] loss calculation and cross-validation #25

Closed · dselivanov opened this issue May 8, 2017 · 6 comments
@dselivanov commented May 8, 2017
Hi Ben. Thanks for the library and especially for the great posts. I'm wondering what the procedure for the loss calculation is. I checked the code here, but didn't understand the exact algorithm. I suppose it is some kind of approximation of the loss from the paper, isn't it? I think it is almost infeasible (very computationally expensive) to calculate the exact loss from the paper, because it would require computing a prediction for every user and every item in order to take into account the loss for the unobserved items. Or did I miss some trick?

@benfred commented May 8, 2017

The loss function is calculated exactly - and only does work on observed items =).

So the loss is (Y Xu - Pu)^T Cu (Y Xu - Pu), summed over all users, where Y is the item factor matrix, Xu is the user's factor vector, Pu is the preference vector, and Cu is the diagonal confidence matrix. Expanding it out gives: Xu^T Y^T Cu Y Xu - 2 Xu^T Y^T Cu Pu + Pu^T Cu Pu.

Setting A = Y^T Cu Y and b = Y^T Cu Pu, the first two terms can be rewritten as (A Xu - 2b)^T Xu - which is what I'm calculating in that function. For calculating b, I'm assuming that Pu is 1 for all non-zero items in the input matrix, and likewise for Pu^T Cu Pu I'm just summing the confidences.

To calculate A, I'm using the fact that Y^T Cu Y is equal to Y^T Y + Y^T (Cu - I) Y. This is useful because Cu - I is sparse, and we can precompute Y^T Y once for all users. This means we can calculate the loss over all user/item pairs while only doing work on the observed non-zero items.

The final trick here is that I don't actually calculate A - I'm calculating A Xu directly, which is slightly faster.
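A minimal NumPy/SciPy sketch of the idea (my own variable names, not the library's code, and ignoring regularization; Cui is the user-item confidence matrix in CSR format, X and Y are the user and item factor matrices, and Pu is assumed to be 1 wherever Cui is non-zero):

```python
import numpy as np

def exact_loss(Cui, X, Y):
    """Exact confidence-weighted loss, touching only the nonzero entries of Cui.

    Cui: scipy.sparse.csr_matrix of confidences for observed interactions;
    unobserved entries have implicit confidence 1 and preference 0.
    """
    YtY = Y.T.dot(Y)  # precomputed once, shared by all users
    loss = 0.0
    for u in range(Cui.shape[0]):
        xu = X[u]
        # A @ xu = (YtY + Yt (Cu - I) Y) @ xu, built from the nonzero items only
        r = YtY.dot(xu)
        for idx in range(Cui.indptr[u], Cui.indptr[u + 1]):
            i, cui = Cui.indices[idx], Cui.data[idx]
            yi = Y[i]
            r += (cui - 1) * yi.dot(xu) * yi      # Yt (Cu - I) Y @ xu term
            # b = Yt Cu Pu reduces to a sum of cui * yi over nonzero items;
            # fold the -2 b^T xu and Pu^T Cu Pu terms in directly:
            loss += cui - 2 * cui * yi.dot(xu)
        loss += r.dot(xu)                         # xu^T A xu
    return loss
```

The pure-Python loop above is slow, of course; the point is just that each user costs O(nnz_u * factors) plus one dense matrix-vector product, never O(items).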

@dselivanov commented May 8, 2017
Thanks! One more question: can you describe the logic behind the denominator in the last row? Why not just divide by the number of non-zeros?

@benfred commented May 8, 2017

For the last row, I wanted to normalize by the total confidence, including the confidence of the missing entries.

The negative missing entries have a confidence of 1 in this model, so in total they contribute Cui.shape[0] * Cui.shape[1] - Cui.nnz.
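In code, assuming Cui stores the full confidence values for the observed entries, the denominator would be something like:

```python
# observed entries contribute their stored confidences; each of the
# (users * items - nnz) missing entries contributes a confidence of 1
total_confidence = Cui.data.sum() + Cui.shape[0] * Cui.shape[1] - Cui.nnz
normalized_loss = loss / total_confidence
```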

@dselivanov commented May 9, 2017

Yes, I've understood that. I'm developing an R package and use a similar approach to calculate the loss, but I divide only by the number of non-zeros (since we calculate the loss only for observed interactions). Speed and results more or less match yours.

The problem is how to perform cross-validation between models, because obviously the losses for models with different alphas will be very different. (I believe your rationale behind dividing by the total confidence is to make losses comparable between models with different alphas, but I'm not sure this will work.) What I ended up with is to use the loss only as a convergence indicator, and to calculate ndcg@k and map@k for cross-validation, so that models with different hyperparameters can be compared. The interesting thing is that if we use the values in the confidence matrix as relevance for ndcg@k, we end up in the same trap: ndcg will also depend on alpha. So I use the untransformed user-item interaction matrix as the hold-out set for calculating ndcg (so relevance is the same across models with different alphas).
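For example, a rough sketch of ndcg@k with binary relevance taken from the raw hold-out interactions (names are illustrative, not from either library):

```python
import numpy as np

def ndcg_at_k(ranked_items, holdout_items, k=10):
    # binary relevance: an item is relevant iff the user interacted with it
    # in the untransformed hold-out matrix, so alpha plays no role here
    relevant = set(holdout_items)
    dcg = sum(1.0 / np.log2(rank + 2)
              for rank, item in enumerate(ranked_items[:k])
              if item in relevant)
    idcg = sum(1.0 / np.log2(rank + 2)
               for rank in range(min(k, len(relevant))))
    return dcg / idcg if idcg > 0 else 0.0
```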

What is your flow for cross-validation?

@dselivanov changed the title from [question] loss calculation to [question] loss calculation and cross-validation on May 9, 2017
@benfred commented May 9, 2017

So - I'm calculating the loss for both observed (positive) and unobserved (negative) interactions in that function. Calculating the loss for only the positive items doesn't make much sense: the model could predict a constant 1 for every user/item pair, and the loss would be 0 if you only calculated it on the positive items. So you need to calculate the loss on both observed and unobserved items.

Since the loss function can be computed quickly (using the same trick as in computing the gradient), I used it to prove convergence of the CG code. It's not actually all that useful as a metric for cross-validation though: what we really care about there is a metric that works on the ranking of the returned items. I like using something like MAP or MRR (mean reciprocal rank) to measure whether the withheld positive items are ranked higher than the unobserved items.

The downside of this is that it's slow: since we need to rank the unobserved items, cross-validation takes substantially longer than training.
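As an illustration, a minimal MRR computation over withheld positives might look like this (hypothetical names, not the library's API):

```python
import numpy as np

def mean_reciprocal_rank(scores, holdout):
    # scores[u]: predicted score for every item for user u (dense array)
    # holdout[u]: set of withheld positive item ids for user u
    reciprocal_ranks = []
    for u, positives in holdout.items():
        ranking = np.argsort(-scores[u])       # best-scored items first
        rr = 0.0
        for rank, item in enumerate(ranking):  # ranking all items is the slow part
            if item in positives:
                rr = 1.0 / (rank + 1)
                break
        reciprocal_ranks.append(rr)
    return float(np.mean(reciprocal_ranks))
```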

@dselivanov
thanks!
