[question] loss calculation and cross-validation #25
Comments
The loss function is calculated exactly - and only does work on observed items =). The loss is `(Y Xu - Pu)^T Cu (Y Xu - Pu)` summed over all users. Expanding it out gives `Xu^T (Y^T Cu Y) Xu - 2 (Y^T Cu Pu)^T Xu + Pu^T Cu Pu`. Setting `A = Y^T Cu Y` and `b = Y^T Cu Pu`, the first two terms can be rewritten as `(A Xu - 2 b)^T Xu` - which is what I'm calculating in that function. For calculating `b`, I'm assuming that `Pu` is 1 for all non-zero items in the input matrix, and likewise for `Pu^T Cu Pu` I'm just summing the confidences. To calculate `A`, I use the fact that `Y^T Cu Y = Y^T Y + Y^T (Cu - I) Y`. This is useful because `Cu - I` is sparse, and we can precompute `Y^T Y` once for all users. This means we can calculate the loss over all user/item pairs while only doing work on the observed nonzero items. The final trick is that I don't actually form `A` - I compute `A Xu` directly, which is slightly faster.
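The per-user trick described above can be sketched as follows (a hypothetical standalone version, not the library's actual implementation; `Cui` is assumed to be a `scipy.sparse.csr_matrix` of confidences, with missing entries treated as confidence 1 and preference 0):

```python
import numpy as np
import scipy.sparse as sparse

def calculate_loss(Cui, X, Y):
    """Exact loss over all user/item pairs, doing work only on observed items."""
    YtY = Y.T @ Y                      # precomputed once, shared by all users
    loss = 0.0
    for u in range(Cui.shape[0]):
        xu = X[u]
        start, end = Cui.indptr[u], Cui.indptr[u + 1]
        items = Cui.indices[start:end]   # observed items for user u
        conf = Cui.data[start:end]       # their confidences

        # A Xu = (YtY + Yt (Cu - I) Y) Xu: (Cu - I) is zero on missing entries,
        # so only the observed items contribute to the correction term
        A_xu = YtY @ xu + (Y[items].T * (conf - 1)) @ (Y[items] @ xu)
        # b = Yt Cu Pu, with Pu = 1 on all observed items
        b = (Y[items].T * conf).sum(axis=1)
        # Pu^T Cu Pu is just the summed confidence of the observed items
        loss += (A_xu - 2 * b) @ xu + conf.sum()
    return loss
```

On a small matrix this agrees with the brute-force loss `sum(c_ui * (p_ui - xu.yi)^2)` over every user/item pair, while only iterating over the nonzeros.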
Thanks! One more question - can you describe the logic behind the denominator in the last row? Why not just divide by the number of non-zeros?
For the last row, I wanted to normalize by the total confidence, including the confidence of the missing entries. The missing (negative) entries each have a confidence of 1 in this model, so their total is `Cui.shape[0] * Cui.shape[1] - Cui.nnz`.
Yes, I've understood that. I'm developing an R package and use a similar approach to calculate the loss, but divide only by the number of non-zeros (since I calculate the loss only for observed interactions). Speed and results match yours more or less. The problem is how to perform cross-validation between models, because obviously the losses for models with different alphas will be very different. (I believe your rationale for dividing by the total confidence is to make the loss comparable between models with different alphas.) What is your flow for cross-validation?
So - I'm calculating the loss for both observed (positive) and unobserved (negative) interactions in that function. Calculating the loss for only the positive items doesn't make much sense: the model could predict a constant 1 for every user/item pair and the loss would be 0 if you only calculate it on the positive items. So you need to calculate the loss on both observed and unobserved items. Since the loss function can be computed quickly (using the same trick as in computing the gradient), I used it to prove convergence with the CG code. It's not actually all that useful as a metric for cross-validation though: what we really care about there is a metric that works on the ranking of the returned items. I like using something like MAP or MRR (mean reciprocal rank) to measure that the withheld positive items are ranked higher than the unobserved items. The downside is that it's slow: since we need to rank the unobserved items, cross-validation takes substantially longer than training.
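A minimal MRR sketch for this kind of cross-validation (illustrative names; `train`/`test` are assumed to be `scipy.sparse.csr_matrix` interaction matrices over the same users and items, and `X`/`Y` the fitted factors):

```python
import numpy as np
import scipy.sparse as sparse

def mean_reciprocal_rank(train, test, X, Y):
    """MRR of the best-ranked withheld positive item per user."""
    ranks = []
    for u in range(test.shape[0]):
        held_out = test.indices[test.indptr[u]:test.indptr[u + 1]]
        if len(held_out) == 0:
            continue
        scores = X[u] @ Y.T
        # never rank items already seen during training
        seen = train.indices[train.indptr[u]:train.indptr[u + 1]]
        scores[seen] = -np.inf
        order = np.argsort(-scores)
        # 1-based rank of the highest-ranked withheld positive item
        best = np.where(np.isin(order, held_out))[0].min() + 1
        ranks.append(1.0 / best)
    return float(np.mean(ranks))
```

This is the slow part the comment mentions: scoring and sorting every unobserved item for every test user costs far more than a training sweep.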
thanks! |
Hi Ben. Thanks for the library and especially for the great posts. I'm wondering what the procedure for the loss calculation is? I checked the code here, but didn't understand the exact algorithm. I suppose it's a kind of approximation of the loss from the paper, isn't it? I think it's almost infeasible (very computationally expensive) to calculate the exact loss from the paper, because it would require calculating a prediction for every user and every item in order to take into account the loss for unobserved items. Or did I miss some trick?