[question] loss calculation and cross-validation #25

Closed · dselivanov opened this issue May 8, 2017 · 6 comments
@dselivanov commented May 8, 2017
Hi Ben. Thanks for the library and especially for the great posts. I'm wondering what the procedure for the loss calculation is. I checked the code here, but didn't understand the exact algorithm. I suppose it is some kind of approximation of the loss from the paper, isn't it? I think it is almost infeasible (very computationally expensive) to calculate the exact loss from the paper, because it would require computing a prediction for every user and every item in order to take into account the loss for the unobserved items. Or did I miss some trick?

@benfred commented May 8, 2017

The loss function is calculated exactly - and only does work on observed items =).

So the loss is (Y Xu - Pu)^T Cu (Y Xu - Pu), summed over all users, where Y is the item factor matrix, Xu is the user's factor vector, Pu is the preference vector, and Cu is the diagonal confidence matrix. Expanding it out gives: Xu^T Y^T Cu Y Xu - 2 Xu^T Y^T Cu Pu + Pu^T Cu Pu.

Setting A = Y^T Cu Y and b = Y^T Cu Pu, the first two terms can be rewritten as (A Xu - 2b)^T Xu - which is what I'm calculating in that function. For calculating b, I'm assuming that Pu is 1 for all non-zero items in the input matrix, and likewise for Pu^T Cu Pu I'm just summing the confidences.

To calculate A, I'm using the fact that Y^T Cu Y is equal to Y^T Y + Y^T (Cu - I) Y. This is useful because Cu - I is sparse, and we can precompute Y^T Y once for all users. This means we can calculate the loss over all user/item pairs while only doing work on the observed non-zero items.

The final trick here is that I don't actually calculate A - I'm calculating A Xu directly, which is slightly faster.
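A minimal NumPy/SciPy sketch of the idea (my own variable names, not the library's code, and ignoring regularization; Cui is the user-item confidence matrix in CSR format, X and Y are the user and item factor matrices, and Pu is assumed to be 1 wherever Cui is non-zero):

```python
import numpy as np

def exact_loss(Cui, X, Y):
    """Exact confidence-weighted loss, touching only the nonzero entries of Cui.

    Cui: scipy.sparse.csr_matrix of confidences for observed interactions;
    unobserved entries have implicit confidence 1 and preference 0.
    """
    YtY = Y.T.dot(Y)  # precomputed once, shared by all users
    loss = 0.0
    for u in range(Cui.shape[0]):
        xu = X[u]
        # A @ xu = (YtY + Yt (Cu - I) Y) @ xu, built from the nonzero items only
        r = YtY.dot(xu)
        for idx in range(Cui.indptr[u], Cui.indptr[u + 1]):
            i, cui = Cui.indices[idx], Cui.data[idx]
            yi = Y[i]
            r += (cui - 1) * yi.dot(xu) * yi      # Yt (Cu - I) Y @ xu term
            # b = Yt Cu Pu reduces to a sum of cui * yi over nonzero items;
            # fold the -2 b^T xu and Pu^T Cu Pu terms in directly:
            loss += cui - 2 * cui * yi.dot(xu)
        loss += r.dot(xu)                         # xu^T A xu
    return loss
```

The pure-Python loop above is slow, of course; the point is just that each user costs O(nnz_u * factors) plus one dense matrix-vector product, never O(items).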

@dselivanov commented May 8, 2017
Thanks! One more question: can you describe the logic behind the denominator in the last row? Why not just divide by the number of non-zeros?

@benfred commented May 8, 2017

For the last row, I wanted to normalize by the total confidence, including the confidence of the missing entries.

The negative missing entries have a confidence of 1 in this model, so in total they contribute Cui.shape[0] * Cui.shape[1] - Cui.nnz.
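In code, assuming Cui stores the full confidence values for the observed entries, the denominator would be something like:

```python
# observed entries contribute their stored confidences; each of the
# (users * items - nnz) missing entries contributes a confidence of 1
total_confidence = Cui.data.sum() + Cui.shape[0] * Cui.shape[1] - Cui.nnz
normalized_loss = loss / total_confidence
```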

@dselivanov commented May 9, 2017

Yes, I've understood that. I'm developing an R package and use a similar approach to calculate the loss, but I divide only by the number of non-zeros (since we calculate the loss only for observed interactions). Speed and results more or less match yours.

The problem is how to perform cross-validation between models, because obviously the losses for models with different alphas will be very different. (I believe your rationale behind dividing by the total confidence is to make losses comparable between models with different alphas, but I'm not sure this will work.) What I ended up with is to use the loss only as a convergence indicator, and to calculate ndcg@k and map@k for cross-validation, so that models with different hyperparameters can be compared. The interesting thing is that if we use the values in the confidence matrix as relevance for ndcg@k, we end up in the same trap: ndcg will also depend on alpha. So I use the untransformed user-item interaction matrix as the hold-out set for calculating ndcg (so relevance is the same across models with different alphas).
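For example, a rough sketch of ndcg@k with binary relevance taken from the raw hold-out interactions (names are illustrative, not from either library):

```python
import numpy as np

def ndcg_at_k(ranked_items, holdout_items, k=10):
    # binary relevance: an item is relevant iff the user interacted with it
    # in the untransformed hold-out matrix, so alpha plays no role here
    relevant = set(holdout_items)
    dcg = sum(1.0 / np.log2(rank + 2)
              for rank, item in enumerate(ranked_items[:k])
              if item in relevant)
    idcg = sum(1.0 / np.log2(rank + 2)
               for rank in range(min(k, len(relevant))))
    return dcg / idcg if idcg > 0 else 0.0
```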

What is your flow for cross-validation?

@dselivanov changed the title from [question] loss calculation to [question] loss calculation and cross-validation on May 9, 2017
@benfred commented May 9, 2017

So - I'm calculating the loss for both observed (positive) and unobserved (negative) interactions in that function. Calculating the loss for only the positive items doesn't make much sense: the model could predict a constant 1 for every user/item pair, and the loss would be 0 if you only calculated it on the positive items. So you need to calculate the loss on both observed and unobserved items.

Since the loss function can be computed quickly (using the same trick as in computing the gradient), I used it to prove convergence of the CG code. It's not actually all that useful as a metric for cross-validation though: what we really care about there is a metric that works on the ranking of the returned items. I like using something like MAP or MRR (mean reciprocal rank) to measure whether the withheld positive items are ranked higher than the unobserved items.

The downside of this is that it's slow: since we need to rank the unobserved items, cross-validation takes substantially longer than training.
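As an illustration, a minimal MRR computation over withheld positives might look like this (hypothetical names, not the library's API):

```python
import numpy as np

def mean_reciprocal_rank(scores, holdout):
    # scores[u]: predicted score for every item for user u (dense array)
    # holdout[u]: set of withheld positive item ids for user u
    reciprocal_ranks = []
    for u, positives in holdout.items():
        ranking = np.argsort(-scores[u])       # best-scored items first
        rr = 0.0
        for rank, item in enumerate(ranking):  # ranking all items is the slow part
            if item in positives:
                rr = 1.0 / (rank + 1)
                break
        reciprocal_ranks.append(rr)
    return float(np.mean(reciprocal_ranks))
```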

@dselivanov
thanks!
