model fit crashes with more than 2^31 positive interactions #86
Yeah - I think the problem is that we were assuming int32 indices/indptr being used as part of the sparse matrix passed in. When you have more than 2^31 non-zero values, this fails. This commit should fix it, at least for the ALS model: 36732e3. Can you pull from master and see if that fixes your problems?
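For context (my own illustration, not code from the library): 2.2 billion non-zero entries is past what an int32 index can address, which is exactly why scipy widens the indices/indptr arrays to int64 for such matrices:

```python
import numpy as np

count = 2_200_000_000            # nnz reported in this issue
i32max = np.iinfo(np.int32).max  # 2147483647 == 2**31 - 1

# an int32 index array cannot address this many entries, so scipy
# switches indices/indptr to int64 -- which code compiled against
# int32 buffers then mishandles
print(count > i32max)  # True
```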
Thanks for the response!
BPR should be fixed with this commit: 7356551. I haven't pushed to PyPI yet, but building from master here should get that in.
I tried building from master but I still get a segmentation fault. |
I think your build instructions should work, though I realized that I didn't check in the compiled Cython file, so if you don't have Cython installed it might have still picked up the old version. That should be fixed with this commit: b8c6b2b. Any chance you can get me the stack trace of where it's crashing? On Linux, run 'gdb --args python yourprogram.py' or something. I tested it out using some random data here (most real datasets I have are at least an order of magnitude smaller), and with the latest changes it worked for me. Can you test whether this crashes on your system?

import numpy
import scipy.sparse
import implicit.bpr
import logging

logging.basicConfig(level=logging.DEBUG)

# create a large sparse matrix with more than 2^31 non-zero entries
count = 2200000000
colids = numpy.random.randint(100000, size=count, dtype=numpy.int32)
rowids = numpy.random.randint(100000, size=count, dtype=numpy.int32)
vals = numpy.ones(count, dtype=numpy.float32)
m = scipy.sparse.coo_matrix((vals, (colids, rowids)), shape=(100000, 100000))

# create the model and fit it
model = implicit.bpr.BayesianPersonalizedRanking()
model.fit(m)
I also bumped the version and pushed to PyPI - I think it's possible that you're still running the old version somehow (setup.py might not update properly if the version numbers are the same).
With version 0.3.2 I get this error:
That last error is unrelated to the problem of >2^31 ratings being passed in. It seems like the last error is because the ratings matrix has a user with no ratings in the last column of the input matrix, which exposed a new bug. This commit adds a unittest for this bug and fixes it: 1cb420a.
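As a rough illustration of that edge case (my own construction, not the actual unittest from 1cb420a): a ratings matrix whose final user/item has no entries at all, which is the shape of input that exposed the bug:

```python
import numpy as np
import scipy.sparse

# 3x3 user/item matrix where the last row and column are entirely empty
rows = np.array([0, 1], dtype=np.int32)
cols = np.array([0, 1], dtype=np.int32)
vals = np.ones(2, dtype=np.float32)
m = scipy.sparse.coo_matrix((vals, (rows, cols)), shape=(3, 3)).tocsr()

# the last user has no ratings -- code that assumes every row/column
# is populated can index out of bounds on input like this
print(m.getrow(2).nnz)  # 0
```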
That was indeed unrelated to the >2^31 problem, but the same data worked with version 0.3.1. With 0.3.3 it seems to be working again. However, I still have the segfault:

#0 0x00007fffa607f5d0 in ?? () from /usr/local/lib/python3.6/site-packages/numpy/core/umath.cpython-36m-x86_64-linux-gnu.so
Interesting! Thanks for sending that. If I'm reading that right, it looks like it's crashing in numpy code, which is being called from line 3453 in bpr.cpp.
Looking at bpr.cpp just before line 3453, it seems like it's getting caused by
Can you try calling |
I tried it and it seems to be working fine, without consuming too much of my memory. |
OK, so I tried casting the matrix to np.float32 and then passing it to fit, and it works!
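For anyone hitting the same thing, the workaround described above is just an astype before fit. A minimal sketch of the cast itself (scipy-only, so it stays self-contained; pass the result to model.fit as in the thread):

```python
import numpy as np
import scipy.sparse

# build a tiny COO matrix whose data defaults to float64,
# the dtype that triggered the crash in this thread
rows = np.array([0, 1, 2], dtype=np.int32)
cols = np.array([1, 2, 0], dtype=np.int32)
vals = np.ones(3)  # numpy.ones defaults to float64
m = scipy.sparse.coo_matrix((vals, (rows, cols)), shape=(3, 3))
print(m.dtype)  # float64

# cast to float32 before calling fit, e.g. model.fit(m32)
m32 = m.astype(np.float32)
print(m32.dtype)  # float32
```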
Awesome! Glad it works. Pushed a fix for the logging message in v0.3.4: dfb62a4
Thanks! |
Now I am trying to train the AlternatingLeastSquares model with the same data (more than 2^31 samples) and it seems it is not working properly: it has been running for days without completing an iteration. When the sample count is (a bit) less than 2^31, an iteration takes just a few minutes.
@MariosGr There is a fix for the ALS model here: https://github.com/benfred/implicit/pull/400/files (sorry for the late reply)
Hey @benfred - FYI, I ran your example code and I'm getting a "Buffer dtype mismatch" error:

ValueError Traceback (most recent call last)
implicit/bpr.pyx in implicit.bpr.BayesianPersonalizedRanking.fit()
implicit/bpr.pyx in implicit.bpr.BayesianPersonalizedRanking._fit_gpu()
implicit/cuda/_cuda.pyx in implicit.cuda._cuda.CuIntVector.__cinit__()
ValueError: Buffer dtype mismatch, expected 'int' but got 'long'

There is an issue over here for nearest neighbors (#360), where you recommend reducing K, but I just wanted to raise that it's also popping up with BPR.
This is probably due to some int32 index.
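If the int32-index suspicion is right, here is a scipy-only sketch (my own illustration, not implicit's code) of the failure mode and a check-before-downcast workaround. The GPU path expects C int (int32) buffers, while scipy widens indices to int64 for very large matrices; downcasting is only safe when every index and the nnz count fit in int32:

```python
import numpy as np
import scipy.sparse

m = scipy.sparse.random(100, 100, density=0.05, format="csr")

# simulate what scipy does for very large matrices: widen to int64
# (this is the dtype that triggers "expected 'int' but got 'long'")
m.indices = m.indices.astype(np.int64)
m.indptr = m.indptr.astype(np.int64)

# downcast back to the int32 the GPU code expects, but only when safe
i32max = np.iinfo(np.int32).max
if m.indptr[-1] <= i32max and max(m.shape) <= i32max:
    m.indices = m.indices.astype(np.int32)
    m.indptr = m.indptr.astype(np.int32)

print(m.indices.dtype)  # int32 -- this small matrix fits easily
```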
Could you please give me any feedback on this?