model fit crashes with more than 2^31 positive interactions #86

Closed
MariosGr opened this issue Mar 8, 2018 · 17 comments

@MariosGr

MariosGr commented Mar 8, 2018

This is probably due to some int32 index.
Could you please give me any feedback on this?

@benfred
Owner

benfred commented Mar 12, 2018

Yeah - I think the problem is that we were assuming int32 indices/indptr in the sparse matrix passed in. This fails when more than 2^31 non-zero values are passed in.

Commit 36732e3 should fix this for at least the ALS model. Can you pull from master and see if that fixes your problems?
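
As a rough illustration of the limit described above (a sketch, not code from implicit): in a CSR matrix the last element of `indptr` equals the number of non-zero entries, so that count must fit in the index dtype. The helper name `fits_in_int32` is hypothetical.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def fits_in_int32(nnz, shape):
    """Return True if CSR indices and indptr for this matrix fit in int32.

    indptr[-1] equals nnz, and the stored row/column indices are at most
    max(shape) - 1, so the largest stored value is the maximum of the two.
    """
    largest_index = max(shape) - 1
    return max(nnz, largest_index) <= INT32_MAX


print(fits_in_int32(2_000_000_000, (100_000, 100_000)))  # True
print(fits_in_int32(2_200_000_000, (100_000, 100_000)))  # False
```

Once the count of non-zeros passes 2^31 - 1, the index arrays need a 64-bit dtype, and any code that still assumes 32-bit indices can read them incorrectly.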

@MariosGr
Author

Thanks for the response!
I am actually using the bpr model. Would it be possible to have a fix for bpr?

@benfred
Owner

benfred commented Mar 13, 2018

BPR should be fixed with this commit: 7356551 .

I haven't pushed to pypi yet, but building from master here should get that in.

@MariosGr
Author

I tried building from master but I still get a segmentation fault.
Maybe I am not building it properly. Should I do something more than
python setup.py build
python setup.py install
?

@benfred
Owner

benfred commented Mar 14, 2018

I think your build instructions should work, though I realized that I didn't check in the compiled cython file, so if you don't have cython installed it might have still picked up the old version. That should be fixed with this commit: b8c6b2b

Any chance you can get me the stack trace of where it's crashing? On Linux, for example, you could run 'gdb --args python yourprogram.py'.
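
Besides gdb, Python's standard-library `faulthandler` module can print the Python-level traceback when the interpreter receives a fatal signal such as SIGSEGV. A minimal sketch, assuming the crash happens somewhere after this code runs; it does not show native frames, so it complements the gdb backtrace and does not replace it.

```python
import faulthandler

# Dump the Python traceback of all threads to stderr if the process
# receives SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL.
faulthandler.enable(all_threads=True)

# ... build the matrix and call model.fit(...) after this point ...
```

The same handler can be turned on without editing the script by running `python -X faulthandler yourprogram.py`.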

I tested it out using some random data here (most real datasets I have are at least an order of magnitude smaller), and with the latest changes it worked for me. Can you test if this crashes on your system?

import numpy
import scipy.sparse

import implicit.bpr

import logging
logging.basicConfig(level=logging.DEBUG)

# create a large sparse matrix
count = 2200000000
colids = numpy.random.randint(100000, size=count, dtype=numpy.int32)
rowids = numpy.random.randint(100000, size=count, dtype=numpy.int32)
vals = numpy.ones(count, dtype=numpy.float32)
m = scipy.sparse.coo_matrix((vals, (colids, rowids)), shape=(100000, 100000))

# create model using 
model = implicit.bpr.BayesianPersonalizedRanking()
model.fit(m)
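
For a sense of scale, here is a back-of-the-envelope estimate of the memory taken by the three input arrays in the script above. This is a sketch: the helper name `coo_array_bytes` is hypothetical, and the figure ignores any temporary copies made when the matrix is converted or cast.

```python
def coo_array_bytes(nnz, index_bytes=4, value_bytes=4):
    """Bytes used by the row, column and value arrays of a COO matrix."""
    return nnz * (2 * index_bytes + value_bytes)


count = 2_200_000_000

# int32 row/column ids and float32 values, as in the script above
print(coo_array_bytes(count) / 1e9)                 # 26.4 (GB)

# the same data with 64-bit row/column ids
print(coo_array_bytes(count, index_bytes=8) / 1e9)  # 44.0 (GB)
```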

@benfred
Owner

benfred commented Mar 14, 2018

I also bumped the version and pushed to pypi - I think it's possible that you're still running the old version somehow (setup.py might not update properly if the version numbers are the same).
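
One way to check which version is actually installed in the current environment is the standard-library `importlib.metadata` module (Python 3.8+). A small sketch; the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print(installed_version("implicit"))
```

If this prints an older version than expected, the interpreter is probably still picking up a stale install.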

@MariosGr
Author

With version 0.3.2 I get this error:
File "implicit/bpr.pyx", line 139, in implicit.bpr.BayesianPersonalizedRanking.fit
IndexError: boolean index did not match indexed array along dimension 0; dimension is 43060797 but corresponding boolean dimension is 43060796

@benfred
Owner

benfred commented Mar 15, 2018

That last error is unrelated to the problem of more than 2^31 ratings being passed in.

It seems like the last error is because the ratings matrix has a user with no ratings in the last column of the input matrix, which exposed a new bug. Commit 1cb420a adds a unit test for this bug and fixes it.

@MariosGr
Author

MariosGr commented Mar 17, 2018

That was indeed unrelated to the >2^31 problem, but the same data worked with version 0.3.1. With 0.3.3 it seems to be working again. However, I still get the segfault:

#0 0x00007fffa607f5d0 in ?? () from /usr/local/lib/python3.6/site-packages/numpy/core/umath.cpython-36m-x86_64-linux-gnu.so
#1 0x00007fffa6080b3b in ?? () from /usr/local/lib/python3.6/site-packages/numpy/core/umath.cpython-36m-x86_64-linux-gnu.so
#2 0x00000000004aef1c in _PyCFunction_FastCallDict (kwargs=0x7fff928bcab0, nargs=, args=0x7fff928dc3f8, func_obj=0x7fff928c5bd0) at Objects/methodobject.c:231
#3 _PyCFunction_FastCallKeywords (func=func@entry=0x7fff928c5bd0, stack=stack@entry=0x7fff928dc3f8, nargs=, kwnames=kwnames@entry=0x7fffa1ee05f8) at Objects/methodobject.c:294
#4 0x000000000054060e in call_function (pp_stack=pp_stack@entry=0x7fffffffd4b8, oparg=oparg@entry=3, kwnames=kwnames@entry=0x7fffa1ee05f8) at Python/ceval.c:4824
#5 0x00000000005422fb in _PyEval_EvalFrameDefault (f=, throwflag=) at Python/ceval.c:3338
#6 0x000000000053f8d1 in PyEval_EvalFrameEx (throwflag=0, f=0x7fff928dc240) at Python/ceval.c:753
#7 _PyFunction_FastCall (co=, args=, nargs=4, globals=globals@entry=0x7fffa21aab88) at Python/ceval.c:4906
#8 0x0000000000540781 in fast_function (kwnames=0x0, nargs=, stack=, func=0x7fffa21d6ae8) at Python/ceval.c:4941
#9 call_function (pp_stack=pp_stack@entry=0x7fffffffd660, oparg=oparg@entry=3, kwnames=kwnames@entry=0x0) at Python/ceval.c:4845
#10 0x000000000054268d in _PyEval_EvalFrameDefault (f=, throwflag=) at Python/ceval.c:3322
#11 0x000000000053f8d1 in PyEval_EvalFrameEx (throwflag=0, f=0x7fff928e3048) at Python/ceval.c:753
#12 _PyFunction_FastCall (co=, args=, nargs=1, globals=globals@entry=0x7fffa21aab88) at Python/ceval.c:4906
#13 0x0000000000540781 in fast_function (kwnames=0x0, nargs=, stack=, func=0x7fffa21d6a60) at Python/ceval.c:4941
#14 call_function (pp_stack=pp_stack@entry=0x7fffffffd810, oparg=oparg@entry=0, kwnames=kwnames@entry=0x0) at Python/ceval.c:4845
#15 0x000000000054268d in _PyEval_EvalFrameDefault (f=, throwflag=) at Python/ceval.c:3322
#16 0x000000000053f8d1 in PyEval_EvalFrameEx (throwflag=0, f=0x7fff928e3230) at Python/ceval.c:753
#17 _PyFunction_FastCall (co=, args=, nargs=1, globals=globals@entry=0x7fffa31a2dc8) at Python/ceval.c:4906
#18 0x0000000000540781 in fast_function (kwnames=0x0, nargs=, stack=, func=0x7fffa21c32f0) at Python/ceval.c:4941
#19 call_function (pp_stack=pp_stack@entry=0x7fffffffd9c0, oparg=oparg@entry=0, kwnames=kwnames@entry=0x0) at Python/ceval.c:4845
#20 0x000000000054268d in _PyEval_EvalFrameDefault (f=, throwflag=) at Python/ceval.c:3322
#21 0x0000000000540275 in PyEval_EvalFrameEx (throwflag=0, f=0x7fff98eb1bb8) at Python/ceval.c:753
#22 _PyEval_EvalCodeWithName (_co=_co@entry=0x7fffa21a90c0, globals=globals@entry=0x7fffa31a2dc8, locals=locals@entry=0x0, args=args@entry=0x7fffffffdcc0, argcount=argcount@entry=2, kwnames=kwnames@entry=0x0,
kwargs=0x0, kwcount=0, kwstep=2, defs=0x7fffa2405c20, defcount=2, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0) at Python/ceval.c:4153
#23 0x000000000054118e in PyEval_EvalCodeEx (_co=_co@entry=0x7fffa21a90c0, globals=globals@entry=0x7fffa31a2dc8, locals=locals@entry=0x0, args=args@entry=0x7fffffffdcc0, argcount=argcount@entry=2, kws=kws@entry=0x0,
kwcount=0, defs=0x7fffa2405c20, defcount=2, kwdefs=0x0, closure=0x0) at Python/ceval.c:4174
#24 0x00007fff990de2e5 in __Pyx_PyFunction_FastCallDict (func=func@entry=0x7fffa21c36a8, args=args@entry=0x7fffffffdcc0, nargs=nargs@entry=2, kwargs=0x0) at implicit/bpr.cpp:25610
#25 0x00007fff990ffd3b in __pyx_pf_8implicit_3bpr_27BayesianPersonalizedRanking_2fit (__pyx_v_self=0x7ffff6ccf2b0, __pyx_v_item_users=, __pyx_self=) at implicit/bpr.cpp:3453
#26 0x00007fff9910417e in __pyx_pw_8implicit_3bpr_27BayesianPersonalizedRanking_3fit (__pyx_self=, __pyx_args=0x7fff928c2808, __pyx_kwds=) at implicit/bpr.cpp:3151
#27 0x0000000000453412 in _PyObject_FastCallDict (func=0x7fff99462bc8, args=0x12be938, nargs=2, kwargs=0x0) at Objects/abstract.c:2331
#28 0x00000000005403d8 in call_function (pp_stack=pp_stack@entry=0x7fffffffdeb0, oparg=oparg@entry=1, kwnames=kwnames@entry=0x0) at Python/ceval.c:4848
#29 0x000000000054268d in _PyEval_EvalFrameDefault (f=, throwflag=) at Python/ceval.c:3322
#30 0x000000000053f8d1 in PyEval_EvalFrameEx (throwflag=0, f=0x12be7a8) at Python/ceval.c:753
#31 _PyFunction_FastCall (co=, args=, nargs=0, globals=globals@entry=0x7ffff7097168) at Python/ceval.c:4906
#32 0x0000000000540781 in fast_function (kwnames=0x0, nargs=, stack=, func=0x7ffff7081e18) at Python/ceval.c:4941
#33 call_function (pp_stack=pp_stack@entry=0x7fffffffe060, oparg=oparg@entry=0, kwnames=kwnames@entry=0x0) at Python/ceval.c:4845
#34 0x000000000054268d in _PyEval_EvalFrameDefault (f=, throwflag=) at Python/ceval.c:3322
#35 0x0000000000540275 in PyEval_EvalFrameEx (throwflag=0, f=0x94ff88) at Python/ceval.c:753
#36 _PyEval_EvalCodeWithName (_co=_co@entry=0x7ffff70118a0, globals=globals@entry=0x7ffff70b1078, locals=locals@entry=0x7ffff70118a0, args=args@entry=0x0, argcount=argcount@entry=0, kwnames=kwnames@entry=0x0,
kwargs=0x0, kwcount=0, kwstep=2, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0) at Python/ceval.c:4153
#37 0x0000000000541103 in PyEval_EvalCodeEx (closure=0x0, kwdefs=0x0, defcount=0, defs=0x0, kwcount=0, kws=0x0, argcount=0, args=0x0, locals=locals@entry=0x7ffff70118a0, globals=globals@entry=0x7ffff70b1078,
_co=_co@entry=0x7ffff70118a0) at Python/ceval.c:4174
#38 PyEval_EvalCode (co=co@entry=0x7ffff70118a0, globals=globals@entry=0x7ffff7097168, locals=locals@entry=0x7ffff7097168) at Python/ceval.c:730
#39 0x000000000042777f in run_mod (arena=0x7ffff70b1078, flags=0x7fffffffe340, locals=0x7ffff7097168, globals=0x7ffff7097168, filename=0x7ffff6cc0978, mod=0x9364c0) at Python/pythonrun.c:1025
#40 PyRun_FileExFlags (fp=0x98b940, filename_str=, start=, globals=0x7ffff7097168, locals=0x7ffff7097168, closeit=1, flags=0x7fffffffe340) at Python/pythonrun.c:978
#41 0x00000000004279ac in PyRun_SimpleFileExFlags (fp=0x98b940, filename=, closeit=1, flags=0x7fffffffe340) at Python/pythonrun.c:420
#42 0x000000000043be55 in run_file (p_cf=0x7fffffffe340, filename=0x8fb2a0 L"factorization.py", fp=0x98b940) at Modules/main.c:338
#43 Py_Main (argc=argc@entry=2, argv=argv@entry=0x8fa010) at Modules/main.c:809
#44 0x000000000041dd42 in main (argc=2, argv=) at ./Programs/python.c:69

@benfred
Owner

benfred commented Mar 23, 2018

Interesting! Thanks for sending that.

If I'm reading that right, it looks like it's crashing in numpy code, which is being called from line 3453 in bpr.cpp

#0 0x00007fffa607f5d0 in ?? () from /usr/local/lib/python3.6/site-packages/numpy/core/umath.cpython-36m-x86_64-linux-gnu.so
#1 0x00007fffa6080b3b in ?? () from /usr/local/lib/python3.6/site-packages/numpy/core/umath.cpython-36m-x86_64-linux-gnu.so

...

#25 0x00007fff990ffd3b in __pyx_pf_8implicit_3bpr_27BayesianPersonalizedRanking_2fit (__pyx_v_self=0x7ffff6ccf2b0, __pyx_v_item_users=, __pyx_self=) at implicit/bpr.cpp:3453

Looking at bpr.cpp just before line 3453, it seems to be caused by

    /* "implicit/bpr.pyx":121
 *         # for now, all we handle is float 32 values
 *         if Ciu.dtype != np.float32:
 *             Ciu = Ciu.astype(np.float32)             # <<<<<<<<<<<<<<
 * 
 *         # initialize factors

Can you try calling
Ciu = Ciu.astype(np.float32) on the matrix you pass in? It seems like this might be an out-of-memory condition.

@MariosGr
Author

I tried it and it seems to be working fine, without consuming too much of my memory.

@MariosGr
Author

MariosGr commented Apr 4, 2018

OK, so I tried casting the matrix to np.float32 and then passing it to fit, and it works!
The only slight problem now is that the DEBUG message prints a wrong/negative percentage of correctly classified training samples while training. I guess this is again due to an index overflow.
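
To illustrate how a counter overflow could produce a negative percentage: a count above 2^31 - 1 wraps around to a negative number when it is stored in a signed 32-bit integer. This is an illustrative sketch only; the thread does not show the actual logging code, and the names and counts below are hypothetical.

```python
import ctypes


def wrap_int32(value):
    """Value as it would read back after being stored in a signed 32-bit int."""
    return ctypes.c_int32(value).value


total = 2_200_000_000    # number of training samples, above 2**31 - 1
correct = 2_150_000_000  # hypothetical count of correctly ranked samples

print(wrap_int32(correct))                  # -2144967296
print(100.0 * wrap_int32(correct) / total)  # negative "percentage"
print(100.0 * correct / total)              # what a 64-bit counter would give
```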

@benfred
Owner

benfred commented Apr 7, 2018

Awesome! Glad it works.

Pushed a fix in v0.3.4 for the logging message - dfb62a4

@MariosGr
Author

Thanks!
Great job!

@benfred benfred closed this as completed Apr 10, 2018
@MariosGr
Author

MariosGr commented May 3, 2018

Now I am trying to train the AlternatingLeastSquares model with the same data (more than 2^31 samples) and it seems it is not working properly. It has been running for days without completing an iteration. When the samples are (a bit) less than 2^31 an iteration takes just a few minutes.

@benfred
Owner

benfred commented Sep 18, 2020

@MariosGr There is a fix for the ALS model here https://github.com/benfred/implicit/pull/400/files (sorry for the late reply)

@murphystout

murphystout commented Jan 27, 2021

import numpy
import scipy.sparse

import implicit.bpr

import logging
logging.basicConfig(level=logging.DEBUG)

# create a large sparse matrix
count = 2200000000
colids = numpy.random.randint(100000, size=count, dtype=numpy.int32)
rowids = numpy.random.randint(100000, size=count, dtype=numpy.int32)
vals = numpy.ones(count, dtype=numpy.float32)
m = scipy.sparse.coo_matrix((vals, (colids, rowids)), shape=(100000, 100000))

# create model using
model = implicit.bpr.BayesianPersonalizedRanking()
model.fit(m)

Hey @benfred - FYI, I ran your example code and I'm getting "Buffer dtype mismatch, expected 'int' but got 'long'":

ValueError Traceback (most recent call last)
in
18 # create model using
19 model = implicit.bpr.BayesianPersonalizedRanking()
---> 20 model.fit(m)

implicit/bpr.pyx in implicit.bpr.BayesianPersonalizedRanking.fit()

implicit/bpr.pyx in implicit.bpr.BayesianPersonalizedRanking._fit_gpu()

implicit/cuda/_cuda.pyx in implicit.cuda._cuda.CuIntVector.__cinit__()

ValueError: Buffer dtype mismatch, expected 'int' but got 'long'

This is an issue over here for nearest neighbors: #360, where you recommend reducing K, but just wanted to raise the issue that it's also popping up with BPR.
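
For context on the error message: in a Cython buffer declaration `int` means the C `int` type, while `long` is the C `long` type, which is 64 bits wide on 64-bit Linux. That would be consistent with 64-bit index arrays reaching code that expects 32-bit ones, though this is an assumption and the sketch below does not inspect implicit itself. The sizes on a given machine can be checked from Python:

```python
import ctypes

int_bits = 8 * ctypes.sizeof(ctypes.c_int)
long_bits = 8 * ctypes.sizeof(ctypes.c_long)
print(f"C int: {int_bits} bits, C long: {long_bits} bits")

# Largest value a signed C int can hold on this platform
print(2 ** (int_bits - 1) - 1)
```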
