model fit crashes with more than 2^31 positive interactions #86

MariosGr · 2018-03-08T10:14:31Z

This is probably due to some int32 index.
Could you please give me any feedback on this?

benfred · 2018-03-12T21:04:04Z

Yeah - I think the problem is that we were assuming int32 indices/indptr arrays in the sparse matrix passed in. When more than 2^31 non-zero values are passed in, this fails.
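
To make the limit concrete: a signed 32-bit index can hold values up to 2^31 - 1, so offsets into the non-zero values need a 64-bit type once the count goes past that. A minimal sketch of the check (the `fits_int32` helper is hypothetical, not part of implicit or scipy):

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit index can hold


def fits_int32(nnz):
    """Return True if offsets up to `nnz` can be stored in int32 (hypothetical helper)."""
    return nnz <= INT32_MAX


print(INT32_MAX)                  # 2147483647
print(fits_int32(2_000_000_000))  # True
print(fits_int32(2_200_000_000))  # False
```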

Commit 36732e3 should fix this for at least the ALS model. Can you pull from master and see if that fixes your problem?

MariosGr · 2018-03-13T11:03:12Z

Thanks for the response!
I am actually using the bpr model. Would it be possible to have a fix for bpr?

benfred · 2018-03-13T15:06:54Z

BPR should be fixed with this commit: 7356551 .

I haven't pushed to pypi yet, but building from master here should get that in.

MariosGr · 2018-03-14T12:51:40Z

I tried building from master but I still get a segmentation fault.
Maybe I am not building it properly. Should I do something more than
python setup.py build
python setup.py install
?

benfred · 2018-03-14T16:49:43Z

I think your build instructions should work, though I realized that I didn't check in the compiled cython file, so if you don't have cython installed it might have still picked up the old version. That should be fixed with this commit: b8c6b2b

Any chance you can get me the stack trace of where it's crashing? For example, on Linux, run 'gdb --args python yourprogram.py' or something similar.
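
If gdb isn't convenient, Python's standard `faulthandler` module is another way to see roughly where a crash happens: it dumps the Python-level traceback when the process receives a fatal signal such as SIGSEGV. A sketch, assuming a writable temp directory (the log file name is made up for illustration):

```python
import faulthandler
import os
import tempfile

# Hypothetical log location; any writable path works.
log_path = os.path.join(tempfile.gettempdir(), "implicit_fault.log")

# The file object must stay open for as long as the handler is enabled.
fault_log = open(log_path, "w")
faulthandler.enable(file=fault_log, all_threads=True)

# ... build the matrix and call model.fit(...) here ...
```

This only shows Python frames, so it complements the gdb trace rather than replacing it.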

I tested it out using some random data here (most real datasets I have are at least an order of magnitude smaller), and with the latest changes it worked for me. Can you test if this crashes on your system?

import numpy
import scipy.sparse

import implicit.bpr

import logging
logging.basicConfig(level=logging.DEBUG)

# create a large sparse matrix
count = 2200000000
colids = numpy.random.randint(100000, size=count, dtype=numpy.int32)
rowids = numpy.random.randint(100000, size=count, dtype=numpy.int32)
vals = numpy.ones(count, dtype=numpy.float32)
m = scipy.sparse.coo_matrix((vals, (colids, rowids)), shape=(100000, 100000))

# create the model and fit it on the matrix
model = implicit.bpr.BayesianPersonalizedRanking()
model.fit(m)
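
For anyone trying this test, a rough estimate of what the three input arrays alone need in memory may help. This is plain arithmetic on the dtypes used above; it does not account for any copies made when the matrix is converted or fitted:

```python
count = 2_200_000_000
bytes_per_entry = 4 + 4 + 4  # int32 row id + int32 column id + float32 value

total_bytes = count * bytes_per_entry
print(total_bytes)                    # 26400000000
print(round(total_bytes / 2**30, 1))  # 24.6 (GiB)
```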

benfred · 2018-03-14T17:37:13Z

I also bumped the version and pushed to pypi - I think it's possible that you're still running the old version somehow (setup.py might not update properly if the version numbers are the same).
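
One way to check which version is actually installed is the standard `importlib.metadata` module (Python 3.8+). A small sketch; the `installed_version` helper is hypothetical:

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print(installed_version("implicit"))
```

After `import implicit`, printing `implicit.__file__` also shows which copy on disk is being loaded.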

MariosGr · 2018-03-15T14:01:45Z

With version 0.3.2 I get this error:
File "implicit/bpr.pyx", line 139, in implicit.bpr.BayesianPersonalizedRanking.fit
IndexError: boolean index did not match indexed array along dimension 0; dimension is 43060797 but corresponding boolean dimension is 43060796

benfred · 2018-03-15T19:19:19Z

That last error is unrelated to the problem of more than 2^31 ratings being passed in.

It seems like the last error is because the ratings matrix has a user with no ratings in the last column of the input matrix, which exposed a new bug. Commit 1cb420a adds a unit test for this bug and fixes it.
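
To check whether an input has users with no ratings before fitting, the CSR row-pointer array is enough: row i is empty exactly when two consecutive `indptr` entries are equal. A small sketch in plain Python (the `empty_rows` helper is hypothetical; on a real scipy matrix you would pass `m.tocsr().indptr`):

```python
def empty_rows(indptr):
    """Return the indices of rows that hold no stored values, given a CSR indptr array."""
    return [i for i in range(len(indptr) - 1) if indptr[i] == indptr[i + 1]]


# 4 rows: row 0 has 2 values, row 1 has 1, row 2 has 3, row 3 (the last) has none
indptr = [0, 2, 3, 6, 6]
print(empty_rows(indptr))  # [3]
```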

MariosGr · 2018-03-17T08:59:49Z

That was indeed unrelated to the >2^31 problem, but the same data worked with version 0.3.1. With 0.3.3 it seems to be working again. However, I still get the segfault:

#0 0x00007fffa607f5d0 in ?? () from /usr/local/lib/python3.6/site-packages/numpy/core/umath.cpython-36m-x86_64-linux-gnu.so
#1 0x00007fffa6080b3b in ?? () from /usr/local/lib/python3.6/site-packages/numpy/core/umath.cpython-36m-x86_64-linux-gnu.so
#2 0x00000000004aef1c in _PyCFunction_FastCallDict (kwargs=0x7fff928bcab0, nargs=, args=0x7fff928dc3f8, func_obj=0x7fff928c5bd0) at Objects/methodobject.c:231
#3 _PyCFunction_FastCallKeywords (func=func@entry=0x7fff928c5bd0, stack=stack@entry=0x7fff928dc3f8, nargs=, kwnames=kwnames@entry=0x7fffa1ee05f8) at Objects/methodobject.c:294
#4 0x000000000054060e in call_function (pp_stack=pp_stack@entry=0x7fffffffd4b8, oparg=oparg@entry=3, kwnames=kwnames@entry=0x7fffa1ee05f8) at Python/ceval.c:4824
#5 0x00000000005422fb in _PyEval_EvalFrameDefault (f=, throwflag=) at Python/ceval.c:3338
#6 0x000000000053f8d1 in PyEval_EvalFrameEx (throwflag=0, f=0x7fff928dc240) at Python/ceval.c:753
#7 _PyFunction_FastCall (co=, args=, nargs=4, globals=globals@entry=0x7fffa21aab88) at Python/ceval.c:4906
#8 0x0000000000540781 in fast_function (kwnames=0x0, nargs=, stack=, func=0x7fffa21d6ae8) at Python/ceval.c:4941
#9 call_function (pp_stack=pp_stack@entry=0x7fffffffd660, oparg=oparg@entry=3, kwnames=kwnames@entry=0x0) at Python/ceval.c:4845
#10 0x000000000054268d in _PyEval_EvalFrameDefault (f=, throwflag=) at Python/ceval.c:3322
#11 0x000000000053f8d1 in PyEval_EvalFrameEx (throwflag=0, f=0x7fff928e3048) at Python/ceval.c:753
#12 _PyFunction_FastCall (co=, args=, nargs=1, globals=globals@entry=0x7fffa21aab88) at Python/ceval.c:4906
#13 0x0000000000540781 in fast_function (kwnames=0x0, nargs=, stack=, func=0x7fffa21d6a60) at Python/ceval.c:4941
#14 call_function (pp_stack=pp_stack@entry=0x7fffffffd810, oparg=oparg@entry=0, kwnames=kwnames@entry=0x0) at Python/ceval.c:4845
#15 0x000000000054268d in _PyEval_EvalFrameDefault (f=, throwflag=) at Python/ceval.c:3322
#16 0x000000000053f8d1 in PyEval_EvalFrameEx (throwflag=0, f=0x7fff928e3230) at Python/ceval.c:753
#17 _PyFunction_FastCall (co=, args=, nargs=1, globals=globals@entry=0x7fffa31a2dc8) at Python/ceval.c:4906
#18 0x0000000000540781 in fast_function (kwnames=0x0, nargs=, stack=, func=0x7fffa21c32f0) at Python/ceval.c:4941
#19 call_function (pp_stack=pp_stack@entry=0x7fffffffd9c0, oparg=oparg@entry=0, kwnames=kwnames@entry=0x0) at Python/ceval.c:4845
#20 0x000000000054268d in _PyEval_EvalFrameDefault (f=, throwflag=) at Python/ceval.c:3322
#21 0x0000000000540275 in PyEval_EvalFrameEx (throwflag=0, f=0x7fff98eb1bb8) at Python/ceval.c:753
#22 _PyEval_EvalCodeWithName (_co=_co@entry=0x7fffa21a90c0, globals=globals@entry=0x7fffa31a2dc8, locals=locals@entry=0x0, args=args@entry=0x7fffffffdcc0, argcount=argcount@entry=2, kwnames=kwnames@entry=0x0,
kwargs=0x0, kwcount=0, kwstep=2, defs=0x7fffa2405c20, defcount=2, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0) at Python/ceval.c:4153
#23 0x000000000054118e in PyEval_EvalCodeEx (_co=_co@entry=0x7fffa21a90c0, globals=globals@entry=0x7fffa31a2dc8, locals=locals@entry=0x0, args=args@entry=0x7fffffffdcc0, argcount=argcount@entry=2, kws=kws@entry=0x0,
kwcount=0, defs=0x7fffa2405c20, defcount=2, kwdefs=0x0, closure=0x0) at Python/ceval.c:4174
#24 0x00007fff990de2e5 in __Pyx_PyFunction_FastCallDict (func=func@entry=0x7fffa21c36a8, args=args@entry=0x7fffffffdcc0, nargs=nargs@entry=2, kwargs=0x0) at implicit/bpr.cpp:25610
#25 0x00007fff990ffd3b in __pyx_pf_8implicit_3bpr_27BayesianPersonalizedRanking_2fit (__pyx_v_self=0x7ffff6ccf2b0, __pyx_v_item_users=, __pyx_self=) at implicit/bpr.cpp:3453
#26 0x00007fff9910417e in __pyx_pw_8implicit_3bpr_27BayesianPersonalizedRanking_3fit (__pyx_self=, __pyx_args=0x7fff928c2808, __pyx_kwds=) at implicit/bpr.cpp:3151
#27 0x0000000000453412 in _PyObject_FastCallDict (func=0x7fff99462bc8, args=0x12be938, nargs=2, kwargs=0x0) at Objects/abstract.c:2331
#28 0x00000000005403d8 in call_function (pp_stack=pp_stack@entry=0x7fffffffdeb0, oparg=oparg@entry=1, kwnames=kwnames@entry=0x0) at Python/ceval.c:4848
#29 0x000000000054268d in _PyEval_EvalFrameDefault (f=, throwflag=) at Python/ceval.c:3322
#30 0x000000000053f8d1 in PyEval_EvalFrameEx (throwflag=0, f=0x12be7a8) at Python/ceval.c:753
#31 _PyFunction_FastCall (co=, args=, nargs=0, globals=globals@entry=0x7ffff7097168) at Python/ceval.c:4906
#32 0x0000000000540781 in fast_function (kwnames=0x0, nargs=, stack=, func=0x7ffff7081e18) at Python/ceval.c:4941
#33 call_function (pp_stack=pp_stack@entry=0x7fffffffe060, oparg=oparg@entry=0, kwnames=kwnames@entry=0x0) at Python/ceval.c:4845
#34 0x000000000054268d in _PyEval_EvalFrameDefault (f=, throwflag=) at Python/ceval.c:3322
#35 0x0000000000540275 in PyEval_EvalFrameEx (throwflag=0, f=0x94ff88) at Python/ceval.c:753
#36 _PyEval_EvalCodeWithName (_co=_co@entry=0x7ffff70118a0, globals=globals@entry=0x7ffff70b1078, locals=locals@entry=0x7ffff70118a0, args=args@entry=0x0, argcount=argcount@entry=0, kwnames=kwnames@entry=0x0,
kwargs=0x0, kwcount=0, kwstep=2, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0) at Python/ceval.c:4153
#37 0x0000000000541103 in PyEval_EvalCodeEx (closure=0x0, kwdefs=0x0, defcount=0, defs=0x0, kwcount=0, kws=0x0, argcount=0, args=0x0, locals=locals@entry=0x7ffff70118a0, globals=globals@entry=0x7ffff70b1078,
_co=_co@entry=0x7ffff70118a0) at Python/ceval.c:4174
#38 PyEval_EvalCode (co=co@entry=0x7ffff70118a0, globals=globals@entry=0x7ffff7097168, locals=locals@entry=0x7ffff7097168) at Python/ceval.c:730
#39 0x000000000042777f in run_mod (arena=0x7ffff70b1078, flags=0x7fffffffe340, locals=0x7ffff7097168, globals=0x7ffff7097168, filename=0x7ffff6cc0978, mod=0x9364c0) at Python/pythonrun.c:1025
#40 PyRun_FileExFlags (fp=0x98b940, filename_str=, start=, globals=0x7ffff7097168, locals=0x7ffff7097168, closeit=1, flags=0x7fffffffe340) at Python/pythonrun.c:978
#41 0x00000000004279ac in PyRun_SimpleFileExFlags (fp=0x98b940, filename=, closeit=1, flags=0x7fffffffe340) at Python/pythonrun.c:420
#42 0x000000000043be55 in run_file (p_cf=0x7fffffffe340, filename=0x8fb2a0 L"factorization.py", fp=0x98b940) at Modules/main.c:338
#43 Py_Main (argc=argc@entry=2, argv=argv@entry=0x8fa010) at Modules/main.c:809
#44 0x000000000041dd42 in main (argc=2, argv=) at ./Programs/python.c:69

benfred · 2018-03-23T15:47:17Z

Interesting! thanks for sending that.

If I'm reading that right, it looks like it's crashing in numpy code, which is being called from line 3453 in bpr.cpp:

#0 0x00007fffa607f5d0 in ?? () from /usr/local/lib/python3.6/site-packages/numpy/core/umath.cpython-36m-x86_64-linux-gnu.so
#1 0x00007fffa6080b3b in ?? () from /usr/local/lib/python3.6/site-packages/numpy/core/umath.cpython-36m-x86_64-linux-gnu.so

...

#25 0x00007fff990ffd3b in __pyx_pf_8implicit_3bpr_27BayesianPersonalizedRanking_2fit (__pyx_v_self=0x7ffff6ccf2b0, __pyx_v_item_users=, __pyx_self=) at implicit/bpr.cpp:3453

Looking at bpr.cpp just before line 3453, it seems like it's caused by

    /* "implicit/bpr.pyx":121
 *         # for now, all we handle is float 32 values
 *         if Ciu.dtype != np.float32:
 *             Ciu = Ciu.astype(np.float32)             # <<<<<<<<<<<<<<
 * 
 *         # initialize factors

Can you try calling
Ciu = Ciu.astype(np.float32) on the matrix you pass in? It seems like this might be an out-of-memory condition.
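
A sketch of that cast done up front, on a tiny stand-in matrix (the real input and its dtype are not shown in this thread, so float64 here is an assumption):

```python
import numpy as np
import scipy.sparse

# Tiny stand-in for the real ratings matrix; Python floats give float64 values
m = scipy.sparse.coo_matrix(
    (np.array([1.0, 1.0, 1.0]), (np.array([0, 1, 2]), np.array([2, 0, 1]))),
    shape=(3, 3),
)
print(m.dtype)  # float64

m32 = m.astype(np.float32)  # returns a new matrix; `m` is left unchanged
print(m32.dtype)  # float32

# Passing m32 to fit means the dtype check quoted above no longer triggers a conversion.
```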

MariosGr · 2018-03-28T16:00:10Z

I tried it and it seems to be working fine, without consuming too much of my memory.

MariosGr · 2018-04-04T09:34:56Z

OK, so I tried casting the matrix to np.float32 and then passing it to fit, and it works!
The only slight problem now is that the DEBUG message prints a wrong/negative percentage of correctly classified training samples, while training. I guess this is again due to an index overflow.
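
As an illustration of how a count above 2^31 - 1 can turn into a negative percentage, here is what happens when such a value is stored in a signed 32-bit integer. The numbers are made up and this is not the actual implicit logging code:

```python
import ctypes

total = 2_200_000_000
correct = 2_150_000_000  # made-up count, larger than 2^31 - 1

# Storing the count in a signed 32-bit integer wraps it around
wrapped = ctypes.c_int32(correct).value
print(wrapped)                      # -2144967296
print(100.0 * wrapped / total < 0)  # True: the reported percentage goes negative

# A 64-bit counter holds the same value without wrapping
print(ctypes.c_int64(correct).value)  # 2150000000
```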

benfred · 2018-04-07T22:38:40Z

Awesome! Glad it works.

Pushed a fix in v0.3.4 for the logging message - dfb62a4

MariosGr · 2018-04-10T07:56:41Z

Thanks!
Great job!

MariosGr · 2018-05-03T14:00:32Z

Now I am trying to train the AlternatingLeastSquares model with the same data (more than 2^31 samples) and it seems it is not working properly. It has been running for days without completing an iteration. When the samples are (a bit) less than 2^31 an iteration takes just a few minutes.

benfred · 2020-09-18T16:41:43Z

@MariosGr There is a fix for the ALS model here https://github.com/benfred/implicit/pull/400/files ( sorry for the late reply )

murphystout · 2021-01-27T16:57:11Z

import numpy
import scipy.sparse

import implicit.bpr

import logging
logging.basicConfig(level=logging.DEBUG)

# create a large sparse matrix
count = 2200000000
colids = numpy.random.randint(100000, size=count, dtype=numpy.int32)
rowids = numpy.random.randint(100000, size=count, dtype=numpy.int32)
vals = numpy.ones(count, dtype=numpy.float32)
m = scipy.sparse.coo_matrix((vals, (colids, rowids)), shape=(100000, 100000))

# create model using
model = implicit.bpr.BayesianPersonalizedRanking()
model.fit(m)

Hey @benfred - FYI, I ran your example code and I'm getting "Buffer dtype mismatch, expected 'int' but got 'long'":

ValueError Traceback (most recent call last)
in
18 # create model using
19 model = implicit.bpr.BayesianPersonalizedRanking()
---> 20 model.fit(m)

implicit/bpr.pyx in implicit.bpr.BayesianPersonalizedRanking.fit()

implicit/bpr.pyx in implicit.bpr.BayesianPersonalizedRanking._fit_gpu()

implicit/cuda/_cuda.pyx in implicit.cuda._cuda.CuIntVector.__cinit__()

ValueError: Buffer dtype mismatch, expected 'int' but got 'long'

There is an existing issue about this for nearest neighbors (#360), where you recommend reducing K, but I just wanted to flag that it also pops up with BPR.
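
One plausible reading of this error (an assumption, not confirmed in this thread) is that the GPU code path expects 32-bit index arrays, while scipy switches a matrix's `indptr`/`indices` to 64-bit once the values no longer fit in int32. A quick way to see which index dtype a matrix carries, shown on a tiny matrix (the `has_int32_indices` helper is hypothetical):

```python
import numpy as np
import scipy.sparse


def has_int32_indices(csr):
    """Return True if both CSR index arrays are int32 (hypothetical helper)."""
    return csr.indptr.dtype == np.int32 and csr.indices.dtype == np.int32


small = scipy.sparse.csr_matrix(np.eye(3, dtype=np.float32))
print(small.indptr.dtype, small.indices.dtype)
print(has_int32_indices(small))  # True
```

Running the same check on the 2.2 billion entry matrix after `tocsr()` would show whether its index arrays have been promoted to int64.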

benfred closed this as completed Apr 10, 2018

nishanthrao24 mentioned this issue Jul 28, 2020

model fit stuck on more than 2^31 elements in rating matrix #380

Closed
