ValueError: negative row index found on input #4

Closed
dantlz opened this issue Jul 14, 2016 · 4 comments

Comments

@dantlz

dantlz commented Jul 14, 2016

When I run the attached input, I get the following error:

    Traceback (most recent call last):
      File "/Users/username/Desktop/Recommendation/Implementation.py", line 206, in <module>
        collaborative_filter(formatted, result)
      File "/Users/username/Desktop/Recommendation/Implementation.py", line 80, in collaborative_filter
        df, plays = read_data(input_filename)
      File "/Users/username/Desktop/Recommendation/Implementation.py", line 25, in read_data
        data['user'].cat.codes.copy())))
      File "/usr/local/lib/python2.7/site-packages/scipy/sparse/coo.py", line 182, in __init__
        self._check()
      File "/usr/local/lib/python2.7/site-packages/scipy/sparse/coo.py", line 240, in _check
        raise ValueError('negative row index found')
    ValueError: negative row index found

From what I can tell, the input is correctly formatted with 3 columns separated by tabs. Thank you for your time!
faulty_input.txt

@benfred
Owner

benfred commented Jul 14, 2016

So - it seems like this row '4a81291db77648b0 nan 1' is tripping up the pandas read_table parser.
It interprets the artist there as a floating-point NaN instead of a string. A missing value gets the category code -1, which is what ends up as the negative row index in the sparse matrix.

Looks like this is by design in the pandas.read_table function. Adding 'na_filter=False' to the argument list bypasses the NaN check and should work:

    data = pandas.read_table(filename,
                             usecols=[0, 1, 2],
                             names=['user', 'artist', 'plays'],
                             na_filter=False)
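
To illustrate the difference, here is a minimal sketch that parses a small in-memory sample with and without `na_filter=False`. The sample rows (apart from the `nan` row quoted above), the second user id, and the helper name `artist_codes` are made up for illustration; only `pandas.read_table` and its arguments come from the snippet above. It assumes the artist column is converted to a category the same way the original script does.

```python
import io

import pandas

# Hypothetical three-row sample; the middle row is the one quoted above.
SAMPLE = (
    "4a81291db77648b0\tradiohead\t3\n"
    "4a81291db77648b0\tnan\t1\n"
    "9f2c51aa03b7d4e1\tbjork\t2\n"
)


def artist_codes(na_filter):
    """Parse SAMPLE and return the category codes of the artist column."""
    data = pandas.read_table(io.StringIO(SAMPLE),
                             usecols=[0, 1, 2],
                             names=['user', 'artist', 'plays'],
                             na_filter=na_filter)
    # Missing values (NaN) are assigned the code -1 by pandas.
    return data['artist'].astype('category').cat.codes.tolist()


default_codes = artist_codes(na_filter=True)   # pandas default behaviour
fixed_codes = artist_codes(na_filter=False)    # 'nan' kept as a plain string

print("default:", default_codes)
print("na_filter=False:", fixed_codes)
```

With the default settings the artist named `nan` is parsed as a missing value, so its code is -1; with `na_filter=False` every code is non-negative and can be used as a sparse matrix index. Note that `na_filter=False` turns off missing-value detection for every column, so it is only appropriate when the file really has no missing data.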

@dantlz
Author

dantlz commented Jul 15, 2016

That completely resolved the issue. Thank you very much for the help!

dantlz closed this as completed Jul 15, 2016

@eliasah

eliasah commented Mar 30, 2017

This issue also affects the code from your distance-metrics project.

@benfred
Owner

benfred commented Mar 30, 2017

@eliasah I've added nearest neighbour support to this project recently: #14. It should be better than the code I included with the original blog post: the calculation is parallelized and won't run out of memory if the full similarity matrix is large.
The lastfm example here shows how to use it: https://github.com/benfred/implicit/blob/master/examples/lastfm.py

I'll update that post/code to point here sometime soon.
