IndexError on Strings containing Certain Characters #10

Closed
matt-buckley opened this issue Apr 28, 2022 · 5 comments

@matt-buckley

When running a basic NLP model like en_core_web_lg with the sole addition of an entityLinker pipe, calling nlp() throws an IndexError on certain strings, particularly those containing whitespace characters such as newlines. The error and the line that raises it are:

`def get_candidates_in_sent(self, sent, doc):
----> root = list(filter(lambda token: token.dep_ == "ROOT", sent))[0]
excluded_children = []
candidates = []

IndexError: list index out of range`

I'm running Python 3.9, spaCy 3.2.4, and spaCy-entity-linker 1.0.1.
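
The traceback is consistent with a sentence in which no token carries the `ROOT` dependency label: the filtered list is then empty and `[0]` raises. Below is a minimal sketch of that failure mode and one defensive alternative. The `SimpleNamespace` objects are stand-ins for spaCy tokens, `find_root` is a hypothetical helper, and the `dep` label on the newline token is an assumed value for illustration. This is not necessarily how the library itself handles the case.

```python
from types import SimpleNamespace


def find_root(sent):
    """Return the first token labelled ROOT, or None if the sentence has none."""
    return next((token for token in sent if token.dep_ == "ROOT"), None)


# Stand-in tokens (assumption): real spaCy tokens also expose a `dep_` string.
normal_sent = [
    SimpleNamespace(text="Hello", dep_="ROOT"),
    SimpleNamespace(text="!", dep_="punct"),
]
rootless_sent = [SimpleNamespace(text="\n", dep_="dep")]

# The pattern from the traceback fails when nothing matches "ROOT".
try:
    list(filter(lambda token: token.dep_ == "ROOT", rootless_sent))[0]
    original_pattern_failed = False
except IndexError:
    original_pattern_failed = True
```

With a helper like this, a caller could skip any sentence for which `find_root` returns `None` instead of crashing.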

@ninikolov

Same here

@isu-shrestha

isu-shrestha commented Jan 24, 2023

Having the same problem here.
Did a temporary fix by removing white space:
text = ' '.join(text.split())
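
For reference, here is the same workaround wrapped in a small helper. `str.split()` with no arguments splits on any run of whitespace (spaces, tabs, newlines), so re-joining with a single space removes the characters that seem to trigger the crash. The name `collapse_whitespace` is made up for this sketch.

```python
def collapse_whitespace(text: str) -> str:
    """Replace every run of whitespace with a single space and trim the ends."""
    return " ".join(text.split())


cleaned = collapse_whitespace("First line.\n\nSecond\tline.  ")
# cleaned == "First line. Second line."
```

The cleaned string can then be passed to nlp(). Note that this also discards paragraph breaks, so character offsets in the resulting doc no longer match the original text.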

@MartinoMensio
Collaborator

Hi @matt-buckley, @ninikolov, and @isu-shrestha,
Thank you for opening the issue. I recently became a maintainer of this package and did not notice the open issue.
I just tested and merged #9 which should fix the issue. Can you confirm on your end?

Best,
Martino

@dennlinger
Contributor

Hi @MartinoMensio,
I also encountered this issue not too long ago (tried it two weeks ago and it failed). For my particular file, it now works! Thanks @jonwiggins for the fix 🥳

I'll contribute a PR with an additional test case containing a minimal document sample that caused a crash. That way, future iterations will have better test coverage and you can reproduce the issue yourself.

@MartinoMensio
Collaborator

Hi @dennlinger,
Thank you very much for joining this issue and for confirming that now it works!
I'm looking forward to receiving your PR with the test case! This project needs it :)

Best,
Martino
