is_oov and prob does not work for en_core_web_sm model #3986

Closed
lkluo opened this issue Jul 18, 2019 · 2 comments
Labels
usage General spaCy usage

Comments

lkluo commented Jul 18, 2019

How to reproduce the behaviour

import spacy
nlp = spacy.load("en_core_web_sm")

lex = nlp.vocab[u"dog"]
print(lex.is_oov, lex.prob)

This prints:

True -20.0

This bug was previously reported in #1204.

Your Environment

  • Operating System: macOS
  • Python Version Used: 3.6
  • spaCy Version Used: 2.1.4
  • Environment Information:

BreakBB (Contributor) commented Jul 18, 2019

"dog" really is not included in "en_core_web_sm", as you can see by iterating over the StringStore:

import spacy
nlp = spacy.load("en_core_web_sm")

has_dog = False
for s in nlp.vocab.strings:
  if s == "dog":
    has_dog = True
print("has_dog:", str(has_dog))  # has_dog: False

Or simply use the in operator, which calls __contains__:

import spacy
nlp = spacy.load("en_core_web_sm")

if u"dog" in nlp.vocab.strings:
  print("'dog' is in the StringStore")
else:
  print("'dog' is not in the StringStore")

Therefore is_oov has to be True.
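
One caveat when combining the two snippets: as far as I know, looking up an unseen string with nlp.vocab[...] creates a new lexeme and stores the string, so the StringStore check should run before the lexeme lookup. A minimal sketch of that ordering follows; check_word is a hypothetical helper (not part of spaCy), and nlp is assumed to be a pipeline loaded as in the examples above.

```python
def check_word(nlp, word):
    """Return (in_string_store, is_oov, prob) for `word`.

    `check_word` is a hypothetical helper, not a spaCy function.
    `nlp` is assumed to be a loaded pipeline, e.g. spacy.load("en_core_web_sm").
    """
    # Test membership first: nlp.vocab[word] may add an unseen string to the
    # StringStore, which would make a later membership check uninformative.
    in_string_store = word in nlp.vocab.strings
    lex = nlp.vocab[word]
    return in_string_store, lex.is_oov, lex.prob


# Usage (requires spaCy and the model to be installed):
#   import spacy
#   nlp = spacy.load("en_core_web_sm")
#   print(check_word(nlp, u"dog"))
```

The helper only reads attributes that already appear in this thread (vocab.strings, is_oov, prob), so it should behave the same on any model; the values it returns depend on the model that is loaded.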

lock bot commented Aug 22, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked as resolved and limited conversation to collaborators Aug 22, 2019