chore(*): change installation of spacy weights to runtime #462

Merged
10 commits merged on Aug 17, 2023

Conversation

KrishPatel13
Collaborator

@KrishPatel13 KrishPatel13 commented Aug 10, 2023

What kind of change does this PR introduce?

feature

Summary
Motivation: #461

Checklist

  • My code follows the style guidelines of OpenAdapt
  • I have performed a self-review of my code
  • If applicable, I have added tests to prove my fix is functional/effective
  • I have linted my code locally prior to submission
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation (e.g. README.md, requirements.txt)
  • New and existing unit tests pass locally with my changes

How can your code be run and tested?

pip uninstall en_core_web_trf
# Follow Setup Instructions in README.md
pytest -s

Other information

Krish Patel added 2 commits August 10, 2023 13:34
 manual setup instruction in README.md
 installation from both the install scripts
@KrishPatel13 KrishPatel13 self-assigned this Aug 10, 2023
@KrishPatel13 KrishPatel13 marked this pull request as draft August 10, 2023 17:37
@KrishPatel13 KrishPatel13 changed the title feat: install spacy in runtime feat(*): change installation of spacy weights to runtime Aug 10, 2023
@KrishPatel13
Collaborator Author

This is what the pytest output looks like when the required spaCy model is not installed:

(openadapt-py3.10) PS P:\OpenAdapt AI - MLDS AI\cloned_repo\test_other\OpenAdapt> pytest
============================================================================== test session starts ==============================================================================
platform win32 -- Python 3.10.11, pytest-7.1.3, pluggy-1.2.0
rootdir: P:\OpenAdapt AI - MLDS AI\cloned_repo\test_other\OpenAdapt
plugins: anyio-3.7.1, Faker-19.2.0, cov-2.10.0
collected 10 items / 1 error

==================================================================================== ERRORS =====================================================================================
________________________________________________________________ ERROR collecting tests/openadapt/test_scrub.py _________________________________________________________________
tests\openadapt\test_scrub.py:9: in <module>
    from openadapt import config, scrub
openadapt\scrub.py:25: in <module>
    NLP_ENGINE_TRF = SCRUB_PROVIDER_TRF.create_engine()
C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-DSRh12US-py3.10\lib\site-packages\presidio_analyzer\nlp_engine\nlp_engine_provider.py:91: in create_engine
    engine = nlp_engine_class(nlp_engine_opts)
C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-DSRh12US-py3.10\lib\site-packages\presidio_analyzer\nlp_engine\spacy_nlp_engine.py:36: in __init__        
    self.nlp = {
C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-DSRh12US-py3.10\lib\site-packages\presidio_analyzer\nlp_engine\spacy_nlp_engine.py:37: in <dictcomp>      
    lang_code: spacy.load(model_name, disable=["parser"])
C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-DSRh12US-py3.10\lib\site-packages\spacy\__init__.py:51: in load
    return util.load_model(
C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-DSRh12US-py3.10\lib\site-packages\spacy\util.py:472: in load_model
    raise IOError(Errors.E050.format(name=name))
E   OSError: [E050] Can't find model 'en_core_web_trf'. It doesn't seem to be a Python package or a valid path to a data directory.
=============================================================================== warnings summary ================================================================================ 
C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-DSRh12US-py3.10\lib\site-packages\pycountry\__init__.py:10
  C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-DSRh12US-py3.10\lib\site-packages\pycountry\__init__.py:10: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
    import pkg_resources

C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-DSRh12US-py3.10\lib\site-packages\pkg_resources\__init__.py:2871
  C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-DSRh12US-py3.10\lib\site-packages\pkg_resources\__init__.py:2871: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('mpl_toolkits')`.
  Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
    declare_namespace(pkg)

C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-DSRh12US-py3.10\lib\site-packages\pkg_resources\__init__.py:2871
  C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-DSRh12US-py3.10\lib\site-packages\pkg_resources\__init__.py:2871: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('sphinxcontrib')`.
  Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
    declare_namespace(pkg)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================================================ short test summary info ============================================================================ 
ERROR tests/openadapt/test_scrub.py - OSError: [E050] Can't find model 'en_core_web_trf'. It doesn't seem to be a Python package or a valid path to a data directory.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 
========================================================================= 3 warnings, 1 error in 6.29s ========================================================================== 
(openadapt-py3.10) PS P:\OpenAdapt AI - MLDS AI\cloned_repo\test_other\OpenAdapt> 

@KrishPatel13
Collaborator Author

@abrichr Ready for review!

@KrishPatel13 KrishPatel13 marked this pull request as ready for review August 10, 2023 18:00
@KrishPatel13
Collaborator Author

Some Screenshots:

image

Collaborator

@Mustaballer Mustaballer left a comment

If I do pip install openadapt, will this also download the spacy weights?
Also, I don't think the PR title should use feat. I think chore would be better, since this is an internal adjustment to the build process rather than a new user-facing feature. What do you think?

@KrishPatel13 KrishPatel13 changed the title feat(*): change installation of spacy weights to runtime chore(*): change installation of spacy weights to runtime Aug 10, 2023
@KrishPatel13
Collaborator Author

KrishPatel13 commented Aug 10, 2023

The spaCy weights are now downloaded at runtime. Whenever the user runs the scrub code, whether via record, pytest, or visualization, it checks whether the spaCy model is installed; if the model is not found, it installs it and then continues. This happens purely at runtime, so there is no need to add the model to the installation scripts, the manual setup instructions, or the toml/poetry configuration ;-)
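
A minimal sketch of that runtime check, for illustration only. The helper name `ensure_spacy_model` and its injectable `is_installed` / `download` parameters are hypothetical and not taken from this PR; the defaults assume spaCy's `spacy.util.is_package` and `spacy.cli.download`.

```python
from typing import Callable, Optional


def ensure_spacy_model(
    model_name: str,
    is_installed: Optional[Callable[[str], bool]] = None,
    download: Optional[Callable[[str], None]] = None,
) -> bool:
    """Download `model_name` if it is missing; return True if a download ran."""
    if is_installed is None or download is None:
        # Import lazily so that importing this module does not require spaCy.
        import spacy.cli
        import spacy.util

        is_installed = is_installed or spacy.util.is_package
        download = download or spacy.cli.download
    if is_installed(model_name):
        # Model package already present: nothing to do.
        return False
    download(model_name)
    return True
```

Calling `ensure_spacy_model("en_core_web_trf")` before the NLP engine is created would be one way to use it. Passing the two callables explicitly keeps the check testable without network access.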

Collaborator

@Mustaballer Mustaballer left a comment

lgtm!

@abrichr
Member

abrichr commented Aug 10, 2023

Thanks @KrishPatel13 !

What do you think about the approach mentioned here explosion/spaCy#4592 (comment) :

import spacy

spacy_model_name = 'de_core_news_sm'
if not spacy.util.is_package(spacy_model_name):
    spacy.cli.download(spacy_model_name)

Also, what do you think about using @pytest.mark.skipif to skip the tests that depend on spaCy when the model is not installed?
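
For illustration, a per-test `skipif` marker could look roughly like the sketch below. The names `SPACY_MODEL_NAME`, `model_installed`, `requires_spacy_model`, and the test itself are hypothetical. The check uses the standard library's `importlib.util.find_spec` in place of `spacy.util.is_package`, so that evaluating the condition does not itself import spaCy; that is a substitution made for this example, not what the PR does.

```python
import importlib.util

import pytest

# Assumed value for this sketch; the PR reads the name from openadapt.config.
SPACY_MODEL_NAME = "en_core_web_trf"


def model_installed(name: str) -> bool:
    """Return True if `name` is importable as a top-level package."""
    return importlib.util.find_spec(name) is not None


# Reusable marker: skips the decorated test when the model package is absent.
requires_spacy_model = pytest.mark.skipif(
    not model_installed(SPACY_MODEL_NAME),
    reason=f"spaCy model {SPACY_MODEL_NAME!r} is not installed",
)


@requires_spacy_model
def test_model_is_importable() -> None:
    # Hypothetical test body; it runs only when the model package is present.
    assert model_installed(SPACY_MODEL_NAME)
```

Defining the marker once and reusing it keeps the condition and the skip reason in a single place.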

@KrishPatel13
Collaborator Author

Good point, I will include that.

@KrishPatel13
Collaborator Author

We will have to skip all of the test functions in test_scrub if the required spaCy model is not installed. Hence,

A better approach: https://docs.pytest.org/en/7.1.x/how-to/skipping.html#:~:text=Skip%20all%20test%20functions%20of%20a%20class%20or%20module&text=If%20you%20want%20to%20skip,skipif(...)

image

Krish Patel added 2 commits August 10, 2023 16:04
@KrishPatel13
Collaborator Author

KrishPatel13 commented Aug 10, 2023

How pytest behaves when the spaCy model is not found:

(openadapt-py3.10) PS P:\OpenAdapt AI - MLDS AI\cloned_repo\test_other\OpenAdapt> pip uninstall en_core_web_trf
WARNING: Skipping en_core_web_trf as it is not installed.
(openadapt-py3.10) PS P:\OpenAdapt AI - MLDS AI\cloned_repo\test_other\OpenAdapt> pytest
================================ test session starts ================================
platform win32 -- Python 3.10.11, pytest-7.1.3, pluggy-1.2.0
rootdir: P:\OpenAdapt AI - MLDS AI\cloned_repo\test_other\OpenAdapt
plugins: anyio-3.7.1, Faker-19.2.0, cov-2.10.0
collected 25 items

tests\openadapt\test_crop.py .                                                 [  4%]
tests\openadapt\test_events.py .......                                         [ 32%]
tests\openadapt\test_scrub.py sssssssssssssss                                  [ 92%]
tests\openadapt\test_summary.py ..                                             [100%]

================================= warnings summary ==================================
C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-DSRh12US-py3.10\lib\site-packages\pycountry\__init__.py:10
  C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-DSRh12US-py3.10\lib\site-packages\pycountry\__init__.py:10: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html      
    import pkg_resources

C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-DSRh12US-py3.10\lib\site-packages\pkg_resources\__init__.py:2871
  C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-DSRh12US-py3.10\lib\site-packages\pkg_resources\__init__.py:2871: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('mpl_toolkits')`.
  Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
    declare_namespace(pkg)

C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-DSRh12US-py3.10\lib\site-packages\pkg_resources\__init__.py:2871
  C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-DSRh12US-py3.10\lib\site-packages\pkg_resources\__init__.py:2871: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('sphinxcontrib')`.
  Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
    declare_namespace(pkg)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
==================== 10 passed, 15 skipped, 3 warnings in 8.55s ===================== 
(openadapt-py3.10) PS P:\OpenAdapt AI - MLDS AI\cloned_repo\test_other\OpenAdapt> pip show en_core_web_trf
WARNING: Package(s) not found: en_core_web_trf
(openadapt-py3.10) PS P:\OpenAdapt AI - MLDS AI\cloned_repo\test_other\OpenAdapt>

@KrishPatel13
Collaborator Author

@abrichr Ready for merging ;-)

addressed your comment

import pytest
import spacy

from openadapt import config

# Skip every test in this module when the spaCy model is not installed.
if not spacy.util.is_package(config.SPACY_MODEL_NAME):
    pytestmark = pytest.mark.skip(reason="SpaCy model not installed.")

Member

This does not appear to be used anywhere, is that intentional?

Collaborator Author

It is used by pytest itself: pytestmark is a module-level global that pytest reads during collection and applies to every test in the module.

So when the user does not have the spaCy model installed, all of the tests in this file are skipped.

Collaborator Author

If the model is installed, it imports scrub and runs the tests as usual.

@abrichr abrichr merged commit 479937e into OpenAdaptAI:main Aug 17, 2023
@abrichr abrichr deleted the feat/install_spacy_runtime branch August 17, 2023 20:53
R-ohit-B-isht pushed a commit to R-ohit-B-isht/OpenAdapt that referenced this pull request Jun 21, 2024
…I#462)

* remove spacy from manual setup instruction in README.md

* remove spacy installation from both the install scripts

* add todo

* test runtime code for spacy installation

* pytest passes even if spacy model is not installed

* addressing:

OpenAdaptAI#462 (comment)

* add spacy-transformers

address comment:
OpenAdaptAI#462 (comment)

* skip all the tests in test_scrub if spacy model is not installed

* format
3 participants