Not able to specify python_binary for agent in clearml.conf #523

Open
nieag opened this issue Dec 22, 2021 · 19 comments

Comments

@nieag

nieag commented Dec 22, 2021

Hi!
I've been trying to set up a clearml-agent on a Linux server, and I'm running into issues when the agent picks a task from a queue and attempts to create a venv on the server. The server has multiple Python versions installed, with update-alternatives managing the various symlinks. By default the agent picks the Python 2.7 binary at /usr/bin/python2, maybe because /usr/bin/python points at /etc/alternatives/python?

I tried specifying a different version in the clearml.conf file using python_binary: "/usr/bin/python3.8" but this doesn't seem to have any effect.

Any help would be appreciated!

@jkhenning
Member

Hi @nieag ,

What is the clearml-agent version you're using? Can you verify this also happens with the latest 1.2.0rc0?

@nieag
Author

nieag commented Dec 23, 2021

Running clearml-agent==1.1.1 on the client, and clearml==1.1.4 on the server. Will try updating to the latest!

@nieag
Author

nieag commented Dec 28, 2021

Upgrading to clearml-agent==1.2.0rc0 doesn't seem to help. The agent config is set as

agent.python_binary = /usr/bin/python3.8

However, the executables that are being used are
New python executable in .clearml/venvs-builds/3.8/bin/python2
Also creating executable in .clearml/venvs-builds/3.8/bin/python

Starting either of these manually, I can verify that both are Python 2.7.17.

And the following error trace is returned:
[image: screenshot of the error trace]

@jkhenning
Member

Hi @nieag ,

Sorry for taking so long to answer 🙁

Are you running the ClearML Agent in docker mode (i.e. --docker) or standard venv mode?

@nieag
Author

nieag commented Jan 2, 2022

Hi, no worries. I'm running in standard venv mode, without the docker flag.

@nieag
Author

nieag commented Jan 3, 2022

I'm also running the clearml-agent from a venv I created manually; I don't know if that might affect things. I wasn't able to set the agent's Python binary to the one in the venv, nor did the agent seem to pick the one I ran the "clearml-agent daemon --queue default --foreground" command with.

It should pick that one according to lines 109-113 of session.py in the clearml-agent repo, I think:

# HACK make sure we have python version to execute,
# if nothing was specific, use the one that runs us
def_python = ConfigValue(self.config, "agent.default_python")
if not def_python.get():
    def_python.set("{version.major}.{version.minor}".format(version=sys.version_info))
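
To make the effect of that fallback concrete, here is a minimal sketch of what the format string evaluates to. Note that it yields only a version string such as 3.8, not a path to an interpreter. The helper name default_python_version and the FakeVersion stand-in are hypothetical and not part of clearml-agent.

```python
import sys
from collections import namedtuple


def default_python_version(version_info=sys.version_info):
    """Return "<major>.<minor>" for the given version info.

    Hypothetical helper that mirrors the format string quoted above.
    """
    return "{version.major}.{version.minor}".format(version=version_info)


# Stand-in for sys.version_info, used only for illustration.
FakeVersion = namedtuple("FakeVersion", "major minor micro")

print(default_python_version())                       # version of the running interpreter
print(default_python_version(FakeVersion(3, 8, 10)))  # 3.8
```

So even when this fallback fires, the setting only records which version is wanted; which binary ends up answering to that version still depends on what is installed on the machine.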

@jkhenning
Member

@nieag that's strange, it looks to me like there's some kind of configuration mixup. Any chance the VIRTUAL_ENV env var is defined?

@nieag
Author

nieag commented Jan 4, 2022

Hey,
Yeah, the VIRTUAL_ENV is set to

VIRTUAL_ENV=/home/company/nieage/clearml_venv

Which is the venv where I installed the clearml-agent and am currently running from. It doesn't use that path for the python-binary however.

@jkhenning
Member

Did you set it up manually?

@nieag
Author

nieag commented Jan 4, 2022

No, just created the venv normally with python, activated it and installed the clearml-agent. I assume that variable is set by the venv activation script.
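
A quick way to check that assumption is to compare what the environment says with what the interpreter itself reports. The following is a minimal sketch using only the standard library; the function name venv_status is made up for illustration.

```python
import os
import sys


def venv_status(environ=os.environ):
    """Summarise whether this interpreter runs inside a virtual environment."""
    return {
        # Exported by the activate script; absent when the venv's python
        # is invoked directly without activation.
        "VIRTUAL_ENV": environ.get("VIRTUAL_ENV"),
        # True when the interpreter itself belongs to a venv created with
        # the stdlib venv module or a recent virtualenv (Python 3.3+).
        "in_venv": sys.prefix != getattr(sys, "base_prefix", sys.prefix),
        "executable": sys.executable,
    }


if __name__ == "__main__":
    for key, value in venv_status().items():
        print(f"{key}: {value}")
```

If VIRTUAL_ENV is set but in_venv is False, the variable was inherited from an activated shell while a different interpreter is actually running.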

@jkhenning
Member

Can you try running the agent without the activation script using <path-to-venv-python-binary> -m clearml_agent.__main__ <args>?
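
For anyone scripting this, the suggestion can be sketched as follows: build the command around an explicit interpreter and hand it an environment without the traces of the activate script. The helper names and the placeholder path are hypothetical, and env_without_venv only undoes the VIRTUAL_ENV variable and the venv's bin entry on PATH, nothing else.

```python
import os


def agent_command(venv_python, args):
    """Build the argv for running the agent via an explicit interpreter."""
    return [venv_python, "-m", "clearml_agent.__main__", *args]


def env_without_venv(environ):
    """Copy the environment without VIRTUAL_ENV and the venv's bin dir on PATH."""
    env = dict(environ)
    venv = env.pop("VIRTUAL_ENV", None)
    if venv:
        venv_bin = os.path.join(venv, "bin")
        parts = env.get("PATH", "").split(os.pathsep)
        env["PATH"] = os.pathsep.join(p for p in parts if p != venv_bin)
    return env


# Placeholder path: substitute the interpreter inside your own venv.
cmd = agent_command("/path/to/venv/bin/python",
                    ["daemon", "--queue", "default", "--foreground"])
print(cmd)
# To launch: subprocess.run(cmd, env=env_without_venv(os.environ), check=True)
```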

@nieag
Author

nieag commented Jan 4, 2022

Tried doing that, gives me the same error as before. Added the log file below:

task f719029e32d7466b95d1dabea5e68525 pulled from c06821726cb9430caa298aeb2793ff05 by worker SE-ML03-SRV:0
Running task 'f719029e32d7466b95d1dabea5e68525'
Storing stdout and stderr log to '/tmp/.clearml_agent_out.qdx6gqpq.txt', '/tmp/.clearml_agent_out.qdx6gqpq.txt'
Current configuration (clearml_agent v1.2.0rc0, location: /tmp/.clearml_agent.alxp3z4o.cfg):
----------------------
sdk.storage.cache.default_base_dir = ~/.clearml/cache
sdk.storage.cache.size.min_free_bytes = 10GB
sdk.storage.direct_access.0.url = file://*
sdk.metrics.file_history_size = 100
sdk.metrics.matplotlib_untitled_history_size = 100
sdk.metrics.images.format = JPEG
sdk.metrics.images.quality = 87
sdk.metrics.images.subsampling = 0
sdk.metrics.tensorboard_single_series_per_graph = false
sdk.network.metrics.file_upload_threads = 4
sdk.network.metrics.file_upload_starvation_warning_sec = 120
sdk.network.iteration.max_retries_on_server_error = 5
sdk.network.iteration.retry_backoff_factor_sec = 10
sdk.aws.s3.key = 
sdk.aws.s3.region = 
sdk.aws.boto3.pool_connections = 512
sdk.aws.boto3.max_multipart_concurrency = 16
sdk.log.null_log_propagate = false
sdk.log.task_log_buffer_capacity = 66
sdk.log.disable_urllib3_info = true
sdk.development.task_reuse_time_window_in_hours = 72.0
sdk.development.vcs_repo_detect_async = true
sdk.development.store_uncommitted_code_diff = true
sdk.development.support_stopping = true
sdk.development.default_output_uri = 
sdk.development.force_analyze_entire_repo = false
sdk.development.suppress_update_message = false
sdk.development.detect_with_pip_freeze = false
sdk.development.worker.report_period_sec = 2
sdk.development.worker.ping_period_sec = 30
sdk.development.worker.log_stdout = true
sdk.development.worker.report_global_mem_used = false
api.version = 1.5
api.verify_certificate = true
api.default_version = 1.5
api.http.max_req_size = 15728640
api.http.retries.total = 240
api.http.retries.connect = 240
api.http.retries.read = 240
api.http.retries.redirect = 240
api.http.retries.status = 240
api.http.retries.backoff_factor = 1.0
api.http.retries.backoff_max = 120.0
api.http.wait_on_maintenance_forever = true
api.http.pool_maxsize = 512
api.http.pool_connections = 512
api.api_server = 
api.web_server = 
api.files_server = 
api.credentials.access_key = 
api.host = 
agent.worker_id = 
agent.worker_name = 
agent.force_git_ssh_protocol = false
agent.python_binary = 
agent.package_manager.type = pip
agent.package_manager.pip_version = <20.2
agent.package_manager.system_site_packages = false
agent.package_manager.force_upgrade = false
agent.package_manager.conda_channels.0 = pytorch
agent.package_manager.conda_channels.1 = conda-forge
agent.package_manager.conda_channels.2 = defaults
agent.package_manager.torch_nightly = false
agent.venvs_dir = /home/nieage/.clearml/venvs-builds
agent.venvs_cache.max_entries = 10
agent.venvs_cache.free_space_threshold_gb = 2.0
agent.vcs_cache.enabled = true
agent.vcs_cache.path = /home/nieage/.clearml/vcs-cache
agent.venv_update.enabled = false
agent.pip_download_cache.enabled = true
agent.pip_download_cache.path = /home/nieage/.clearml/pip-download-cache
agent.translate_ssh = true
agent.reload_config = false
agent.docker_pip_cache = /home/nieage/.clearml/pip-cache
agent.docker_apt_cache = /home/nieage/.clearml/apt-cache
agent.docker_force_pull = false
agent.default_docker.image = nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
agent.enable_task_env = false
agent.hide_docker_command_env_vars.enabled = true
agent.docker_internal_mounts.sdk_cache = /clearml_agent_cache
agent.docker_internal_mounts.apt_cache = /var/cache/apt/archives
agent.docker_internal_mounts.ssh_folder = /root/.ssh
agent.docker_internal_mounts.pip_cache = /root/.cache/pip
agent.docker_internal_mounts.poetry_cache = /root/.cache/pypoetry
agent.docker_internal_mounts.vcs_cache = /root/.clearml/vcs-cache
agent.docker_internal_mounts.venv_build = /root/.clearml/venvs-builds
agent.docker_internal_mounts.pip_download = /root/.clearml/pip-download-cache
agent.apply_environment = true
agent.apply_files = true
agent.git_user = 
agent.ignore_requested_python_version = true
agent.default_python = 3.8
agent.cuda_version = 112
agent.cudnn_version = 76

Executing task id [f719029e32d7466b95d1dabea5e68525]:
repository = 
branch = 
version_num = 
tag = 
docker_cmd = 
entry_point = keras_test.py
working_dir = .

New python executable in /home/nieage/.clearml/venvs-builds/3.8/bin/python2
Also creating executable in /home/nieage/.clearml/venvs-builds/3.8/bin/python
Installing setuptools, pkg_resources, pip, wheel...
  Complete output from command /home....ilds/3.8/bin/python2 - setuptools pkg_resources pip wheel:
  Collecting setuptools
Exception:
Traceback (most recent call last):
  File "/usr/share/python-wheels/pip-9.0.1-py2.py3-none-any.whl/pip/basecommand.py", line 215, in main
    status = self.run(options, args)
  File "/usr/share/python-wheels/pip-9.0.1-py2.py3-none-any.whl/pip/commands/install.py", line 353, in run
    wb.build(autobuilding=True)
  File "/usr/share/python-wheels/pip-9.0.1-py2.py3-none-any.whl/pip/wheel.py", line 749, in build
    self.requirement_set.prepare_files(self.finder)
  File "/usr/share/python-wheels/pip-9.0.1-py2.py3-none-any.whl/pip/req/req_set.py", line 380, in prepare_files
    ignore_dependencies=self.ignore_dependencies))
  File "/usr/share/python-wheels/pip-9.0.1-py2.py3-none-any.whl/pip/req/req_set.py", line 554, in _prepare_file
    require_hashes
  File "/usr/share/python-wheels/pip-9.0.1-py2.py3-none-any.whl/pip/req/req_install.py", line 278, in populate_link
    self.link = finder.find_requirement(self, upgrade)
  File "/usr/share/python-wheels/pip-9.0.1-py2.py3-none-any.whl/pip/index.py", line 465, in find_requirement
    all_candidates = self.find_all_candidates(req.name)
  File "/usr/share/python-wheels/pip-9.0.1-py2.py3-none-any.whl/pip/index.py", line 423, in find_all_candidates
    for page in self._get_pages(url_locations, project_name):
  File "/usr/share/python-wheels/pip-9.0.1-py2.py3-none-any.whl/pip/index.py", line 568, in _get_pages
    page = self._get_page(location)
  File "/usr/share/python-wheels/pip-9.0.1-py2.py3-none-any.whl/pip/index.py", line 683, in _get_page
    return HTMLPage.get_page(link, session=self.session)
  File "/usr/share/python-wheels/pip-9.0.1-py2.py3-none-any.whl/pip/index.py", line 795, in get_page
    resp.raise_for_status()
  File "/home/nieage/.clearml/venvs-builds/3.8/share/python-wheels/requests-2.18.4-py2.py3-none-any.whl/requests/models.py", line 935, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
HTTPError: 404 Client Error: Not Found for url: .../_packaging/ml-imaging-feed/pypi/simple/setuptools/
----------------------------------------
...Installing setuptools, pkg_resources, pip, wheel...done.
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/virtualenv.py", line 2375, in <module>
    main()
  File "/usr/lib/python3/dist-packages/virtualenv.py", line 724, in main
    symlink=options.symlink)
  File "/usr/lib/python3/dist-packages/virtualenv.py", line 992, in create_environment
    download=download,
  File "/usr/lib/python3/dist-packages/virtualenv.py", line 922, in install_wheel
    call_subprocess(cmd, show_stdout=False, extra_env=env, stdin=SCRIPT)
  File "/usr/lib/python3/dist-packages/virtualenv.py", line 817, in call_subprocess
    % (cmd_desc, proc.returncode))
OSError: Command /home....ilds/3.8/bin/python2 - setuptools pkg_resources pip wheel failed with error code 2
Running virtualenv with interpreter /usr/bin/python2

clearml_agent: ERROR: Command '['python3.8', '-m', 'virtualenv', '/home/nieage/.clearml/venvs-builds/3.8']' returned non-zero exit status 1.


Leaving process id 13803
DONE: Running task 'f719029e32d7466b95d1dabea5e68525', exit status 1


@jkhenning
Member

Well, the agent will always use one of the system-installed python binaries to execute the task (i.e. to create the venv for the task) and never the venv binary it was executed from. However, I'm not sure why you see this: New python executable in /home/nieage/.clearml/venvs-builds/3.8/bin/python2

@jkhenning
Member

jkhenning commented Jan 4, 2022

Perhaps you can try to clean up the /home/nieage/.clearml/venvs-builds dir (or move it temporarily) to see if it helps?

@nieag
Author

nieag commented Jan 4, 2022

No, that doesn't seem to affect it. However, I tried running "python3.8 -m virtualenv " and it gives me the same python2 error, so this is most likely not caused by ClearML. It may be caused by how virtualenv is installed, which can affect the Python version it defaults to. I'll investigate a bit on my end and update this thread if I find something!
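
A small diagnostic along these lines may help narrow it down. It is only a sketch: it asks a given interpreter for its own version, and lists any VIRTUALENV_* environment variables, which virtualenv reads as default values for its command-line options. Whether such a variable is actually set on this server is an assumption to check, not something this thread establishes, and the function names are made up for illustration.

```python
import os
import subprocess
import sys


def interpreter_version(python):
    """Ask an interpreter for its own "<major>.<minor>.<micro>" version."""
    # The one-liner is valid on both Python 2 and Python 3.
    code = "import sys; print('.'.join(map(str, sys.version_info[:3])))"
    result = subprocess.run([python, "-c", code], stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return result.stdout.strip()


def virtualenv_overrides(environ=os.environ):
    """Return the VIRTUALENV_* variables found in the environment."""
    return {k: v for k, v in environ.items() if k.startswith("VIRTUALENV_")}


if __name__ == "__main__":
    print("running under:", interpreter_version(sys.executable))
    print("virtualenv overrides:", virtualenv_overrides() or "none")
```

Pointing interpreter_version at the bin/python left behind in the venvs-builds directory is a scripted version of the manual check described earlier in the thread.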

Thanks for the help so far!

@ColdTeapot273K

I'm having a similar issue with a pyenv + poetry setup (and the current release versions of clearml/clearml-agent). clearml-agent is consistently trying to use the wrong Python.

  • Python I don't wanna use - system Python (/usr/bin/), 3.7.10 (the maximum available from the AWS repos).
  • Python I wanna use - my active pyenv Python (activated via pyenv global 3.9.10). It is active in the sense that, from a fresh shell, from anywhere in the filesystem:
$ which python
#~/.pyenv/shims/python
$ python --version                       
#Python 3.9.10
  • I've set in clearml.conf
python_binary: "/home/acalabourdin/.pyenv/shims/python"

^a valid path, returns the python 3.9.10

package_manager: {
    # supported options: pip, conda, poetry
    # type: pip,
    type: poetry,
  • poetry.lock has:
python = "~3.9"

Upon running

clearml-agent daemon --queue default --foreground  

from system (i.e. not from any python virtual env.) I get the following:

Poetry Enabled: Ignoring requested python packages, using repository poetry lock file!
The currently activated Python version 3.7.10 is not supported by the project (~3.9).
Trying to find and use a compatible version. 
Using python3 (3.9.10)

...

Installing dependencies from lock file

  SolverProblemError

  The current project's Python requirement (3.7.10) is not compatible with some of the required packages Python requirement:
    - numpy requires Python >=3.8, so it will not be satisfied for Python 3.7.10
  
  Because numpy (1.22.1) requires Python >=3.8
   and no versions of numpy match >=1.20.0,<1.22.1 || >1.22.1, numpy is forbidden.
  So, because no versions of numpy match <1.20.0
   and ds-614 depends on numpy (*), version solving failed.

  at ~/.local/share/pypoetry/venv/lib64/python3.7/site-packages/poetry/puzzle/solver.py:241 in _solve
      237│             packages = result.packages
      238│         except OverrideNeeded as e:
      239│             return self.solve_in_compatibility_mode(e.overrides, use_latest=use_latest)
      240│         except SolveFailure as e:
    → 241│             raise SolverProblemError(e)
      242│ 
      243│         results = dict(
      244│             depth_first_search(
      245│                 PackageNode(self._package, packages), aggregate_package_nodes

The "Installing dependencies from lock file ..." part of the error is poetry-specific and I have a fix in mind (it deserves a dedicated issue), but the error above it, to me, points to the fact that clearml ignores the explicitly configured path to the Python binary.
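
Since the configured path is a pyenv shim rather than an interpreter binary, it can be useful to see which executable the shim ends up launching. The sketch below simply asks the launched interpreter for its own sys.executable; the function name real_executable is hypothetical, and whether the agent behaves differently when given that resolved path is not confirmed anywhere in this thread.

```python
import subprocess
import sys


def real_executable(python):
    """Return the sys.executable reported by whatever `python` actually launches."""
    result = subprocess.run([python, "-c", "import sys; print(sys.executable)"],
                            stdout=subprocess.PIPE, universal_newlines=True,
                            check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    # Replace sys.executable with the shim path to inspect it,
    # e.g. the python_binary value from clearml.conf.
    print(real_executable(sys.executable))
```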

@bmartinn
Member

Hi @ColdTeapot273K

... clearml ignores explicitly configured path to python binary.

You should probably also set ignore_requested_python_version: true

Can you verify whether the issue still exists in the latest RC? (I think there was a fix for a similar issue.)

pip install clearml==1.2.0rc1

@ColdTeapot273K

ColdTeapot273K commented Nov 18, 2022

Hi @bmartinn

I should report that the problem still persists with clearml==1.8.0, poetry==1.2.2, pyenv==2.3.5, even with ignore_requested_python_version: true.

The ClearML agent keeps clinging to the wrong Python, and it's really blocking my work on the ClearML stack.

UPD:
From what I see, the python_binary setting is supposed to default to the Python version used to launch the agent:
https://github.com/allegroai/clearml-agent/blob/9eee213683252cd0bd19aae3f9b2c65939d75ac3/clearml_agent/backend_api/config/default/agent.conf#L34

and this definitely doesn't happen

I even use super explicit commands like:
pyenv exec clearml-agent daemon --queue services-py37
CLEARML_AGENT_EXTRA_PYTHON_PATH='/home/docker-user/.pyenv/shims/python' clearml-agent daemon --queue services-py37

I need at least a workaround

@jkhenning
Member

Hi @ColdTeapot273K, I think what @bmartinn meant was you need the latest clearml-agent version... (try v1.5.0rc0)
