Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds py_require() function #1706

Draft
wants to merge 19 commits into
base: main
Choose a base branch
from
Draft

Adds py_require() function #1706

wants to merge 19 commits into from

Conversation

edgararuiz
Copy link
Collaborator

@edgararuiz edgararuiz commented Dec 30, 2024

As discussed, this will be a PR focused on making sure py_require() "looks and acts" as expected. It does not include any uv integration work, it simply works out the logic for selecting and updating the Python packages and version that uv will prepare for the end-user.

Here is a sequence of different scenarios simulating a single user initialization, which culminates in using the resulting lists to confirm that uv run is able to use them to return Python environment. I added in-line comments to each step to clarify the intention of the command call, and the thought process behind the why py_require() acted as it did:

pak::pak("edgararuiz/reticulate")
#> ℹ Loading metadata database
#> ✔ Loading metadata database ... done
#> 
#> 
#> ℹ No downloads are needed
#> ✔ 1 pkg + 11 deps: kept 12 [3.4s]

library(reticulate)

# Returns a simple message confirming the addition
py_require("pandas==2.2.3")
#> Added "pandas==2.2.3"

# Adding another package of the same name works with no errors
py_require("pandas>=2.2.1")
#> Added "pandas>=2.2.1"

# If no arguments are passed, it returns the new printout
# Separates the Setup from the History listing
# Python version shows a place holder until a version is set
# History entries have their source appended to the end as a comment
#  so that the user can easily copy paste if desired. I'm anticipating
#  that easily copy-pasting will become a common request, so this takes
#  care of it
py_require()
#> Setup ------------------------------
#>  Packages: pandas==2.2.3, pandas>=2.2.1 
#>  Python:   [No version of Python selected yet] 
#> History ----------------------------
#>  py_require(packages = "pandas==2.2.3") # R session
#>  py_require(packages = "pandas>=2.2.1") # R session

# Specifying a version of the package to omit will search of an exact match
py_require("pandas>=2.2.1", action = "omit")
#> Ommiting "pandas>=2.2.1"

# Running the same command again will fail because there are no exact matches.
# Even though this would not have impacted the list of packages, it would have
# muddled the history. This way it keeps the history clean, and the user aware
py_require("pandas>=2.2.1", action = "omit")
#> Error: 
#> Could not match "pandas>=2.2.1" 
#> Tip: Remove from your command, and try again

# Passing the name of the package, without version, will match anything
# already on the list at the 'name' level.
py_require("pandas", action = "omit")
#> Ommiting "pandas==2.2.3"

# Output of multiple packages is cleanly enumerated in a comma delimited list
py_require(c("pandas==2.2.1", "numpy==2.2.1"))
#> Added "pandas==2.2.1", and "numpy==2.2.1"

# A replacement call with multiple packages will return a bullet list to
# clearly explain what is being replaced with what
# Also, unlike 'omit', 'replace' will always match at the name level, so
# passing a new package with version, will also modify any entry that has
# no version
py_require(c("pandas", "numpy"), action = "replace")
#> - Replaced "pandas==2.2.1" with "pandas"
#> - Replaced "numpy==2.2.1" with "numpy"

# A new version of Python will be processed through the version selection
# function you shared with me. Unlike packages, the Python version returned
# by `py_require()` will only be 1. Which is always going to be determined
# on the fly. In this case using '>=3.8' will result in selecting '3.9'
py_require(python_version = ">=3.8")
#> Setting Python version to: 3.9

# Passing a new Python requirement will be processed along with what is on
# the history, not the current Python selection
py_require(python_version = ">=3.10")
#> Setting Python version to: 3.10

# A new, fixed version is ok because it does not conflict with the previous
# versions
py_require(python_version = "3.11")
#> Setting Python version to: 3.11

# Removing (omitting) a version will again trigger a full historical re-play
# and will re-run the selection function
py_require(python_version = "3.11", action = "omit")
#> Setting Python version to: 3.10

# Passing a conflicting Python version will return an error
py_require(python_version = "3.7")
#> Error: 
#> Python versions are in conflict: >=3.8, >=3.10, and 3.7

# Passing multiple arguments, with issues, will return a bullet list
# with each of the problems. Doing this so that the user knows all of the
# problems with their call the first time they run it.
py_require("package1", python_version = "3.7", action = "omit")
#> Error: 
#> - Could not match "package1" 
#>   Tip: Remove from your command, and try again
#> - An entry for Python 3.7 was not found in the history

# Simulating TF package interactivity

tf_package <- function() {
  # Added a `silent` argument to  `py_require()`
  py_require("tensorflow==2.18.*", "3.10", silent = TRUE)
}
environment(tf_package) <- asNamespace("tensorflow")

# No output is retuned because `silent` is set to TRUE
tf_package()

# The new history entry shows 'tensorflow' as the source of the `py_require()`
# call
py_require()
#> Setup ------------------------------
#>  Packages: pandas, numpy, tensorflow==2.18.* 
#>  Python:   3.10 
#> History ----------------------------
#>  py_require(packages = "pandas==2.2.3") # R session
#>  py_require(packages = "pandas>=2.2.1") # R session
#>  py_require(packages = "pandas>=2.2.1", action = "omit") # R session
#>  py_require(packages = "pandas", action = "omit") # R session
#>  py_require(packages = c("pandas==2.2.1", "numpy==2.2.1")) # R session
#>  py_require(packages = c("pandas", "numpy"), action = "replace") # R session
#>  py_require(python_verison = ">=3.8") # R session
#>  py_require(python_verison = ">=3.10") # R session
#>  py_require(python_verison = "3.11") # R session
#>  py_require(python_verison = "3.11", action = "omit") # R session
#>  py_require(packages = "tensorflow==2.18.*", python_verison = "3.10") # tensorflow


# Making sure that final requirement set can be used to find/create an
# uv 'managed' environment

fn <- function(requirements = NULL, python = "3.10") {
  if (length(requirements)) {
    requirements <- as.vector(rbind("--with", reticulate:::maybe_shQuote(requirements)))
  }
  if (length(python)) {
    python <- c("--python", reticulate:::maybe_shQuote(python))
  }
  reticulate:::system2t(path.expand("~/.local/bin/uv"),
    c("run", python, requirements, "-"),
    stdout = TRUE,
    input = c("import sys; print(sys.prefix)")
  )
}

reqs <- py_require()

# Accessing the packages and Python version is easy
fn(reqs$packages, reqs$python_version)
#> + /Users/edgar/.local/bin/uv run --python 3.10 --with pandas --with numpy --with 'tensorflow==2.18.*' -
#> [1] "/Users/edgar/.cache/uv/archive-v0/Q9gjw2rjkeTELjC0UgHHn"

Created on 2024-12-30 with reprex v2.1.1

@edgararuiz edgararuiz marked this pull request as draft December 30, 2024 18:17
@t-kalinowski
Copy link
Member

Thanks @edgararuiz! This is coming together very nicely.

I tried it out locally and also pushed some changes adding uv_binary() and get_or_create_venv().

Some loose thoughts in no particular order:

  • py_require() is a little too verbose at the moment. In practice, I imagine most py_require() calls will be silent.

  • We probably shouldn't error on py_require("foo", action = "omit") if "foo"
    isn't in the requirements list. This will help enable writing future-proof code.

  • In general, I don't think we should be raising errors from py_require() based on the value of packages. Any errors from conflicting requirements should be signaled later, from
    get_or_create_venv(). Also, users would have no straightforward way to prevent errors due to py_require() calls from package .onLoad() hooks. py_require() should succeed silently in most situations.

  • For python_version, I don't think we have to proactively resolve the python version in R code. I just tried locally and this worked without issue:

    get_or_create_venv("numpy", python_version = ">=3.10,<=3.12")

    Maybe we default to using "3.11" if no specific version was requested, but otherwise, pass along any and all constraints as a comma separated list.

  • I'm starting to think that the action="replace" as currently designed is too confusing and unpredictable. It's not clear yet how best to make it simpler. (Some snapshots of the "unsatisfiable requirements" error messages users might see would be helpful).

    Some random thoughts that came to mind:

    • Maybe we only have two action options, "add" and "omit"
      • Maybe omit just removes identical matches? (via setdiff())?
      • Maybe omit accepts globs, with something like
      .globals$reqs <- grep(glob2rx(pattern), .globals$reqs, invert = TRUE)
      
    • Maybe instead of attempting to parse and match packages names, "replace" is a global
      replace, replacing ALL the current requirements with the new requirements.
  • We should start with a default requirement of numpy.

  • What happens if py_require() is called after Python has initialized?
    I think we should attempt an install into a uv-managed cached live venv. This seems to work:

use_python(get_or_create_venv(c("numpy<2", "jax")))
py_config()
import("pandas") # error
system2(uv_binary(), c("pip install pandas", "--python", maybe_shQuote(py_exe())))
import("pandas") # success

But we'll want to think about how to signal a helpful error here, since both of these succeed.

system2(uv_binary(), c("pip install numpy>2", "--python", maybe_shQuote(py_exe())))
system2(uv_binary(), c("pip install numpy>2 --reinstall", "--python", maybe_shQuote(py_exe())))
  • The print method for requirements history should indicate if py_require() was called from a package. We might want to indicate it with something like:
py_require()
#> Declared Python Requirements ------------------------------
#>  Packages: pandas, numpy, tensorflow==2.18.* 
#>  Python:   >=3.10,<=3.12 
#> Requests History ----------------------------
#>  py_require("numpy") # from package:reticulate
#>  py_require("tensorflow==2.18.*") # from package:tensorflow
#>  py_require("tensorflow*", action = "omit") # from package:keras3
#>  py_require(c("jax", "jax-metal", "keras")) # from package:keras3
#>  py_require(c("pandas", "pillow")) # R_Globalenv
  • Any thoughts about how to expose the exclude-newer feature?

@edgararuiz
Copy link
Collaborator Author

Here is the updated functionality:

  • Accepts multiple versions of Python and passes to uv. It will prepend == if multiple constraints are passed, and a number is passed without it
  • All messages are turned off by default
  • Errors are turned off
  • Instead of History, we are not printing anything that was requested outside of the user's session, renames to "Non-user requirements"
  • Primes the requirements with numpy, and notes that in print out
library(reticulate)
py_require(python_version = "3.10")
py_require(c("pandas", "numpy"))
tf_package <- function() {
  py_require(
    packages = c("tensorflow==2.18.*", "keras"),
    python_version =  c(">=3.09", "<=3.11"), 
    silent = TRUE
    )
}
environment(tf_package) <- asNamespace("tensorflow")
tf_package()
py_require()
#> ------------------ Python requirements ----------------------
#> Current requirements ----------------------------------------
#>  Packages: numpy, pandas, tensorflow==2.18.*, keras 
#>  Python:   >=3.09, <=3.11, 3.10 
#> Non-user requirements ---------------------------------------
#>              requestor                    packages           python
#> 1 package:reticulate |                     numpy |                |
#> 2 package:tensorflow | tensorflow==2.18.*, keras | >=3.09, <=3.11 |
reticulate:::get_or_create_venv(
  requirements = reticulate:::get_python_reqs("packages"),
  python_version = reticulate:::get_python_reqs("python_versions")
  )
#> + /Users/edgar/.local/bin/uv run --color=always --no-project --python '>=3.09,<=3.11,==3.10' --with numpy --with pandas --with 'tensorflow==2.18.*' --with keras -
#> [1] "/Users/edgar/.cache/uv/archive-v0/1p-tXCQBAIBoD4IBM-AI8/bin/python3"

Created on 2025-01-07 with reprex v2.1.1

I'll start working on the py_install(), and the uv management stuff once we nail down this part

R/py_require.R Outdated Show resolved Hide resolved
R/py_require.R Outdated Show resolved Hide resolved
@t-kalinowski
Copy link
Member

This looks great! I really like the new print method!

@edgararuiz
Copy link
Collaborator Author

A new action was added: "set", which only works with exclude_newer. Anything else, gives an error. This example shows how using tensorflow 2.18 fails if I use exclude_newer from Oct/20, and works when I push that to December

library(reticulate)
py_require(exclude_newer = "2024-10-20", action = "set")
py_require(python_version = "3.10")
py_require(c("pandas", "numpy"))
tf_package <- function() {
  py_require(
    packages = c("tensorflow==2.18.*", "keras"),
    python_version =  c(">=3.09", "<=3.11"), 
    silent = TRUE
    )
}
environment(tf_package) <- asNamespace("tensorflow")
tf_package()
py_require("numpy", action = "omit")

# Fails on older date
reticulate:::get_or_create_venv(
  requirements = reticulate:::get_python_reqs("packages"),
  python_version = reticulate:::get_python_reqs("python_versions"),
  exclude_newer = reticulate:::get_python_reqs("exclude_newer")
)
#> + /Users/edgar/.local/bin/uv run --color=always --no-project --python-preference=only-managed --exclude-newer 2024-10-20 -
#> Error in reticulate:::get_or_create_venv(requirements = reticulate:::get_python_reqs("packages"), : Python requirements could not be satisfied.
#> Call `py_require()` to omit or replace conflicting requirements.

# Succeeds on newer date
py_require(exclude_newer = "2024-12-20", action = "set")

reticulate:::get_or_create_venv(
  requirements = reticulate:::get_python_reqs("packages"),
  python_version = reticulate:::get_python_reqs("python_versions"),
  exclude_newer = reticulate:::get_python_reqs("exclude_newer")
)
#> + /Users/edgar/.local/bin/uv run --color=always --no-project --python-preference=only-managed --exclude-newer 2024-12-20 -
#> [1] "/Users/edgar/.cache/uv/archive-v0/l3Y2P_i7JkY-qJMQfqsOF/bin/python3"

Created on 2025-01-07 with reprex v2.1.1

@edgararuiz
Copy link
Collaborator Author

Here's an example of the print out:

py_require()
#> ------------------ Python requirements ----------------------
#> Current requirements ----------------------------------------
#>  Packages: pandas, tensorflow==2.18.*, keras 
#>  Python:   >=3.09, <=3.11, 3.10 
#>  Exclude: After 2024-10-20 
#> Non-user requirements ---------------------------------------
...

R/py_require.R Outdated Show resolved Hide resolved
tests/testthat/test-py_require.R Outdated Show resolved Hide resolved
R/py_require.R Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants