This issue was moved to a discussion.

Guidance on exporting large nested Rust structs to Python #1701

Closed
cswinter opened this issue Jun 27, 2021 · 8 comments

@cswinter

I am creating a Python API for a Rust library. Some of the methods return nested Rust structs.

For example:

#[pymethods]
impl Client {
  fn list(&self) -> PyResult<Vec<XpStatus>> {
    // ...
  }
}

#[pyclass]
pub struct XpStatus {
    pub xp: Xp,
    pub container_status: HashMap<ContainerId, ContainerStatus>,
    pub containers_by_lifecycle: HashMap<ContainerStatusKind, Vec<ContainerId>>,
    pub max_containers: u64,
}

#[pyclass]
pub struct Xp {
    pub def: XpDef,
    pub uid: XpId,
    pub lifecycle: XpLifecycle,
    pub creation_time: DateTime<Utc>,
    pub priority: i64,
    pub queue_pos: u64,
}

pub enum ContainerStatus {
    Running,
    Creating,
    Completed {
        exit_code: u64,
        error: String,
        finished_at: DateTime<Utc>,
    },
    None,
}

// ...

I would like the Python code to be able to access all members of the returned structs.

The simplest option might be to define getters on all members. However, unless I'm mistaken this would seem to require copying the entire substructure on each access which would make it very expensive to iterate over collections contained in the struct from Python code.

To avoid this performance hit, we need to fully convert the struct into a Python compatible object. I can think of two different ways this could be achieved.

  1. Create a second version for each struct in Rust which is compatible in Python and then manually convert these from Rust code.
    This would look something like this:
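#[pyclass(name = "XpStatus")]
pub struct PyXpStatus {
    // Py<T> is an owned reference to an object on the Python heap;
    // PyCell and PyDict cannot be stored by value in a struct field.
    #[pyo3(get)]
    pub xp: Py<PyXp>,
    #[pyo3(get)]
    pub container_status: Py<PyDict>,
    // ...
}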

impl From<XpStatus> for PyXpStatus {
  // ...
}

Probably this could even be generated automatically by a macro.

  2. Create Python versions of all structs in Python, and instantiate those directly.
    If we're going to create a new version of all structs anyway, we might as well do so in Python. This has the added benefits of allowing for a slightly more idiomatic API and also making "jump to source" work so Python users can look at the Python definitions of all classes rather than an opaque stub or Rust source.
    I think this might be the preferred solution. I'm still slightly unsure how best to convert the Rust structs into Python classes on the Rust side. When building a mixed Rust/Python project with maturin, can you just use PyModule::import to import the Python portion of the module on the Rust side? Or would you use PyModule::from_code and include_str!?

Does this seem like a reasonable approach?

@davidhewitt
Member

Option 1 is probably what I would pick for now.

Note that in the future I would like it to be possible for #[pyo3(get)] to avoid cloning the underlying data - as per #1358 (comment).

This is in reality still some way off.

@mejrs
Member

mejrs commented Jun 27, 2021

We should probably add a section in the guide that discusses this stuff in depth.

> To avoid this performance hit

You're going to have to take that hit somewhere on the Rust/Python boundary. You could allocate everything on the Python heap (so you'd have Py<...> wrappers everywhere), which avoids the cloning, but this just moves some of the cost to conversions when Rust code needs to work on the structs.

> The simplest option might be to define getters on all members. However, unless I'm mistaken this would seem to require copying the entire substructure on each access which would make it very expensive to iterate over collections contained in the struct from Python code.

Also, it would return a fresh clone on every access, so Python code wouldn't be able to mutate the collection. See https://pyo3.rs/main/faq.html#pyo3get-clones-my-field
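To make the difference visible from the Python side, here is a minimal pure-Python sketch. It does not use PyO3 at all: the two classes are hypothetical stand-ins that only model the observable behaviour, one for a getter that clones its field on every access and one for a field that holds a shared reference to a Python object (the idea behind the Py<...> wrappers).

```python
import copy


class XpStatusCloningGetter:
    """Hypothetical stand-in for a class whose getter clones the field."""

    def __init__(self):
        self._container_status = {"c1": "Running"}

    @property
    def container_status(self):
        # Models clone-on-access: every read builds a brand new dict.
        return copy.deepcopy(self._container_status)


class XpStatusSharedField:
    """Hypothetical stand-in for a field that references a Python object."""

    def __init__(self):
        self._container_status = {"c1": "Running"}

    @property
    def container_status(self):
        # Every read hands out the same dict object.
        return self._container_status


cloning = XpStatusCloningGetter()
cloning.container_status["c2"] = "Creating"  # mutates a temporary copy

shared = XpStatusSharedField()
shared.container_status["c2"] = "Creating"  # mutates the stored dict

print(sorted(cloning.container_status))  # ['c1']
print(sorted(shared.container_status))  # ['c1', 'c2']
print(cloning.container_status is cloning.container_status)  # False
```

With the cloning getter, the write is silently lost and a loop that reads the attribute repeatedly pays for a full copy each time; with the shared field, both problems go away, at the price mentioned above when Rust code needs the data.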

Here's a third approach:

#[pyclass]
struct Foo {
    bar: Py<Bar>,
}

#[pyclass]
struct Bar {
    inner: HashMap<Py<ContainerId>, Py<ContainerStatus>>,
}

#[pyproto]
impl PyMappingProtocol for Bar {
    /* todo */
}

#[pyproto]
impl PyIterProtocol for Bar {
    /* todo */
}

#[pyclass]
struct BarIter {
    inner: Py<Bar>,
    state: /* todo */
}

#[pyproto]
impl PyIterProtocol for BarIter {
    /* todo */
}

What is best will depend on what exactly you are doing (and benchmarks, probably). YMMV.

@cswinter
Author

Related question: how do you create something like a Py<PyDateTime> or Py<PyList> in Rust? All the creation methods I can find seem to return borrows rather than owned values.

@mejrs
Member

mejrs commented Jun 29, 2021

You can use .into() to do that for the native Python types:

use pyo3::prelude::*;
use pyo3::types::PyDict;

struct Bar {
    inner: Py<PyDict>,
}

impl Bar {
    fn new() -> Bar {
        Python::with_gil(|py| {
            let dict: Py<PyDict> = PyDict::new(py).into();
            Bar {
                inner: dict,
            }
        })
    }
}

You can use Py::new() if you want to store a pyclass.

@cswinter
Author

I tried option 1 and it's a fine solution, but I ended up going with option 2 instead, where I just define the classes in Python, import them, and convert on the Rust side. I'm quite happy with this: it's roughly the same amount of effort/boilerplate as creating Python-compatible structs in Rust, but I also get MyPy type annotations and all the @dataclass goodies.
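For readers wondering what the Python half of option 2 might look like, here is a rough sketch. It is not the code actually used in this project: representing ContainerId and the lifecycle kinds as str, flattening the ContainerStatus enum into a single class with a kind tag, and leaving out the def and lifecycle fields of Xp (their Rust types are not shown in this thread) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class ContainerStatus:
    # Assumed encoding of the Rust enum: a tag plus optional payload fields.
    kind: str  # "Running", "Creating", "Completed" or "None"
    exit_code: Optional[int] = None
    error: Optional[str] = None
    finished_at: Optional[datetime] = None


@dataclass
class Xp:
    # `def` and `lifecycle` are omitted; their types are not shown above.
    uid: str
    creation_time: datetime
    priority: int
    queue_pos: int


@dataclass
class XpStatus:
    xp: Xp
    container_status: Dict[str, ContainerStatus]
    containers_by_lifecycle: Dict[str, List[str]]
    max_containers: int


# The Rust side would build these objects; here one is built by hand.
status = XpStatus(
    xp=Xp(
        uid="xp-1",
        creation_time=datetime(2021, 6, 27, tzinfo=timezone.utc),
        priority=0,
        queue_pos=3,
    ),
    container_status={
        "c1": ContainerStatus(kind="Running"),
        "c2": ContainerStatus(kind="Completed", exit_code=0, error=""),
    },
    containers_by_lifecycle={"Running": ["c1"], "Completed": ["c2"]},
    max_containers=4,
)

running = [cid for cid, s in status.container_status.items() if s.kind == "Running"]
print(running)  # ['c1']
```

Because these are ordinary Python objects, attribute access and iteration involve no further Rust/Python conversions once the value has been constructed.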

@Congyuwang

Just a question. Performance-wise, how does doing this: "Create a second version for each struct in Rust which is compatible in Python and then manually convert these from Rust code." compare to using a convenient crate called pythonize, which uses serde?

@davidhewitt
Member

My gut feeling is that you can get better performance from hand-written code than from pythonize; however, I would suggest benchmarking! Note that I haven't put much effort into optimizing pythonize, although PRs to speed it up would be welcome if you find they benefit you.

@Congyuwang

My own experimentation seems to show little difference between using pythonize and hand-coding a to_python method. I didn’t do any crazy optimisation, though. I guess most of the time is spent on instantiating and filling up PyList, PyDict and such. pythonize does not seem to have that much overhead.

@PyO3 PyO3 locked and limited conversation to collaborators Oct 31, 2021
