iterators? #71

Closed
ajavadia opened this issue May 2, 2020 · 7 comments
Labels
enhancement New feature or request

Comments

@ajavadia
Member

ajavadia commented May 2, 2020

I don't think retworkx uses iterators the way networkx does? This might lead to very large memory usage for large graphs (and possibly hurt runtime too).

For example, often you want a shallow view into the immediate successors of a node, not all the nodes that may succeed it until the end. The same goes for the layers of a dag.

Networkx implements this as an iterator. Terra also uses it as an iterator.

But I think retworkx constructs the full list, and terra only artificially wraps it back into an iterator.

For an example of how this can be problematic for large circuits, see this PR in terra: Qiskit/qiskit#371

@mtreinish
Member

mtreinish commented May 6, 2020

The main reason was ease of implementation: iterators behave a bit differently between Rust and Python, and building a Rust function that returns a Python iterator isn't as straightforward as just doing yield data from inside a function. It would probably require creating a custom type that implements the Python iter protocol and then returning an object of that new type. At least for the first version, it was easier to just return a list to get something working.

I agree that for large results, implementing this with iterators could be more efficient. But so far I haven't seen any scaling limits or memory growth issues with it from the terra dagcircuit. In general, memory consumption went down [1] [2] and all graph operations became about an order of magnitude faster in aggregate compared to the previous networkx implementation. That is likely just the benefit of using a compiled language, though, and there could still be a scaling limit as things increase in size, or for examples or test cases I haven't seen.

I think we should leave this open as a lower-priority future enhancement, though, and if the need arises (like we start hitting scaling limits) we can prioritize it.

[1] https://qiskit.github.io/qiskit/#transpiler_levels.TranspilerLevelBenchmarks.peakmem_quantum_volume_transpile_50_x_20?commits=95328adb
[2] https://qiskit.github.io/qiskit/#transpiler_levels.TranspilerLevelBenchmarks.peakmem_transpile_from_large_qasm_backend_with_prop?commits=95328adb (this is an eoh qasm example from aqua loaded from qasm: https://github.com/Qiskit/qiskit/blob/master/test/benchmarks/qasm/test_eoh_qasm.qasm )

@mtreinish
Member

To implement this, though, we should probably try to maintain backwards compatibility with the existing functions and add a second method for the places where it makes sense to use iterators. For example, if layers() becomes a bottleneck we could add a layers_iter() function that returns an iterator object instead of a list. That would ensure a smooth transition without potentially breaking users who expect a list to be returned from a function.

@mtreinish mtreinish added the enhancement New feature or request label May 6, 2020
@mtreinish
Member

For reference, in case someone wants to take a stab at this: the PyO3 library just added an example of how to implement Python iterators in Rust: https://github.com/PyO3/pyo3/blob/master/examples/rustapi_module/src/pyclass_iter.rs

So to implement this we basically have to implement PyIterProtocol for a custom struct that holds the necessary state for each iteration, then add a function that creates an instance of that struct and returns it as the iterator. For example, with layers it would be something like (untested, so it probably doesn't work as-is):

use petgraph::graph::NodeIndex;
use pyo3::class::iter::{IterNextOutput, PyIterProtocol};
use pyo3::prelude::*;

#[pyclass]
struct LayerIter {
    // A borrowed reference here needs a lifetime, which #[pyclass]
    // doesn't support; see the follow-up comments below
    dag: &PyDAG,
    first_layer: Vec<NodeIndex>,
}

#[pyproto]
impl PyIterProtocol for LayerIter {
    fn __next__(mut slf: PyRefMut<Self>) -> IterNextOutput<Vec<PyObject>, &'static str> {
        // <insert implementation of each iteration here>
        // while there is another layer to produce:
        //     IterNextOutput::Yield(layer)
        // once the traversal is exhausted:
        IterNextOutput::Return("All Layers Returned")
    }
}

#[pyfunction]
fn layers_iter(
    py: Python,
    dag: &PyDAG,
    first_layer: Vec<usize>,
) -> PyResult<Py<LayerIter>> {
    Py::new(
        py,
        LayerIter {
            dag,
            first_layer: first_layer.into_iter().map(NodeIndex::new).collect(),
        },
    )
}

@mtreinish
Member

I took a quick look at doing this last week for #147 and hit a lifetime issue on the borrowed graph object in the iterator struct, which looks like the issue described in PyO3/pyo3#1085 (and more succinctly in PyO3/pyo3#1089). The only short-term path for iterators I can think of, until those issues are resolved in PyO3, is to generate the full set of results for a function in Rust and then wrap and expose that result as an iterator to Python. This doesn't offer the primary advantage of iterators, deferred execution, but it might provide some benefit by deferring the conversion of Rust types to Python types. I suspect that benefit will be pretty minimal, though; we'll have to benchmark to be sure.
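As a rough sketch of that interim pattern (untested; the struct name, fields, and message strings here are just illustrative, not anything in retworkx), the fully computed results get moved into a pyclass, and each __next__ call converts one element:

use pyo3::class::iter::{IterNextOutput, PyIterProtocol};
use pyo3::prelude::*;

#[pyclass]
struct EdgeListIter {
    // The results are fully computed in Rust before this struct is
    // created; only the conversion to Python objects is deferred
    edges: Vec<(usize, usize)>,
    index: usize,
}

#[pyproto]
impl PyIterProtocol for EdgeListIter {
    fn __iter__(slf: PyRef<Self>) -> PyRef<Self> {
        slf
    }

    fn __next__(mut slf: PyRefMut<Self>) -> IterNextOutput<(usize, usize), &'static str> {
        if slf.index < slf.edges.len() {
            let item = slf.edges[slf.index];
            slf.index += 1;
            // The (usize, usize) -> tuple(int, int) conversion
            // happens here, one element per call
            IterNextOutput::Yield(item)
        } else {
            IterNextOutput::Return("All edges returned")
        }
    }
}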

mtreinish added a commit to mtreinish/retworkx that referenced this issue Oct 28, 2020
This commit changes the return type of the bfs_successors function to be
a custom class, BFSSuccessors. This new return class implements both the
sequence protocol and the iterator protocol. This means that, aside from
explicit type checking, it should be backwards compatible with the list
previously returned. It can be used either with index-based access or by
iterating over it.

This should be more efficient for large graphs because instead of doing
the copy and type conversion up front by iterating over the entire
nested Vec of results, it does it per access (either via __getitem__ or
__next__). It does add a small amount of overhead for smaller graphs,
but it is minimal since the function returns in microseconds in such
cases, so a 10-20% overhead is not a big deal.

It's worth noting that while this defers the type conversion, it does
not defer execution like most Python iterators normally do. When
bfs_successors is called it will still always fully traverse the graph.
However, in practice the bottleneck for the bfs_successors function
wasn't actually the graph traversal, but instead the type conversion.

Related to Qiskit#71
@mtreinish
Member

So the first case where there was a strong need for this came up in the sabre_swap transpiler pass, which was calling bfs_successors a large number of times on a large graph. Looking at a profile, the bottleneck was the conversion from the Vec<(PyObject, Vec<PyObject>)> to a Python list(object, list(object)), not the actual graph traversal. This meant we could use an iterator to address the performance issue there, because the piece we could defer with an iterator was that conversion.

I did this in #185. To use an iterator there without worrying about the current pyo3 limitations, it just moves the result Vec<(PyObject, Vec<PyObject>)> into a custom struct that implements the Python iter protocol. I also implemented the Python sequence protocol so it would be backwards compatible with the list in usage (but not anywhere explicit typing was required).

This is probably a model we can use in other places when/if we encounter bottlenecks around type conversion for large outputs from retworkx functions.
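The shape of that wrapper is roughly the following (a simplified, untested sketch of the #185 approach against the pyo3 0.12-era protocol APIs, not the exact source):

use pyo3::class::PySequenceProtocol;
use pyo3::exceptions::PyIndexError;
use pyo3::prelude::*;

#[pyclass]
struct BFSSuccessors {
    // The traversal already ran to completion; this just holds the
    // results as Rust values
    bfs_successors: Vec<(PyObject, Vec<PyObject>)>,
}

#[pyproto]
impl PySequenceProtocol for BFSSuccessors {
    fn __len__(&self) -> usize {
        self.bfs_successors.len()
    }

    fn __getitem__(&self, idx: isize) -> PyResult<(PyObject, Vec<PyObject>)> {
        // Normalize negative indices the way Python sequences do
        let len = self.bfs_successors.len() as isize;
        let pos = if idx < 0 { idx + len } else { idx };
        if pos < 0 || pos >= len {
            return Err(PyIndexError::new_err("index out of range"));
        }
        // Cloning the stored PyObjects only bumps refcounts; building
        // the Python tuple/list structure happens per accessed element
        Ok(self.bfs_successors[pos as usize].clone())
    }
}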

mtreinish added a commit that referenced this issue Nov 11, 2020
* Add custom iterator class for BFS successors return

* Only implement sequence protocol

Using the sequence protocol we can still get an implicit iterator by
just casting it on the Python side. This still gets us the lazy type
conversion but simplifies the API and also makes the behavior more
consistent.

At the same time, to ensure we're handling negative indices correctly,
a test method is added to verify that negative index access to the
sequence works as expected.

* Update src/lib.rs

Co-authored-by: Kevin Krsulich <[email protected]>
mtreinish added a commit to mtreinish/retworkx that referenced this issue Nov 16, 2020
This commit adds a new return type, NodeIndices, which is used as the
return type for functions that return a Vec<usize> of node indices.
This custom type implements the Python sequence protocol and can be
used like the list which was previously returned, except where explicit
type checking was used. Just as in Qiskit#185, the advantage here is
that for large lists of node indices the type conversion from a
Vec<usize> to list(int) becomes a large bottleneck. This avoids that
conversion and only does a usize to Python int conversion on access.

Related to Qiskit#71
mtreinish added a commit that referenced this issue Nov 19, 2020
* Add NodeIndices return type

* Add tests for comparison

* Apply suggestions from code review

Co-authored-by: Lauren Capelluto <[email protected]>
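For reference, NodeIndices follows the same pattern as the BFSSuccessors sketch above, specialized to a Vec<usize>, so the usize to int conversion only happens for the element actually accessed (again an untested sketch, not the exact source):

use pyo3::class::PySequenceProtocol;
use pyo3::exceptions::PyIndexError;
use pyo3::prelude::*;

#[pyclass]
struct NodeIndices {
    nodes: Vec<usize>,
}

#[pyproto]
impl PySequenceProtocol for NodeIndices {
    fn __len__(&self) -> usize {
        self.nodes.len()
    }

    fn __getitem__(&self, idx: isize) -> PyResult<usize> {
        let len = self.nodes.len() as isize;
        let pos = if idx < 0 { idx + len } else { idx };
        if pos < 0 || pos >= len {
            return Err(PyIndexError::new_err("index out of range"));
        }
        // The usize -> Python int conversion happens on return, only
        // for the accessed element
        Ok(self.nodes[pos as usize])
    }
}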
@mtreinish
Member

I'm going to close this issue for now. We've found a pattern of using custom return types that wrap an inner Rust object and implement the Python protocols, doing the conversion to a Python object dynamically on access. This seems to work well and avoids most of the overhead we were seeing from building large return objects before. However, these objects do not defer execution like Python iterators do, and we're blocked from doing that by current limitations in pyo3. I think we can re-investigate it when pyo3 adds support for borrowed references to Python objects living in a struct.

@georgios-ts
Collaborator

georgios-ts commented Sep 16, 2021

Came by to say that this is actually possible. We need to wrap the graph in Py<> instead of a simple reference as in @mtreinish's code in #71 (comment), to avoid specifying a lifetime (see https://pyo3.rs/main/doc/pyo3/prelude/struct.Py.html).
I did this in georgios-ts@10e8011.
One limitation in general is that the Iter struct can't hold any Rust struct that needs a lifetime.

But I'm against using it at all, since it seems to open the door to an "unsafe" interaction between Rust and Python, and it makes it harder to reason about the output of the functions. For example, the underlying graph can mutate while we still hold an iterator over its layers; the next iterations will see that change, and even if we don't panic, we can't guarantee a meaningful output (it's up to the user not to break it).
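Concretely, the change from the earlier LayerIter sketch is essentially just the type of the graph field (simplified from the linked commit; PyDAG is retworkx's dag class and NodeIndex comes from petgraph):

use petgraph::graph::NodeIndex;
use pyo3::prelude::*;

#[pyclass]
struct LayerIter {
    // Py<T> is an owned, reference-counted handle, so the struct
    // needs no lifetime parameter. The trade-off described above:
    // Python code can still mutate the graph between __next__ calls,
    // and the iterator will observe those changes.
    dag: Py<PyDAG>,
    first_layer: Vec<NodeIndex>,
}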
