An idea - a Runner that owns the input and output arrays and freezes shapes and names #40
In fact, to retain the current API you could have:

```rust
impl Session {
    pub fn run(&mut self, inputs: Inputs) -> Result<Outputs> {
        let mut runner = Runner::new(self, inputs)?;
        runner.execute()?;
        Ok(runner.into_outputs())
    }
}
```

(As noted in #39 though, things like caching names should probably be done outside of all of this anyway, upon model loading; also precaching shapes when there are no dynamic axes, etc.) Note: the above will probably not compile because of potential multiple mutable borrows etc., but those are technical details - it can be made to work with a bit of munging and shuffling. I just tried to make the general idea clear.
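To make the idea concrete, here is a minimal, self-contained sketch of what such a `Runner` could look like: it owns its input and output buffers, freezes the shape at construction time, and reuses the same allocations on every call. All names here (`Runner`, `InferenceError`, the doubling "inference") are hypothetical stand-ins, not the crate's real API.

```rust
// Hypothetical sketch: a Runner that owns its buffers so repeated
// executions perform no allocations. Not the crate's actual API.

#[derive(Debug)]
pub struct InferenceError;

pub struct Runner {
    input_shape: Vec<usize>, // frozen at construction time
    input: Vec<f32>,         // owned, refilled on every call
    output: Vec<f32>,        // owned, written in place
}

impl Runner {
    pub fn new(input_shape: Vec<usize>, output_len: usize) -> Self {
        let input_len: usize = input_shape.iter().product();
        Runner {
            input_shape,
            input: vec![0.0; input_len],
            output: vec![0.0; output_len],
        }
    }

    /// The frozen input shape.
    pub fn shape(&self) -> &[usize] {
        &self.input_shape
    }

    /// Borrow the input buffer so callers can fill it without copies.
    pub fn input_mut(&mut self) -> &mut [f32] {
        &mut self.input
    }

    /// Run "inference" in place; a stand-in that doubles each value.
    pub fn execute(&mut self) -> Result<(), InferenceError> {
        if self.output.len() != self.input.len() {
            return Err(InferenceError);
        }
        for (o, i) in self.output.iter_mut().zip(&self.input) {
            *o = *i * 2.0;
        }
        Ok(())
    }

    pub fn outputs(&self) -> &[f32] {
        &self.output
    }
}

fn main() {
    let mut r = Runner::new(vec![2, 2], 4);
    r.input_mut().copy_from_slice(&[1.0, 2.0, 3.0, 4.0]);
    r.execute().unwrap();
    assert_eq!(r.outputs(), &[2.0, 4.0, 6.0, 8.0]);
}
```

The design choice worth noting: because the `Runner` hands out `&mut [f32]` for input and `&[f32]` for output, the caller never constructs or receives an owned container, which is where the per-call allocation cost goes away.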
In fact, thinking about it further: in most practical cases where execution speed is critical (realtime apps), the input shape would almost always be frozen and known in advance, so all dimensions would be fully known and the only thing that changes would be the inputs themselves (e.g. receiving frames from a camera, etc).
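The frozen-shape streaming case above can be sketched in a few lines: the frame shape is known up front, so one buffer is allocated before the loop and refilled per frame. `process_frame` is a hypothetical stand-in for executing the session on one frame.

```rust
// Sketch of the frozen-shape realtime case: allocate the input buffer
// once, then refill it in place for every incoming frame.

fn process_frame(frame: &[f32]) -> f32 {
    // Stand-in for session execution on one fixed-shape frame.
    frame.iter().sum()
}

pub fn run_stream(frames: &[Vec<f32>], frame_len: usize) -> Vec<f32> {
    let mut buf = vec![0.0f32; frame_len]; // allocated once, before the loop
    let mut results = Vec::with_capacity(frames.len());
    for frame in frames {
        buf.copy_from_slice(frame); // refill in place, no reallocation
        results.push(process_frame(&buf));
    }
    results
}

fn main() {
    let frames = vec![vec![1.0, 2.0], vec![3.0, 4.0]];
    assert_eq!(run_stream(&frames, 2), vec![3.0, 7.0]);
}
```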
I have a prototype that I can try to push tonight. In brief, it reduces the execution time of a tiny graph with a few nodes from 15us to 8us - almost a 2x speedup - plus there are no more extractors, no allocations, no copies or clones (as suggested above).
This seems like a good fit for my use case: a service that loads precisely one .onnx file, and then feeds data from each request through the resulting session.
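That service pattern - one model loaded at startup, a single session reused across requests - might look roughly like the sketch below. `Model` is a hypothetical stand-in for a loaded .onnx session; a real service would likely use a session pool or per-thread sessions instead of one `Mutex`.

```rust
use std::sync::Mutex;

// Stand-in for a loaded .onnx session (here just a scale factor).
struct Model {
    scale: f32,
}

impl Model {
    fn infer(&self, input: &[f32], out: &mut Vec<f32>) {
        out.clear(); // reuse the buffer's existing capacity
        out.extend(input.iter().map(|x| x * self.scale));
    }
}

// One session plus one reusable output buffer, shared across requests.
struct Service {
    session: Mutex<(Model, Vec<f32>)>,
}

impl Service {
    fn new(scale: f32) -> Self {
        Service {
            session: Mutex::new((Model { scale }, Vec::new())),
        }
    }

    fn handle(&self, request: &[f32]) -> Vec<f32> {
        let mut guard = self.session.lock().unwrap();
        let (model, buf) = &mut *guard;
        model.infer(request, buf);
        buf.clone() // copy out the response; the buffer itself is reused
    }
}

fn main() {
    let svc = Service::new(2.0);
    assert_eq!(svc.handle(&[1.0, 2.0]), vec![2.0, 4.0]);
}
```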
@marshallpierce See #41 for a preliminary working implementation.
Bottom line - there's tons of overhead in `run()` currently: `Vec` instances and a bunch of strings, with allocations all over the place (so for small inputs and graphs this is noticeable). Here's one idea: what if you could do something like the `Runner` above? I think this way you could bring the overhead down to almost zero. Maybe I've missed something - would like to hear your thoughts, @nbigaouette :)
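The overhead being described can be illustrated by contrasting the two call shapes side by side. Both functions below are hypothetical stand-ins, not the crate's API: one allocates a fresh `Vec` on every call, the other writes into a caller-owned buffer and allocates nothing.

```rust
// Contrast of the two call shapes: allocate-per-call vs. write-in-place.

fn run_allocating(input: &[f32]) -> Vec<f32> {
    // Allocates a brand-new output Vec on every call.
    input.iter().map(|x| x + 1.0).collect()
}

fn run_in_place(input: &[f32], output: &mut [f32]) {
    // Reuses a caller-owned buffer: zero allocations per call.
    for (o, i) in output.iter_mut().zip(input) {
        *o = i + 1.0;
    }
}

fn main() {
    let input = [1.0f32, 2.0, 3.0];
    let fresh = run_allocating(&input);
    let mut reused = [0.0f32; 3];
    run_in_place(&input, &mut reused);
    assert_eq!(fresh.as_slice(), reused.as_slice());
}
```

For one big graph the difference is noise, but in a hot loop over tiny graphs the per-call allocations dominate, which matches the measurements quoted in this thread.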