Enabling TFLite delegates: NNAPI, OpenGL, CoreML, Hexagon #2270
Tentative patch:
Results so far, on Google Pixel 2. For NNAPI delegate:
For GPU OpenGL ES delegate:
So, as expected, the custom ops have no GPU delegate implementation.
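For context, a minimal sketch of how the NNAPI and GPU (OpenGL ES) delegates could be wired into an interpreter like the one in tflitemodelstate.cc. Header, function, and option names follow the upstream TensorFlow Lite tree (`NnApiDelegate`, `TfLiteGpuDelegateCreate`); they should be verified against the vendored mozilla/tensorflow branch, and the fallback logic here is an assumption, not the project's actual code:

```cpp
// Hedged sketch: try NNAPI first, fall back to the GPU OpenGL ES delegate,
// otherwise stay on the CPU. Assumes the TFLite r2.x headers below.
#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/delegates/nnapi/nnapi_delegate.h"
#include "tensorflow/lite/delegates/gpu/gl_delegate.h"

bool ApplyDelegates(tflite::Interpreter* interpreter) {
  // NNAPI: the Android runtime decides which accelerator (DSP/GPU/NPU)
  // actually runs the delegated partitions.
  if (interpreter->ModifyGraphWithDelegate(tflite::NnApiDelegate()) ==
      kTfLiteOk) {
    return true;
  }
  // GPU OpenGL ES delegate: partitions containing unsupported ops
  // (e.g. our custom ops) fall back to the CPU.
  TfLiteGpuDelegateOptions options = TfLiteGpuDelegateOptionsDefault();
  TfLiteDelegate* gpu = TfLiteGpuDelegateCreate(&options);
  if (gpu != nullptr &&
      interpreter->ModifyGraphWithDelegate(gpu) == kTfLiteOk) {
    return true;  // the delegate must outlive the interpreter
  }
  if (gpu != nullptr) TfLiteGpuDelegateDelete(gpu);
  return false;
}
```

Both calls partition the graph, so even a partial delegation returns `kTfLiteOk`; that is why the benchmarks above still run with custom ops on the CPU.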
Google Pixel 2 GPU delegation benchmark:
There are some model changes to get rid of the unsupported
Some for removing
And some shape changes to avoid shader compilation, as well as moving the
We still have that Op in the middle of the computation graph. As @reuben analyzed, it comes from the LSTMCell. If we can figure out a way around it, we could likely have all Ops running on the GPU.
@reuben Once we can get to r2.2 this will get even more interesting, since we could enable CoreML and Hexagon delegation. No idea of the potential speedup, obviously, but I'm wondering how much of that should be exposed in the API?
Was the CoreML delegate ever enabled? If so, are there benchmarks I can compare against?
Unfortunately, no, we have no benchmark: as documented in the releases, we have set up the infra in the code to enable the use of delegates, but:
You should try and hack on `DeepSpeech/native_client/tflitemodelstate.cc`, lines 102 to 156 (at commit cc038c1).
Also, as you can see at https://github.com/mozilla/tensorflow/tree/r2.3/tensorflow/lite/delegates, there's a Hexagon delegate, but I can't find a CoreML one anywhere.
@zaptrem According to https://www.tensorflow.org/lite/performance/coreml_delegate it is now available as experimental as of r2.4, but upgrading to that version still requires some work: #3482
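For reference, a hedged sketch of what enabling the experimental CoreML delegate (r2.4+) and the Hexagon delegate might look like. The headers and symbols (`TfLiteCoreMlDelegateCreate`, `TfLiteHexagonInit`) come from the upstream TensorFlow Lite tree; whether they exist unchanged in the mozilla/tensorflow fork is an assumption, and this is not the project's actual integration:

```cpp
// Hedged sketch: apply the platform-appropriate accelerator delegate.
#include "tensorflow/lite/interpreter.h"
#if defined(__APPLE__)
#include "tensorflow/lite/delegates/coreml/coreml_delegate.h"
#else
#include "tensorflow/lite/delegates/hexagon/hexagon_delegate.h"
#endif

bool ApplyPlatformDelegate(tflite::Interpreter* interpreter) {
#if defined(__APPLE__)
  // CoreML delegate: can route supported partitions to the Neural Engine.
  TfLiteCoreMlDelegateOptions opts = {};
  opts.enabled_devices = TfLiteCoreMlDelegateAllDevices;
  TfLiteDelegate* delegate = TfLiteCoreMlDelegateCreate(&opts);
#else
  // Hexagon delegate: needs the libhexagon_nn_skel libraries on device.
  TfLiteHexagonInit();
  TfLiteHexagonDelegateOptions opts = {0};
  TfLiteDelegate* delegate = TfLiteHexagonDelegateCreate(&opts);
#endif
  return delegate != nullptr &&
         interpreter->ModifyGraphWithDelegate(delegate) == kTfLiteOk;
}
```

As with the GPU delegate, unsupported ops (including custom ops) would simply stay on the CPU, so the achievable speedup depends on how much of the graph these delegates can claim.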
So this whole time I've been running inference only on iPhone 11's performance CPU cores? It's already at like 4X real-time (impressive). I'd love to start looking into this after we fix the iOS crashing. In a perfect world (for my specific use case) I'd target 18X, which should be possible based on Apple's claims of "15X faster ML performance." Idk which ops you guys use and whether they're compatible with the Neural Engine, though.
It's possible; I already get faster than realtime on Android on a QM215 chip :)
We really mostly depend on TensorFlow Lite at that level.
Whoops, the word I was looking for was ops, not instructions. Were the custom ops and SPLIT removed as implied in the earlier comments on this issue? Or is that one of the items that wasn't completed in time?
Nah, I was hacking YOLO-style, like: "ok, let's remove the offending ops without caring about the output: is it enough at runtime? what about perf?". I have not had a look at the current status; maybe the delegates support more ops now? At first when we tested TFLite it was the same, and over time it got good, so we can only hope.
More ops might have been added, but according to the docs you linked, custom ops are still a no-go.