
Bug: Couldn't access TFLite API in TensorFlow package when using export module #2220

Closed
wasertech opened this issue May 23, 2022 · 3 comments
Labels
bug Something isn't working

Comments

@wasertech
Collaborator

Description
When you try to produce output_graph.tflite using python -m coqui_stt_training.export, you are stopped by the following error:

E Couldn't access TFLite API in TensorFlow package. The NVIDIA TF1 docker image removes the TFLite API, so you'll need to save the checkpoint outside of Docker and then export it using the training package directly:
E pip install coqui_stt_training
E python -m coqui_stt_training.export --checkpoint_dir ... --export_dir ...
E This should work without needing any special CUDA setup, even for CUDA checkpoints.
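
For reference, a minimal sketch of the host-side export that the error message suggests (the venv location and the checkpoint/export paths below are placeholders, not values taken from this setup):

    # Hypothetical host-side export, run outside the NVIDIA container
    python3 -m venv /tmp/stt-export          # placeholder venv path
    . /tmp/stt-export/bin/activate
    pip install coqui_stt_training
    python -m coqui_stt_training.export --checkpoint_dir /path/to/checkpoints --export_dir /path/to/export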

To Reproduce
Steps to reproduce the behavior:
Once you have finished training (in my case, after best_dev-0), try to export your checkpoint to output_graph.tflite:

python -m coqui_stt_training.export --alphabet_config_path /mnt/models/alphabet.txt --scorer_path /mnt/lm/kenlm.scorer --feature_cache /mnt/sources/feature_cache --n_hidden 2048 --beam_width 500 --lm_alpha 1.6383103526118539 --lm_beta 0.1291025653672666 --load_evaluate best --checkpoint_dir /mnt/checkpoints/ --export_dir /mnt/models/ --export_tflite true --export_author_id CommonVoice-FR-Team --export_model_version 0.9 --export_contact_info https://discourse.mozilla.org/c/voice/fr --export_license MIT-0 --export_language fr-FR --export_min_stt_version 1.0.0 --export_max_stt_version 1.4.0 --export_model_name cv-fr-tflite
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
I Exporting the model...
I Loading best validating checkpoint from /mnt/checkpoints/best_dev-0
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel
I Loading variable from checkpoint: layer_1/bias
I Loading variable from checkpoint: layer_1/weights
I Loading variable from checkpoint: layer_2/bias
I Loading variable from checkpoint: layer_2/weights
I Loading variable from checkpoint: layer_3/bias
I Loading variable from checkpoint: layer_3/weights
I Loading variable from checkpoint: layer_5/bias
I Loading variable from checkpoint: layer_5/weights
I Loading variable from checkpoint: layer_6/bias
I Loading variable from checkpoint: layer_6/weights
E Couldn't access TFLite API in TensorFlow package. The NVIDIA TF1 docker image removes the TFLite API, so you'll need to save the checkpoint outside of Docker and then export it using the training package directly: 
E     pip install coqui_stt_training
E     python -m coqui_stt_training.export --checkpoint_dir ... --export_dir ...
E This should work without needing any special CUDA setup, even for CUDA checkpoints.

See my full logs.

Expected behavior
The model should be exported and saved to a .tflite file.

Environment:

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Docker version 20.10.15, build fd82621d35
  • TensorFlow installed from (our builds, or upstream TensorFlow): nvcr.io/nvidia/tensorflow:22.02-tf1-py3
  • TensorFlow version (use command below): 1.15.5
  • Python version: Python 3.8.10
  • Bazel version (if compiling from source): N/A
  • GCC/Compiler version (if compiling from source): N/A
  • CUDA/cuDNN version: CUDA 11.6 / Driver 510.68.02
  • GPU model and memory: 2x RTX Titan 24 GB (compute capability 7.5)
  • Exact command to reproduce: python -m coqui_stt_training.export --alphabet_config_path /mnt/models/alphabet.txt --scorer_path /mnt/lm/kenlm.scorer --feature_cache /mnt/sources/feature_cache --n_hidden 2048 --beam_width 500 --lm_alpha 1.6383103526118539 --lm_beta 0.1291025653672666 --load_evaluate best --checkpoint_dir /mnt/checkpoints/ --export_dir /mnt/models/ --export_tflite true --export_author_id CommonVoice-FR-Team --export_model_version 0.9 --export_contact_info https://discourse.mozilla.org/c/voice/fr --export_license MIT-0 --export_language fr-FR --export_min_stt_version 1.0.0 --export_max_stt_version 1.4.0 --export_model_name cv-fr-tflite

Additional context
I'm building a custom Docker image from this branch of commonvoice-fr, using this branch of STT, but it shouldn't matter. I'll try Coqui's image to check.
I suspect it's related to the fact that we updated to the nvidia/tensorflow:22.02-tf1-py3 image.
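
One quick way to confirm that (assuming the converter is exposed as tf.lite.TFLiteConverter, as in upstream TF 1.15) is to probe the API from inside the container:

    # Sketch of a TFLite availability check: if the image really strips TFLite,
    # this should fail with an AttributeError (or similar) instead of printing the class
    python -c "import tensorflow as tf; print(tf.lite.TFLiteConverter)"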

@wasertech wasertech added the bug Something isn't working label May 23, 2022
@wasertech
Collaborator Author

Yes, this is also an issue with main:

Successfully built 9b900d7d95b4
Successfully tagged stt-train:latest

================
== TensorFlow ==
================

NVIDIA Release 22.02-tf1 (build 32060646)
TensorFlow Version 1.15.5

Container image Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Copyright 2017-2022 The TensorFlow Authors.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

NOTE: MOFED driver for multi-node communication was not detected.
      Multi-node communication performance may be reduced.

root@2af4d6ae0db3:/code# python -m coqui_stt_training.export --alphabet_config_path /mnt/models/alphabet.txt --scorer_path /mnt/lm/kenlm.scorer --feature_cache /mnt/sources/feature_cache --n_hidden 2048 --beam_width 500 --lm_alpha 1.6383103526118539 --lm_beta 0.1291025653672666 --load_evaluate best --checkpoint_dir /mnt/checkpoints/ --export_dir /mnt/models/ --export_tflite true --export_author_id CommonVoice-FR-Team --export_model_version 0.9 --export_contact_info https://discourse.mozilla.org/c/voice/fr --export_license MIT-0 --export_language fr-FR --export_min_stt_version 1.0.0 --export_max_stt_version 1.4.0 --export_model_name cv-fr-tflite
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
I Exporting the model...
I Loading best validating checkpoint from /mnt/checkpoints/best_dev-0
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel
I Loading variable from checkpoint: layer_1/bias
I Loading variable from checkpoint: layer_1/weights
I Loading variable from checkpoint: layer_2/bias
I Loading variable from checkpoint: layer_2/weights
I Loading variable from checkpoint: layer_3/bias
I Loading variable from checkpoint: layer_3/weights
I Loading variable from checkpoint: layer_5/bias
I Loading variable from checkpoint: layer_5/weights
I Loading variable from checkpoint: layer_6/bias
I Loading variable from checkpoint: layer_6/weights
E Couldn't access TFLite API in TensorFlow package. The NVIDIA TF1 docker image removes the TFLite API, so you'll need to save the checkpoint outside of Docker and then export it using the training package directly: 
E     pip install coqui_stt_training
E     python -m coqui_stt_training.export --checkpoint_dir ... --export_dir ...
E This should work without needing any special CUDA setup, even for CUDA checkpoints.
root@2af4d6ae0db3:/code# 

@wasertech
Collaborator Author

wasertech commented May 23, 2022

I managed to solve it by using a separate venv inside the Docker image, on Python 3.7 with TensorFlow 1.15.4, so that coqui_stt_training.export can use the TFLite API.
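
Roughly, the workaround looks like the sketch below (assumptions: Python 3.7 is available in the image and the stock tensorflow 1.15.4 wheel is enough for export; the venv path is a placeholder, see the commit linked below for the exact steps):

    # Hypothetical in-image export venv; /venv-export is a placeholder path
    python3.7 -m venv /venv-export
    /venv-export/bin/pip install tensorflow==1.15.4 coqui_stt_training
    /venv-export/bin/python -m coqui_stt_training.export --checkpoint_dir /mnt/checkpoints/ --export_dir /mnt/models/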

See my commit for commonvoice-fr

I'll make a PR for Coqui's image soon. See #2221

@wasertech
Collaborator Author

Fixed with #2230
