This REST API is built on top of Mozilla's DeepSpeech and is based on the examples provided by Mozilla. It accepts HTTP requests (GET and POST) as well as WebSocket connections. Transcription over HTTP is appropriate for relatively short audio files, while the WebSocket endpoint can also handle longer recordings.
The instructions below are for Unix/OS X; they will need to be adapted to run the code on Windows.
- Clone the repository to your local machine and change directory to deepspeech-rest-api
$ git clone https://github.com/fabricekwizera/deepspeech-rest-api.git
$ cd deepspeech-rest-api
2. Create a virtual environment and activate it (assuming that Python's venv module is available on your machine), then install the project locally in editable mode.
$ python -m venv venv
$ source venv/bin/activate
$ python -m pip install -U pip==21.0.0 wheel
$ python -m pip install --editable .
- Download the model and the scorer. For the English model and scorer, use the links below.
$ wget https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.pbmm \
-O deepspeech_model.pbmm
$ wget https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.scorer \
-O deepspeech_model.scorer
For other languages, place the two files in the current working directory under the names deepspeech_model.pbmm for the model and deepspeech_model.scorer for the scorer.
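Before starting the server, it can help to confirm that both files are in place. The sketch below is a small standalone check and is not part of the project; the helper name missing_model_files is hypothetical, and only the two file names come from the instructions above.

```python
from pathlib import Path

# File names the instructions above ask for in the working directory.
EXPECTED_FILES = ("deepspeech_model.pbmm", "deepspeech_model.scorer")


def missing_model_files(directory="."):
    """Return the expected model/scorer file names not found in `directory`."""
    base = Path(directory)
    return [name for name in EXPECTED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("Model and scorer found.")
```

Running it from the repository root lists whichever of the two files still needs to be downloaded or renamed.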
- Migrations are done using Alembic
The steps below get the migrations up and running. The choice of RDBMS is left to the developer; PostgreSQL is used here.
- Using a valid username and password, create a database to hold all the relations.
- Update the line driver://user:pass@localhost/dbname in the .env file with the valid driver name, username and password. The driver is postgresql in this case.
- Initialize the migrations with
$ alembic init migrations
- Alembic creates an alembic.ini file and a migrations directory in the current directory. In the .ini file, change the line sqlalchemy.url = driver://user:pass@localhost/dbname to use the values from the .env file.
- Change directory to the migrations directory and update the env.py file by importing Base from models and updating the metadata.
from logging.config import fileConfig
from sqlalchemy import engine_from_config
from sqlalchemy import pool
from alembic import context
from app.models import Base
# this is the Alembic Config object, which provides
# access to the values within the .ini file in use.
config = context.config
# Interpret the config file for Python logging.
# This line sets up loggers basically.
fileConfig(config.config_file_name)
# add your model's MetaData object here
# for 'autogenerate' support
# from myapp import mymodel
# target_metadata = mymodel.Base.metadata
target_metadata = Base.metadata
- Create a migration script and apply it to the database
$ alembic revision -m "Create users table" --autogenerate
$ alembic upgrade head
After this last step, the users table should be created in the database.
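The connection URL used in the .env and alembic.ini steps above follows the pattern driver://user:pass@localhost/dbname. As a minimal sketch of how such a URL could be assembled safely, the helper below percent-encodes the credentials; the function name and the sample credentials are hypothetical and are not taken from the project.

```python
from urllib.parse import quote


def build_database_url(driver, user, password, host, dbname):
    """Assemble a URL of the form driver://user:pass@host/dbname."""
    # Percent-encode the credentials so characters such as '@' or '/'
    # in a password do not break the URL structure.
    return "{}://{}:{}@{}/{}".format(
        driver, quote(user, safe=""), quote(password, safe=""), host, dbname
    )


# Hypothetical credentials, for illustration only.
print(build_database_url("postgresql", "stt_user", "p@ss/word", "localhost", "stt_db"))
```

If an encoded URL is pasted into alembic.ini, percent signs typically need to be doubled (%%), because that file is read with Python's ConfigParser interpolation.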
- Running the server
$ python run.py
Register a new user and request a new JWT token to access the API
$ curl -X POST \
http://0.0.0.0:8000/api/v1/users/register \
-H 'Content-Type: application/json' \
-d '{
"username": "forrestgump",
"email": "[email protected]",
"password": "yourpassword"
}'
API response
{
"id": 15,
"username": "forrestgump",
"email": "[email protected]",
"created_at": "2021-09-06T07:07:46.989193Z",
"modified_at": "2021-09-06T07:07:46.989207Z"
}
To generate a JWT token to access the API
$ curl -X POST \
http://0.0.0.0:8000/api/v1/token \
-H 'Content-Type: application/json' \
-d '{
"username": "forrestgump",
"password": "yourpassword"
}'
If both steps are done correctly, you should get a token in the format below
{
"access_token": "JWT_token"
}
With this JWT_token, you have access to the different endpoints of the API.
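The same token request can be prepared from Python with the standard library alone. The sketch below builds the POST request for the token endpoint shown above and turns a token response into an Authorization header; the helper names build_token_request and auth_headers are hypothetical, and the actual network call is left commented out because it needs a running server.

```python
import json
import urllib.request

BASE_URL = "http://0.0.0.0:8000/api/v1"  # server address used in this README


def build_token_request(username, password, base_url=BASE_URL):
    """Build (but do not send) the POST request for the /token endpoint."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/token",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def auth_headers(token_response_text):
    """Turn the JSON token response into an Authorization header dict."""
    token = json.loads(token_response_text)["access_token"]
    return {"Authorization": "Bearer " + token}


if __name__ == "__main__":
    request = build_token_request("forrestgump", "yourpassword")
    # Sending requires the server to be running:
    # with urllib.request.urlopen(request) as response:
    #     print(auth_headers(response.read().decode("utf-8")))
    print(request.get_method(), request.full_url)
```

The dictionary returned by auth_headers can be passed as the headers argument of any later request to the API.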
Change directory to audio and use the WAV files provided for testing.
Note the usage of hot-words and their boosts in the request.
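Hot-words travel as ordinary form fields whose names are the words and whose values are the boosts. In DeepSpeech, a positive boost makes a word more likely to appear in the transcript and a negative boost makes it less likely. The sketch below shows one way a server could pass such fields to a model through addHotWord, which is part of the DeepSpeech 0.9 Python API. The helper apply_hot_words and the stand-in RecordingModel are hypothetical and are not taken from this project's code.

```python
def apply_hot_words(model, form_fields):
    """Register each word/boost pair from the form on a DeepSpeech-like model.

    `model` only needs an addHotWord(word, boost) method, as deepspeech.Model has.
    Returns the pairs that were applied; non-numeric values are skipped.
    """
    applied = []
    for word, boost in form_fields.items():
        try:
            value = float(boost)  # form values usually arrive as strings
        except (TypeError, ValueError):
            continue
        model.addHotWord(word, value)
        applied.append((word, value))
    return applied


class RecordingModel:
    """Stand-in for deepspeech.Model so the sketch runs without the library."""

    def __init__(self):
        self.hot_words = {}

    def addHotWord(self, word, boost):
        self.hot_words[word] = boost


model = RecordingModel()
apply_hot_words(model, {"paris": "-1000", "power": "1000", "parents": "-1000"})
print(model.hot_words)
```

With the boosts used in the requests below, "power" is favoured while "paris" and "parents" are strongly discouraged.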
- STT the HTTP way
cURL
$ curl -X POST \
http://0.0.0.0:8000/api/v1/stt/http \
-H 'Authorization: Bearer JWT_token' \
-F 'audio=@8455-210777-0068.wav' \
-F 'paris=-1000' \
-F 'power=1000' \
-F 'parents=-1000'
python
import requests
jwt_token = 'JWT_token'
headers = {'Authorization': 'Bearer ' + jwt_token}
url = 'http://0.0.0.0:8000/api/v1/stt/http'
hot_words = {'paris': -1000, 'power': 1000, 'parents': -1000}
audio_filename = 'audio/8455-210777-0068.wav'
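# Open the file in a context manager so it is closed once the upload finishes
with open(audio_filename, 'rb') as audio_file:
    audio = [('audio', audio_file)]
    response = requests.post(url, data=hot_words, files=audio, headers=headers)
print(response.json())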
- STT the WebSocket way (simple test)
curl cannot be used to test the WebSocket endpoint. To take advantage of this feature, you will have to write a web app that sends requests to the endpoint ws://0.0.0.0:8000/api/v1/stt/ws (assuming the server is running at 0.0.0.0:8000).
The command below can be used to check that the WebSocket endpoint is working.
$ python client_audio_file_stt.py
In both cases (HTTP and WebSocket), you should get a result in the format below.
{
"message": "experience proves this",
"time": 1.4718825020026998
}
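A client will usually want the transcript and the timing as separate values. The sketch below parses a response shaped like the one above; the helper name parse_stt_result is hypothetical, and treating the time field as a duration in seconds is an assumption, since the unit is not stated here.

```python
import json


def parse_stt_result(payload):
    """Extract (transcript, elapsed) from a response shaped like the one above."""
    result = json.loads(payload)
    # Assumption: "time" is the transcription duration in seconds.
    return result["message"], float(result["time"])


sample = '{"message": "experience proves this", "time": 1.4718825020026998}'
transcript, elapsed = parse_stt_result(sample)
print("{!r} transcribed in {:.2f} s".format(transcript, elapsed))
```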
The command below can be used to stream speech over the WebSocket at the endpoint ws://0.0.0.0:8000/api/v1/stt/mic. In this case too, the web app will need to implement something similar to (or far better than) the code in the script below.
$ python client_mic_stream_stt.py
Now you can stream speech to your server and see the result in the client's shell. The implementation of VAD (Voice Activity Detection) will be released pretty soon.
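A streaming client typically cuts the captured audio into short, fixed-duration frames before sending them, and frame-based voice activity detectors work on the same kind of input. The sketch below shows that splitting step only; the helper name pcm_frames and the 20 ms frame length are assumptions, and the message format the server expects on the mic endpoint is not described here. The 16 kHz, 16-bit mono defaults match the audio format the released DeepSpeech English model expects.

```python
def pcm_frames(pcm_bytes, sample_rate=16000, frame_ms=20, sample_width=2):
    """Yield consecutive fixed-duration frames from mono PCM audio.

    A trailing partial frame is dropped so every frame has the same size.
    """
    frame_size = sample_rate * frame_ms // 1000 * sample_width
    for start in range(0, len(pcm_bytes) - frame_size + 1, frame_size):
        yield pcm_bytes[start:start + frame_size]


# One second of silence at 16 kHz, 16-bit mono.
silence = b"\x00" * 32000
frames = list(pcm_frames(silence))
print(len(frames), len(frames[0]))
```

Each yielded frame could then be sent as one binary WebSocket message, or passed through a voice activity check first.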
Licensed under the Mozilla Public License 2.0