OuteTTS

🤗 Hugging Face | 💬 Discord | 𝕏 X (Twitter) | 🌐 Website | 📰 Blog

OuteTTS is an experimental text-to-speech model that uses a pure language modeling approach to generate speech, without architectural changes to the foundation model itself.

Compatibility

OuteTTS supports the following backends:

Backend
Hugging Face Transformers
GGUF llama.cpp
ExLlamaV2
Transformers.js

Installation

Python

pip install outetts

Important:

For GGUF support, install llama-cpp-python manually (see its Installation Guide).
For EXL2 support, install exllamav2 manually (see its Installation Guide).

Node.js / Browser

npm i outetts

Usage

Interfaces

The outetts package provides two interfaces for OuteTTS, each supporting different model versions:

Interface	Supported Models	Documentation
Interface v1	OuteTTS-0.2, OuteTTS-0.1	View Documentation
Interface v2	OuteTTS-0.3	View Documentation

Generation Performance: The model performs best with 30-second generation batches. The speaker sample counts against this window, so the usable window shrinks by roughly the sample's length. For example, if the speaker reference sample is 10 seconds, the effective window becomes approximately 20 seconds.
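
The window arithmetic above can be sketched as a small helper. This is only an illustration of the rule of thumb stated here, not part of the outetts API: the function name and its parameters are hypothetical, and the result is an approximation, as the text notes.

```python
def effective_window_seconds(speaker_sample_seconds: float,
                             full_window_seconds: float = 30.0) -> float:
    """Estimate how many seconds of generated speech fit in one batch.

    Hypothetical helper: subtracts the speaker reference length from the
    30-second window described above and never returns a negative value.
    """
    if speaker_sample_seconds < 0:
        raise ValueError("speaker_sample_seconds must be non-negative")
    return max(0.0, full_window_seconds - speaker_sample_seconds)


# A 10-second speaker reference leaves roughly 20 seconds for generation.
print(effective_window_seconds(10.0))  # 20.0
```

Longer texts can be split into pieces that each fit within this estimate before being passed to the model.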

Speaker Profile Recommendations

To achieve the best results when creating a speaker profile, consider the following recommendations:

Audio Clip Duration:
- Use an audio clip of around 10 seconds.
- This duration provides sufficient data for the model to learn the speaker's characteristics while keeping the input manageable.
Audio Quality:
- Ensure the audio is clear and noise-free. Background noise or distortions can reduce the model's ability to extract accurate voice features.
Speaker Familiarity:
- The model performs best with voices that are similar to those seen during training. Using a voice that is significantly different from typical training samples (e.g., unique accents, rare vocal characteristics) might result in inaccurate replication.
- In such cases, you may need to fine-tune the model specifically on your target speaker's voice to achieve a better representation.
Parameter Adjustments:
- Adjust parameters like temperature in the generate function to refine the expressive quality and consistency of the synthesized voice.
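
As a quick pre-check for the clip-duration recommendation, the sketch below measures a WAV file's length with Python's standard wave module. The helper names and the 3-second tolerance are assumptions made for illustration; they do not come from outetts, and the check only works on uncompressed WAV files.

```python
import wave


def clip_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_recommended_length(duration_seconds: float,
                          target_seconds: float = 10.0,
                          tolerance_seconds: float = 3.0) -> bool:
    """Check whether a clip is close to the ~10 second recommendation.

    The tolerance is an arbitrary assumption, not a documented threshold.
    """
    return abs(duration_seconds - target_seconds) <= tolerance_seconds


# Example with a known duration instead of a real file:
print(is_recommended_length(9.5))   # True
print(is_recommended_length(30.0))  # False
```

For a real clip, call is_recommended_length(clip_duration_seconds("speaker.wav")), where "speaker.wav" stands in for your own reference recording.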

Credits

WavTokenizer: GitHub Repository
- decoder and encoder folder files are from this repository
CTC Forced Alignment: PyTorch Tutorial
Uroman: GitHub Repository
- "This project uses the universal romanizer software 'uroman' written by Ulf Hermjakob, USC Information Sciences Institute (2015-2020)".
mecab-python3: GitHub Repository
