πŸš€ Accelerate inference and training of πŸ€— Transformers, Diffusers, TIMM and Sentence Transformers with easy-to-use hardware optimization tools

huggingface/optimum



Hugging Face Optimum

πŸ€— Optimum is an extension of πŸ€— Transformers and Diffusers that provides a set of optimization tools to train and run models on targeted hardware with maximum efficiency, while keeping things easy to use.

Installation

πŸ€— Optimum can be installed using pip as follows:

python -m pip install optimum

If you'd like to use the accelerator-specific features of πŸ€— Optimum, you can install the required dependencies according to the table below:

Accelerator                          Installation
ONNX Runtime                         pip install --upgrade --upgrade-strategy eager optimum[onnxruntime]
Intel Neural Compressor              pip install --upgrade --upgrade-strategy eager optimum[neural-compressor]
OpenVINO                             pip install --upgrade --upgrade-strategy eager optimum[openvino]
NVIDIA TensorRT-LLM                  docker run -it --gpus all --ipc host huggingface/optimum-nvidia
AMD Instinct GPUs and Ryzen AI NPU   pip install --upgrade --upgrade-strategy eager optimum[amd]
AWS Trainium & Inferentia            pip install --upgrade --upgrade-strategy eager optimum[neuronx]
Habana Gaudi Processor (HPU)         pip install --upgrade --upgrade-strategy eager optimum[habana]
FuriosaAI                            pip install --upgrade --upgrade-strategy eager optimum[furiosa]

The --upgrade --upgrade-strategy eager option is needed to ensure the different packages are upgraded to the latest possible version.
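Every row in the table follows the same pattern, which can be captured in a small helper for scripted installs (this helper is purely illustrative and not part of Optimum; the extras names come from the table above):

```python
# Illustrative helper (not part of Optimum): build the pip command
# for an accelerator-specific Optimum extra from the table above.
def pip_command(extra: str) -> str:
    return f"pip install --upgrade --upgrade-strategy eager optimum[{extra}]"

print(pip_command("onnxruntime"))
# pip install --upgrade --upgrade-strategy eager optimum[onnxruntime]
```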

To install from source:

python -m pip install git+https://github.com/huggingface/optimum.git

For the accelerator-specific features, add the corresponding optimum[accelerator_type] extra to the command, for example:

python -m pip install optimum[onnxruntime]@git+https://github.com/huggingface/optimum.git

Accelerated Inference

πŸ€— Optimum provides multiple tools to export and run optimized models on various ecosystems:

  • ONNX / ONNX Runtime
  • TensorFlow Lite
  • OpenVINO
  • Habana first-gen Gaudi / Gaudi2, more details here
  • AWS Inferentia 2 / Inferentia 1, more details here
  • NVIDIA TensorRT-LLM, more details here

The export and optimizations can be done both programmatically and with a command line.

ONNX + ONNX Runtime

Before you begin, make sure you have all the necessary libraries installed:

pip install optimum[exporters,onnxruntime]

It is possible to easily export πŸ€— Transformers and Diffusers models to the ONNX format and apply graph optimization as well as quantization.

For more information on the ONNX export, please check the documentation.
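As a minimal sketch of the command-line route, a checkpoint can be exported with the optimum CLI (the model name and output directory below are only examples):

```shell
# Sketch: export a Transformers checkpoint to ONNX with the optimum CLI.
# Requires optimum[exporters,onnxruntime]; model id and output directory
# are placeholders you should replace with your own.
optimum-cli export onnx --model distilbert-base-uncased distilbert_onnx/
```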

Once the model is exported to the ONNX format, we provide Python classes enabling you to run the exported ONNX model in a seamless manner, with ONNX Runtime as the backend.

More details on how to run ONNX models with ORTModelForXXX classes here.

TensorFlow Lite

Before you begin, make sure you have all the necessary libraries installed:

pip install optimum[exporters-tf]

Just as for ONNX, it is possible to export models to TensorFlow Lite and quantize them. You can find more information in our documentation.
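A minimal sketch of the TensorFlow Lite export, again via the optimum CLI (model name, sequence length and output directory are only examples):

```shell
# Sketch: export a checkpoint to TensorFlow Lite with the optimum CLI.
# Requires optimum[exporters-tf]; the model id, sequence length and
# output directory are placeholders.
optimum-cli export tflite --model bert-base-uncased --sequence_length 128 bert_tflite/
```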

Intel (OpenVINO + Neural Compressor + IPEX)

Before you begin, make sure you have all the necessary libraries installed.

You can find more information on the different integrations in our documentation and in the examples of optimum-intel.

Quanto

Quanto is a PyTorch quantization backend that allows you to quantize a model either through the Python API or the optimum-cli.

You can see more details and examples in the Quanto repository.

Accelerated training

πŸ€— Optimum provides wrappers around the original πŸ€— Transformers Trainer to enable training on powerful hardware easily. We support many providers:

  • Habana's Gaudi processors
  • AWS Trainium instances, check here
  • ONNX Runtime (optimized for GPUs)

Habana

Before you begin, make sure you have all the necessary libraries installed:

pip install --upgrade --upgrade-strategy eager optimum[habana]

You can find examples in the documentation and in the examples.

ONNX Runtime

Before you begin, make sure you have all the necessary libraries installed:

pip install optimum[onnxruntime-training]

You can find examples in the documentation and in the examples.