Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use soundfile for mp3 decoding instead of torchaudio #5573

Merged
Show file tree
Hide file tree
Changes from 22 commits
Commits
Show all changes
23 commits
Select commit Hold shift + click to select a range
d7b0c3c
use soundfile for mp3 decoding instead of torchaudio
polinaeterna Feb 23, 2023
58e05e9
fix some tests
polinaeterna Feb 23, 2023
80a47c6
remove torch and torchaudio from library's requirements
polinaeterna Feb 23, 2023
0f6130c
refactor audio decoding, decode everything with soundfile
polinaeterna Feb 24, 2023
a1c4915
remove torchaudio latest test ci stage, remove libsndfile and sox bin…
polinaeterna Feb 24, 2023
5dc4331
get back torch test dependency
polinaeterna Feb 24, 2023
1ab2bf9
remove installing system audio dependencies (libsndfile, sox) from ci
polinaeterna Feb 27, 2023
1b6978c
remove checks for libsndfile in tests since it's bundeled in python l…
polinaeterna Feb 27, 2023
427951d
remove instructions about installing via package manager since it's m…
polinaeterna Feb 27, 2023
365db5c
pin soundfile version to the latest
polinaeterna Feb 27, 2023
13891ef
update documentation
polinaeterna Feb 27, 2023
c4cbdb3
fix setup
polinaeterna Feb 27, 2023
df53313
Update docs/source/installation.md
polinaeterna Feb 27, 2023
4a6b319
Merge branch 'huggingface:main' into remove-torchaudio-use-soundfile-…
polinaeterna Feb 27, 2023
85d441d
refactor decoding: move all the code under the main decode_example func
polinaeterna Feb 27, 2023
b6c3d4d
get audio format with os.path instead of string split
polinaeterna Feb 27, 2023
9838ef4
add module config variables for opus and mp3 support
polinaeterna Feb 27, 2023
985d467
apply steven's suggestion to installation docs
polinaeterna Feb 27, 2023
a5d040f
wrap torch.from_numpy in a func to avoid torch.from_numpy pickling error
polinaeterna Feb 28, 2023
9809db5
Apply suggestions from code review
polinaeterna Feb 28, 2023
b7562f4
fix code style
polinaeterna Feb 28, 2023
1598206
import xsplitext
polinaeterna Feb 28, 2023
cb7af00
Merge branch 'huggingface:main' into remove-torchaudio-use-soundfile-…
polinaeterna Feb 28, 2023
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
30 changes: 0 additions & 30 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -40,11 +40,6 @@ jobs:
continue-on-error: ${{ matrix.test == 'integration' }}
runs-on: ${{ matrix.os }}
steps:
- name: Install OS dependencies
if: ${{ matrix.os == 'ubuntu-latest' }}
run: |
sudo apt-get -y update
sudo apt-get -y install libsndfile1 sox
- uses: actions/checkout@v3
with:
fetch-depth: 0
Expand Down Expand Up @@ -72,16 +67,6 @@ jobs:
- name: Test with pytest
run: |
python -m pytest -rfExX -m ${{ matrix.test }} -n 2 --dist loadfile -sv ./tests/
- name: Install dependencies to test torchaudio>=0.12 on Ubuntu
if: ${{ matrix.os == 'ubuntu-latest' }}
run: |
pip uninstall -y torchaudio torch
pip install "torchaudio>=0.12"
sudo apt-get -y install ffmpeg
- name: Test torchaudio>=0.12 on Ubuntu
if: ${{ matrix.os == 'ubuntu-latest' }}
run: |
python -m pytest -rfExX -m torchaudio_latest -n 2 --dist loadfile -sv ./tests/features/test_audio.py
test_py310:
needs: check_code_quality
Expand All @@ -93,11 +78,6 @@ jobs:
continue-on-error: false
runs-on: ${{ matrix.os }}
steps:
- name: Install OS dependencies
if: ${{ matrix.os == 'ubuntu-latest' }}
run: |
sudo apt-get -y update
sudo apt-get -y install libsndfile1 sox
- uses: actions/checkout@v3
with:
fetch-depth: 0
Expand All @@ -112,13 +92,3 @@ jobs:
- name: Test with pytest
run: |
python -m pytest -rfExX -m ${{ matrix.test }} -n 2 --dist loadfile -sv ./tests/
- name: Install dependencies to test torchaudio>=0.12 on Ubuntu
if: ${{ matrix.os == 'ubuntu-latest' }}
run: |
pip uninstall -y torchaudio torch
pip install "torchaudio>=0.12"
sudo apt-get -y install ffmpeg
- name: Test torchaudio>=0.12 on Ubuntu
if: ${{ matrix.os == 'ubuntu-latest' }}
run: |
python -m pytest -rfExX -m torchaudio_latest -n 2 --dist loadfile -sv ./tests/features/test_audio.py
2 changes: 1 addition & 1 deletion docs/source/audio_load.mdx
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
# Load audio data

You can load an audio dataset using the [`Audio`] feature that automatically decodes and resamples the audio files when you access the examples.
Audio decoding is based on `librosa` in general, and `torchaudio` for MP3.
Audio decoding is based on the [`soundfile`](https://github.com/bastibe/python-soundfile) python package, which uses the [`libsndfile`](https://github.com/libsndfile/libsndfile) C library under the hood.

## Installation

Expand Down
22 changes: 3 additions & 19 deletions docs/source/installation.md
Original file line number Diff line number Diff line change
Expand Up @@ -67,31 +67,15 @@ pip install datasets[audio]

<Tip warning={true}>

On Linux, non-Python dependency on `libsndfile` package must be installed manually, using your distribution package manager, for example:
To decode mp3 files, you need to have at least version 1.1.0 of the `libsndfile` system library. Usually, it's bundled with the python [`soundfile`](https://github.com/bastibe/python-soundfile) package, which is installed as an extra audio dependency for 🤗 Datasets.
For Linux, the required version of `libsndfile` is bundled with `soundfile` starting from version 0.12.0. You can run the following command to determine which version of `libsndfile` is being used by `soundfile`:

```bash
sudo apt-get install libsndfile1
python -c "import soundfile; print(soundfile.__libsndfile_version__)"
```

</Tip>

To support loading audio datasets containing MP3 files, users should also install [torchaudio](https://pytorch.org/audio/stable/index.html) to handle the audio data with high performance:

```bash
pip install 'torchaudio<0.12.0'
```

<Tip warning={true}>

torchaudio's `sox_io` [backend](https://pytorch.org/audio/stable/backend.html#) supports decoding MP3 files. Unfortunately, the `sox_io` backend is only available on Linux/macOS and isn't supported by Windows.

You need to install it using your distribution package manager, for example:

```bash
sudo apt-get install sox
```

</Tip>

## Vision

Expand Down
4 changes: 2 additions & 2 deletions setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -142,6 +142,7 @@
]

AUDIO_REQUIRE = [
"soundfile>=0.12.1",
mariosasko marked this conversation as resolved.
Show resolved Hide resolved
"librosa",
]

Expand Down Expand Up @@ -175,8 +176,7 @@
"tensorflow-macos; sys_platform == 'darwin' and platform_machine == 'arm64'",
"tiktoken;python_version>='3.8'",
"torch",
"torchaudio<0.12.0",
"soundfile",
"soundfile>=0.12.1",
"transformers",
"zstandard",
]
Expand Down
7 changes: 6 additions & 1 deletion src/datasets/config.py
Original file line number Diff line number Diff line change
Expand Up @@ -130,7 +130,12 @@

# Optional tools for feature decoding
PIL_AVAILABLE = importlib.util.find_spec("PIL") is not None

IS_OPUS_SUPPORTED = importlib.util.find_spec("soundfile") is not None and version.parse(
importlib.import_module("soundfile").__libsndfile_version__
) >= version.parse("1.0.31")
IS_MP3_SUPPORTED = importlib.util.find_spec("soundfile") is not None and version.parse(
importlib.import_module("soundfile").__libsndfile_version__
) >= version.parse("1.1.0")

# Optional compression tools
RARFILE_AVAILABLE = importlib.util.find_spec("rarfile") is not None
Expand Down
183 changes: 41 additions & 142 deletions src/datasets/features/audio.py
Original file line number Diff line number Diff line change
@@ -1,15 +1,13 @@
import os
import warnings
from dataclasses import dataclass, field
from io import BytesIO
from typing import TYPE_CHECKING, Any, ClassVar, Dict, Optional, Union

import numpy as np
import pyarrow as pa
from packaging import version

from .. import config
from ..download.streaming_download_manager import xopen
from ..download.streaming_download_manager import xopen, xsplitext
from ..table import array_cast
from ..utils.py_utils import no_op_if_value_is_null, string_to_dict

Expand Down Expand Up @@ -150,20 +148,47 @@ def decode_example(
path, file = (value["path"], BytesIO(value["bytes"])) if value["bytes"] is not None else (value["path"], None)
if path is None and file is None:
raise ValueError(f"An audio sample should have one of 'path' or 'bytes' but both are None in {value}.")
elif path is not None and path.endswith("mp3"):
array, sampling_rate = self._decode_mp3(file if file else path)
elif path is not None and path.endswith("opus"):
if file:
array, sampling_rate = self._decode_non_mp3_file_like(file, "opus")
else:
array, sampling_rate = self._decode_non_mp3_path_like(
path, "opus", token_per_repo_id=token_per_repo_id
)

try:
import librosa
import soundfile as sf
except ImportError as err:
raise ImportError("To support decoding audio files, please install 'librosa' and 'soundfile'.") from err

audio_format = xsplitext(path)[1][1:].lower() if path is not None else None
if not config.IS_OPUS_SUPPORTED and audio_format == "opus":
raise RuntimeError(
"Decoding 'opus' files requires system library 'libsndfile'>=1.0.31, "
'You can try to update `soundfile` python library: `pip install "soundfile>=0.12.1"`. '
)
elif not config.IS_MP3_SUPPORTED and audio_format == "mp3":
raise RuntimeError(
"Decoding 'mp3' files requires system library 'libsndfile'>=1.1.0, "
'You can try to update `soundfile` python library: `pip install "soundfile>=0.12.1"`. '
)

if file is None:
token_per_repo_id = token_per_repo_id or {}
source_url = path.split("::")[-1]
try:
repo_id = string_to_dict(source_url, config.HUB_DATASETS_URL)["repo_id"]
use_auth_token = token_per_repo_id[repo_id]
except (ValueError, KeyError):
use_auth_token = None

with xopen(path, "rb", use_auth_token=use_auth_token) as f:
array, sampling_rate = sf.read(f)

else:
if file:
array, sampling_rate = self._decode_non_mp3_file_like(file)
else:
array, sampling_rate = self._decode_non_mp3_path_like(path, token_per_repo_id=token_per_repo_id)
array, sampling_rate = sf.read(file)

array = array.T
if self.mono:
array = librosa.to_mono(array)
if self.sampling_rate and self.sampling_rate != sampling_rate:
array = librosa.resample(array, orig_sr=sampling_rate, target_sr=self.sampling_rate)
sampling_rate = self.sampling_rate

return {"path": path, "array": array, "sampling_rate": sampling_rate}

def flatten(self) -> Union["FeatureType", Dict[str, "FeatureType"]]:
Expand Down Expand Up @@ -242,129 +267,3 @@ def path_to_bytes(path):
)
storage = pa.StructArray.from_arrays([bytes_array, path_array], ["bytes", "path"], mask=bytes_array.is_null())
return array_cast(storage, self.pa_type)

def _decode_non_mp3_path_like(
self, path, format=None, token_per_repo_id: Optional[Dict[str, Union[str, bool, None]]] = None
):
try:
import librosa
except ImportError as err:
raise ImportError("To support decoding audio files, please install 'librosa'.") from err

token_per_repo_id = token_per_repo_id or {}
if format == "opus":
import soundfile

if version.parse(soundfile.__libsndfile_version__) < version.parse("1.0.30"):
raise RuntimeError(
"Decoding .opus files requires 'libsndfile'>=1.0.30, "
+ "it can be installed via conda: `conda install -c conda-forge libsndfile>=1.0.30`"
)
source_url = path.split("::")[-1]
try:
repo_id = string_to_dict(source_url, config.HUB_DATASETS_URL)["repo_id"]
use_auth_token = token_per_repo_id[repo_id]
except (ValueError, KeyError):
use_auth_token = None

with xopen(path, "rb", use_auth_token=use_auth_token) as f:
array, sampling_rate = librosa.load(f, sr=self.sampling_rate, mono=self.mono)
return array, sampling_rate

def _decode_non_mp3_file_like(self, file, format=None):
try:
import librosa
import soundfile as sf
except ImportError as err:
raise ImportError("To support decoding audio files, please install 'librosa' and 'soundfile'.") from err

if format == "opus":
if version.parse(sf.__libsndfile_version__) < version.parse("1.0.30"):
raise RuntimeError(
"Decoding .opus files requires 'libsndfile'>=1.0.30, "
+ 'it can be installed via conda: `conda install -c conda-forge "libsndfile>=1.0.30"`'
)
array, sampling_rate = sf.read(file)
array = array.T
if self.mono:
array = librosa.to_mono(array)
if self.sampling_rate and self.sampling_rate != sampling_rate:
array = librosa.resample(array, orig_sr=sampling_rate, target_sr=self.sampling_rate)
sampling_rate = self.sampling_rate
return array, sampling_rate

def _decode_mp3(self, path_or_file):
try:
import torchaudio
except ImportError as err:
raise ImportError("To support decoding 'mp3' audio files, please install 'torchaudio'.") from err
if version.parse(torchaudio.__version__) < version.parse("0.12.0"):
try:
torchaudio.set_audio_backend("sox_io")
except RuntimeError as err:
raise ImportError("To support decoding 'mp3' audio files, please install 'sox'.") from err
array, sampling_rate = self._decode_mp3_torchaudio(path_or_file)
else:
try: # try torchaudio anyway because sometimes it works (depending on the os and os packages installed)
array, sampling_rate = self._decode_mp3_torchaudio(path_or_file)
except RuntimeError:
global _ffmpeg_warned
if not _ffmpeg_warned:
warnings.warn(
"\nTo support 'mp3' decoding with `torchaudio>=0.12.0`, make sure you have `ffmpeg` system package with at least version 4 installed. "
"Alternatively, you can downgrade `torchaudio`:\n\n"
"\tpip install \"torchaudio<0.12\".\n\nOtherwise 'mp3' files will be decoded with `librosa`."
)
_ffmpeg_warned = True
try:
# flake8: noqa
import librosa
except ImportError as err:
raise ImportError(
"\nTo support 'mp3' decoding with `torchaudio>=0.12.0`, make sure you have `ffmpeg` system package with at least version 4 installed. "
"\tpip install \"torchaudio<0.12\".\n\nTo decode 'mp3' files without `torchaudio`, please install `librosa`:\n\n"
"\tpip install librosa\n\nNote that decoding might be extremely slow in that case."
) from err
# try to decode with librosa for torchaudio>=0.12.0 as a workaround
global _librosa_warned
if not _librosa_warned:
warnings.warn("Decoding mp3 with `librosa` instead of `torchaudio`, decoding might be slow.")
_librosa_warned = True
try:
array, sampling_rate = self._decode_mp3_librosa(path_or_file)
except RuntimeError as err:
raise RuntimeError(
"Decoding of 'mp3' failed, probably because of streaming mode "
"(`librosa` cannot decode 'mp3' file-like objects, only path-like)."
) from err

return array, sampling_rate

def _decode_mp3_torchaudio(self, path_or_file):
import torchaudio
import torchaudio.transforms as T

array, sampling_rate = torchaudio.load(path_or_file, format="mp3")
if self.sampling_rate and self.sampling_rate != sampling_rate:
if not hasattr(self, "_resampler") or self._resampler.orig_freq != sampling_rate:
self._resampler = T.Resample(sampling_rate, self.sampling_rate)
array = self._resampler(array)
sampling_rate = self.sampling_rate
array = array.numpy()
if self.mono:
array = array.mean(axis=0)
return array, sampling_rate

def _decode_mp3_librosa(self, path_or_file):
import librosa

global _audioread_warned

with warnings.catch_warnings():
if _audioread_warned:
warnings.filterwarnings("ignore", "pysoundfile failed.+?", UserWarning, module=librosa.__name__)
else:
_audioread_warned = True
array, sampling_rate = librosa.load(path_or_file, mono=self.mono, sr=self.sampling_rate)

return array, sampling_rate
6 changes: 5 additions & 1 deletion src/datasets/utils/py_utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -640,9 +640,13 @@ def _save_regex(pickler, obj):

@pklregister(obj_type)
def _save_tensor(pickler, obj):
# `torch.from_numpy` is not picklable in `torch>=1.11.0`
def _create_tensor(np_array):
return torch.from_numpy(np_array)

dill_log(pickler, f"To: {obj}")
args = (obj.detach().cpu().numpy(),)
pickler.save_reduce(torch.from_numpy, args, obj=obj)
pickler.save_reduce(_create_tensor, args, obj=obj)
dill_log(pickler, "# To")
return

Expand Down
Loading