This repository contains (refactored) code to reproduce the core results from the two papers:
- Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data
- Scalable Factorized Hierarchical Variational Autoencoder Training
The previous version of the code can be found here.
If you find the code useful, please cite:
@inproceedings{hsu2017learning,
title={Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data},
author={Hsu, Wei-Ning and Zhang, Yu and Glass, James},
booktitle={Advances in Neural Information Processing Systems},
year={2017},
}
@article{hsu2018scalable,
title={Scalable Factorized Hierarchical Variational Autoencoder Training},
author={Hsu, Wei-Ning and Glass, James},
journal={arXiv preprint arXiv:1804.03201},
year={2018},
arxiv={1804.03201},
}
This project uses Python 2.7.6. Before running the code, you need to install the following dependencies:
- Tensorflow 1.0
- Scipy
- Numpy
- CFFI
- Soundfile
- Matplotlib
- Scikit-Learn
- LibROSA
- SPHFile (optional, for preprocessing TIMIT raw data)
- Kaldi (optional)
- Kaldi-Python (optional)
The first nine dependencies can be installed with pip by running
pip install -r requirements.txt
Kaldi-Python requires a version of Kaldi older than commit d1e1e3b. If you don't have a Kaldi build from before that commit, you can install both Kaldi and Kaldi-Python by running
make all
The main source code is in ./fhvae/. The ./scripts directory contains runnable Python scripts, and example preprocessing scripts are in ./examples/.
Two dataset formats are supported: Kaldi and Numpy.
Each dataset should be stored in ./datasets/<dataset_name>/<set_name>/, where <set_name> is one of {train,dev,test}. Each set folder should contain a feats.scp and a len.scp file.
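A quick way to catch layout mistakes before training is to check that every set folder has both files. The sketch below is a hypothetical helper of our own (`missing_dataset_files` is not part of the repository); it only checks that the files exist, not that their contents are valid.

```python
import os

# Set names and required files as described in this README.
SET_NAMES = ("train", "dev", "test")
REQUIRED_FILES = ("feats.scp", "len.scp")


def missing_dataset_files(dataset_dir, set_names=SET_NAMES):
    """Return the required files that do not exist under dataset_dir.

    dataset_dir is e.g. "./datasets/timit_np_fbank". An empty list means the
    layout matches <dataset_dir>/<set_name>/{feats.scp,len.scp}.
    """
    missing = []
    for set_name in set_names:
        for fname in REQUIRED_FILES:
            path = os.path.join(dataset_dir, set_name, fname)
            if not os.path.isfile(path):
                missing.append(path)
    return missing
```

For example, `missing_dataset_files("./datasets/timit_np_fbank")` returns `[]` once all three sets are in place.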
*.scp files follow Kaldi's script-file format, where each line is:
sequence-id value
The value in len.scp is an integer denoting the feature sequence length, and the value in feats.scp is the path to a *.npy file (Numpy format) or *.ark:<offset> (Kaldi format).
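As a minimal sketch of the Numpy format, assuming each sequence's features are a 2-D array of shape (num_frames, feat_dim) and that the length in len.scp is num_frames, the hypothetical helpers below write one set folder and parse any *.scp file back into (sequence-id, value) pairs. The npy/ subfolder and the use of absolute paths are our own choices, not requirements stated by the repository; prepare_numpy_data.py may lay files out differently.

```python
import os

import numpy as np


def write_numpy_set(set_dir, feats_by_id):
    """Write one .npy file per sequence plus feats.scp and len.scp.

    feats_by_id maps sequence-id -> array of shape (num_frames, feat_dim).
    ASSUMPTION: the length written to len.scp is num_frames (axis 0).
    """
    npy_dir = os.path.join(set_dir, "npy")  # hypothetical location for .npy files
    if not os.path.isdir(npy_dir):
        os.makedirs(npy_dir)
    with open(os.path.join(set_dir, "feats.scp"), "w") as f_feats, \
            open(os.path.join(set_dir, "len.scp"), "w") as f_len:
        for seq_id in sorted(feats_by_id):
            feats = np.asarray(feats_by_id[seq_id])
            path = os.path.abspath(os.path.join(npy_dir, seq_id + ".npy"))
            np.save(path, feats)
            # Each line: "sequence-id value"
            f_feats.write("%s %s\n" % (seq_id, path))
            f_len.write("%s %d\n" % (seq_id, feats.shape[0]))


def read_scp(path):
    """Parse a script file into an ordered list of (sequence-id, value) strings."""
    entries = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                seq_id, value = line.split(None, 1)
                entries.append((seq_id, value))
    return entries
```

Values come back as strings, so a length map is `dict((k, int(v)) for k, v in read_scp(".../len.scp"))`, and a Numpy-format feature matrix can be loaded with `np.load` on the corresponding feats.scp value.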
The feats.scp and len.scp files can be prepared from a wav.scp file with ./scripts/preprocess/prepare_kaldi_data.py (Kaldi format) or ./scripts/preprocess/prepare_numpy_data.py (Numpy format).
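A wav.scp uses the same Kaldi script-file format, with a waveform path as the value. The sketch below illustrates only the shape of the preprocessing step: read wav.scp and turn each waveform into a (num_frames, feat_dim) array. It does not reproduce the repository's feature extraction; it substitutes a NumPy-only log-magnitude spectrogram for filterbank features, assumes 16-bit mono PCM input (where the defaults of 400/160 samples mean a 25 ms window and 10 ms hop at 16 kHz), and `read_wav_scp` and `log_spectrogram` are hypothetical names.

```python
import wave

import numpy as np


def read_wav_scp(path):
    """Map sequence-id -> wav path from lines "sequence-id /path/to/file.wav".

    ASSUMPTION: values are plain file paths; Kaldi also allows piped
    commands, which this sketch does not handle.
    """
    table = {}
    with open(path) as f:
        for line in f:
            parts = line.split(None, 1)
            if len(parts) == 2:
                table[parts[0]] = parts[1].strip()
    return table


def log_spectrogram(wav_path, win=400, hop=160, eps=1e-10):
    """Log-magnitude spectrogram of shape (num_frames, win // 2 + 1).

    A stand-in for the repository's features, not a reproduction of them.
    """
    w = wave.open(wav_path, "rb")
    try:
        if w.getsampwidth() != 2 or w.getnchannels() != 1:
            raise ValueError("expects 16-bit mono PCM: %s" % wav_path)
        pcm = w.readframes(w.getnframes())
    finally:
        w.close()
    # Little-endian int16 samples scaled to [-1, 1).
    x = np.frombuffer(pcm, dtype="<i2").astype(np.float64) / 32768.0
    if len(x) < win:
        x = np.pad(x, (0, win - len(x)), mode="constant")
    num_frames = 1 + (len(x) - win) // hop
    window = np.hamming(win)
    frames = np.stack([x[i * hop:i * hop + win] * window
                       for i in range(num_frames)])
    return np.log(np.abs(np.fft.rfft(frames, axis=1)) + eps)
```

For example, a 0.1 s clip at 16 kHz (1600 samples) yields 8 frames of 201 bins each.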
Before running any code, source the environment script to update $PYTHONPATH:
. ./env.sh
We now provide Numpy preprocessing recipes that build TIMIT and LibriSpeech datasets from their raw data directories:
python ./examples/prepare_timit_numpy <TIMIT_DIR> # TIMIT
python ./examples/prepare_librispeech_numpy <LIBRISPEECH_DIR> # LibriSpeech
Use -h to see more options.
To train an FHVAE with hierarchical sampling, as proposed in the second paper, run
python ./scripts/train/run_hs_train.py --dataset=timit_np_fbank --is_numpy --nmu2=2000
Experiments will be saved to ./exp/timit_np_fbank/<exp_name>.
To train an FHVAE without hierarchical sampling, as in the first paper, run
python ./scripts/train/run_train.py --dataset=timit_np_fbank --is_numpy
Experiments will be saved to ./exp/timit_np_fbank/<exp_name>.
To evaluate a trained model, run
python scripts/eval/run_eval.py ./exp/timit_np_fbank/<exp_name> --seqlist=./misc/timit_eval.txt
Use --seqlist to specify which sequences to use for qualitative evaluation.
Results will be saved to ./exp/timit_np_fbank/<exp_name>/img.
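To evaluate on your own selection of sequences, one option is to sample sequence-ids from a set's len.scp. This is a sketch under an unverified assumption: that --seqlist expects one sequence-id per line, so compare against ./misc/timit_eval.txt before relying on it. `write_seqlist` is a hypothetical helper, and the fixed seed simply makes the selection reproducible.

```python
import random


def write_seqlist(len_scp_path, out_path, num_seqs=10, seed=0):
    """Sample up to num_seqs sequence-ids from len.scp and write one per line.

    ASSUMPTION: run_eval.py's --seqlist takes one sequence-id per line.
    """
    with open(len_scp_path) as f:
        seq_ids = [line.split()[0] for line in f if line.strip()]
    rng = random.Random(seed)  # deterministic selection for a given seed
    chosen = sorted(rng.sample(seq_ids, min(num_seqs, len(seq_ids))))
    with open(out_path, "w") as f:
        for seq_id in chosen:
            f.write(seq_id + "\n")
    return chosen
```

For example, `write_seqlist("./datasets/timit_np_fbank/test/len.scp", "my_eval.txt")` writes ten ids that could then be passed as `--seqlist=my_eval.txt`.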