This repository contains the code to reproduce the core results from the paper "Scalable Factorized Hierarchical Variational Autoencoders"

wnhsu/ScalableFHVAE

Scalable Factorized Hierarchical Variational Autoencoders

This repository contains (refactored) code to reproduce the core results from the two papers cited below.

A previous version of the code can be found here

If you find the code useful, please cite

@inproceedings{hsu2017learning,
  title={Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data},
  author={Hsu, Wei-Ning and Zhang, Yu and Glass, James},
  booktitle={Advances in Neural Information Processing Systems},
  year={2017},
}
@article{hsu2018scalable,
  title={Scalable Factorized Hierarchical Variational Autoencoder Training},
  author={Hsu, Wei-Ning and Glass, James},
  journal={arXiv preprint arXiv:1804.03201},
  year={2018},
  arxiv={1804.03201},
}

Dependencies

This project uses Python 2.7.6. Before running the code, you have to install

The first nine dependencies can be installed with pip by running

pip install -r requirements.txt

The last one, Kaldi-Python, requires a version of Kaldi from before a specific commit (d1e1e3b). If you do not have such a version of Kaldi, you can install both Kaldi and Kaldi-Python by running

make all

Getting Started

The main source code is in ./fhvae/. ./scripts contains runnable Python scripts. Example scripts for preprocessing are in ./examples/.

Two dataset formats are supported: Kaldi and Numpy. Each dataset should be stored in ./datasets/<dataset_name>/<set_name>/, where <set_name> is one of {train,dev,test}. Each set folder should contain a feats.scp and a len.scp. *.scp files follow Kaldi's script-file format, where each line is:

sequence-id value

The value for len.scp is an integer denoting the feature sequence length, and the value for feats.scp is *.npy (Numpy format) or *.ark:<offset> (Kaldi format).
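As a rough illustration of the line format described above, the following sketch parses script-file lines into dictionaries. The helper names (parse_scp, split_ark_entry) and the example paths are hypothetical and are not part of this repository; the sketch only assumes that each line is a sequence id, whitespace, and a value.

```python
import io


def parse_scp(lines):
    """Map sequence-id -> value for lines of the form 'sequence-id value'."""
    table = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        seq_id, value = line.split(None, 1)  # split on the first whitespace run
        table[seq_id] = value
    return table


def split_ark_entry(value):
    """Split a Kaldi-style 'path.ark:offset' value into (path, offset)."""
    path, offset = value.rsplit(":", 1)
    return path, int(offset)


# Hypothetical file contents, used here instead of real files.
feats_scp = io.StringIO(u"utt1 /data/utt1.npy\nutt2 /data/utt2.npy\n")
len_scp = io.StringIO(u"utt1 312\nutt2 287\n")

feats = parse_scp(feats_scp)
lens = {seq_id: int(n) for seq_id, n in parse_scp(len_scp).items()}

# Both files should describe the same set of sequences.
assert set(feats) == set(lens)
```

For a Kaldi-format feats.scp, split_ark_entry("/data/feats.ark:17") would separate the archive path from the byte offset.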

Such files can be prepared with ./scripts/preprocess/prepare_kaldi_data.py (Kaldi format) or ./scripts/preprocess/prepare_numpy_data.py (Numpy format), given a wav.scp file.
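If features have already been extracted by other means, a Numpy-format set folder could in principle be written by hand. The sketch below is not the repository's preparation script; it assumes features are arrays of shape (frames, feature_dim), that the sequence length is the number of frames, and that absolute .npy paths are acceptable in feats.scp. The function name write_numpy_set and the dataset name toy_np_fbank are hypothetical.

```python
import os
import tempfile

import numpy as np


def write_numpy_set(set_dir, features):
    """Save each array as <set_dir>/<seq_id>.npy and write feats.scp / len.scp."""
    if not os.path.isdir(set_dir):
        os.makedirs(set_dir)
    feats_path = os.path.join(set_dir, "feats.scp")
    len_path = os.path.join(set_dir, "len.scp")
    with open(feats_path, "w") as f_feats, open(len_path, "w") as f_len:
        for seq_id in sorted(features):
            feat = features[seq_id]
            npy_path = os.path.abspath(os.path.join(set_dir, seq_id + ".npy"))
            np.save(npy_path, feat)
            f_feats.write("%s %s\n" % (seq_id, npy_path))
            # Assumption: length is the number of frames (first axis).
            f_len.write("%s %d\n" % (seq_id, feat.shape[0]))
    return feats_path, len_path


# Toy data written under a temporary directory that mirrors the expected layout.
root = tempfile.mkdtemp()
train_dir = os.path.join(root, "datasets", "toy_np_fbank", "train")
toy = {
    "utt1": np.zeros((312, 80), dtype=np.float32),
    "utt2": np.zeros((287, 80), dtype=np.float32),
}
feats_path, len_path = write_numpy_set(train_dir, toy)
```

The same call would be repeated for the dev and test folders.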

Before running any code, source the environment script to update $PYTHONPATH:

. ./env.sh

Preprocessing

We provide Numpy preprocessing recipes for TIMIT and LibriSpeech that start from a raw data directory:

python ./examples/prepare_timit_numpy <TIMIT_DIR>	# TIMIT
python ./examples/prepare_librispeech_numpy <LIBRISPEECH_DIR> # LibriSpeech

Use -h to see more options.

Training with Hierarchical Sampling

python ./scripts/train/run_hs_train.py --dataset=timit_np_fbank --is_numpy --nmu2=2000

Experiments will be saved to ./exp/timit_np_fbank/<exp_name>.

Training without Hierarchical Sampling (original FHVAE training)

python ./scripts/train/run_train.py --dataset=timit_np_fbank --is_numpy

Experiments will be saved to ./exp/timit_np_fbank/<exp_name>.

Evaluation

python scripts/eval/run_eval.py ./exp/timit_np_fbank/<exp_name> --seqlist=./misc/timit_eval.txt

Use --seqlist to specify which sequences to use for qualitative evaluation. Results will be saved to ./exp/timit_np_fbank/<exp_name>/img.
