Replies: 5 comments
>>> abdullah.tayyab
[December 15, 2020, 1:11am]
Hi there,
I also posted on the "Testing for correctness of the samples" topic, but I think my issue warrants a separate topic so I can provide the full context of what I am trying to achieve.
I am trying to create an ASR system for Urdu (a language native to Pakistan) using the Ubuntu 16.04 Deep Learning AMI from AWS. I have installed DeepSpeech (0.9.2) in a virtualenv as described in the documentation. I have been testing various configurations against a single data source, and the train/test/dev files have worked fine with that one source. I have now assembled several additional data sources with decent transcriptions to expand the data set.
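Each of those train/test/dev files uses the standard DeepSpeech CSV layout with wav_filename, wav_filesize and transcript columns. As a minimal sketch of what writing such a file looks like (the paths, sizes and transcripts below are made-up placeholders, not rows from the real data):

import csv

# Hypothetical example rows; real rows point at existing 16 kHz mono .wav files,
# with wav_filesize given in bytes and transcript holding the Urdu text.
rows = [
    ("/home/ubuntu/Uploads/clips/sample_0001.wav", 123456, "placeholder transcript one"),
    ("/home/ubuntu/Uploads/clips/sample_0002.wav", 234567, "placeholder transcript two"),
]

with open("trainbusiness.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["wav_filename", "wav_filesize", "transcript"])
    writer.writerows(rows)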
I have generated a separate scorer file for Urdu, as pointed out in multiple topics on Discourse. The command I am using to run the training is:
python3 DeepSpeech.py \
  --drop_source_layers 1 \
  --alphabet_config_path /$HOME/Uploads/UrduAlphabet_newscrawl2.txt \
  --checkpoint_dir /$HOME/DeepSpeech/dataset/trained_load_checkpoint \
  --train_files /$HOME/Uploads/trainbusiness.csv \
  --dev_files /$HOME/Uploads/devbusiness.csv \
  --test_files /$HOME/Uploads/testbusiness.csv \
  --epochs 2 \
  --train_batch_size 32 \
  --export_dir /$HOME/DeepSpeech/dataset/urdu_trained \
  --export_file_name urdu \
  --test_batch_size 12 \
  --learning_rate 0.00001 \
  --reduce_lr_on_plateau true \
  --scorer /$HOME/Uploads/kenlm.scorer
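For reference, a rough sketch of how one could check whether every character in the transcripts is covered by the alphabet file, assuming the paths from the command above and the usual one-character-per-line alphabet format (this helper is only an illustration, not part of DeepSpeech):

import csv

alphabet_path = "/home/ubuntu/Uploads/UrduAlphabet_newscrawl2.txt"
csv_path = "/home/ubuntu/Uploads/trainbusiness.csv"

# One character per line; lines starting with '#' are treated as comments here,
# matching the stock alphabet.txt files shipped with DeepSpeech.
with open(alphabet_path, encoding="utf-8") as f:
    alphabet = {line.rstrip("\n") for line in f if not line.startswith("#")}

missing = set()
with open(csv_path, encoding="utf-8") as f:
    for row in csv.DictReader(f):
        for ch in row["transcript"]:
            if ch not in alphabet:
                missing.add(ch)

print("characters not covered by the alphabet:", sorted(missing))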
Here comes the interesting part: this works perfectly when I execute the command separately for each of the different data sources. I have been generating training, test, and dev files for every data source, and there were no issues when I used those files on their own. I get the exception below when I try to combine the csv files and run the whole data set together. Obviously, I want to do that so that there is more data and I can train on the whole data set for a higher number of epochs.
I Loading best validating checkpoint from //home/ubuntu/DeepSpeech/dataset/trained_load_checkpoint/best_dev-150
I Loading variable from checkpoint: beta1_power
I Loading variable from checkpoint: beta2_power
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Adam
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Adam_1
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel/Adam
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel/Adam_1
I Loading variable from checkpoint: global_step
I Loading variable from checkpoint: layer_1/bias
I Loading variable from checkpoint: layer_1/bias/Adam
I Loading variable from checkpoint: layer_1/bias/Adam_1
I Loading variable from checkpoint: layer_1/weights
I Loading variable from checkpoint: layer_1/weights/Adam
I Loading variable from checkpoint: layer_1/weights/Adam_1
I Loading variable from checkpoint: layer_2/bias
I Loading variable from checkpoint: layer_2/bias/Adam
I Loading variable from checkpoint: layer_2/bias/Adam_1
I Loading variable from checkpoint: layer_2/weights
I Loading variable from checkpoint: layer_2/weights/Adam
I Loading variable from checkpoint: layer_2/weights/Adam_1
I Loading variable from checkpoint: layer_3/bias
I Loading variable from checkpoint: layer_3/bias/Adam
I Loading variable from checkpoint: layer_3/bias/Adam_1
I Loading variable from checkpoint: layer_3/weights
I Loading variable from checkpoint: layer_3/weights/Adam
I Loading variable from checkpoint: layer_3/weights/Adam_1
I Loading variable from checkpoint: layer_5/bias
I Loading variable from checkpoint: layer_5/bias/Adam
I Loading variable from checkpoint: layer_5/bias/Adam_1
I Loading variable from checkpoint: layer_5/weights
I Loading variable from checkpoint: layer_5/weights/Adam
I Loading variable from checkpoint: layer_5/weights/Adam_1
I Loading variable from checkpoint: learning_rate
I Initializing variable: layer_6/bias
I Initializing variable: layer_6/bias/Adam
I Initializing variable: layer_6/bias/Adam_1
I Initializing variable: layer_6/weights
I Initializing variable: layer_6/weights/Adam
I Initializing variable: layer_6/weights/Adam_1
I STARTING Optimization
Epoch 0 | Training | Elapsed Time: 0:00:04 | Steps: 1 | Loss: 15.989467
Traceback (most recent call last):
File 'DeepSpeech.py', line 12, in
ds_train.run_script()
File '/home/ubuntu/DeepSpeech/training/deepspeech_training/train.py', line 976, in run_script
absl.app.run(main)
File '/home/ubuntu/tmp/deepspeech-venv/lib/python3.7/site-packages/absl/app.py', line 303, in run
_run_main(main, args)
File '/home/ubuntu/tmp/deepspeech-venv/lib/python3.7/site-packages/absl/app.py', line 251, in _run_main
sys.exit(main(argv))
File '/home/ubuntu/DeepSpeech/training/deepspeech_training/train.py', line 948, in main
train()
File '/home/ubuntu/DeepSpeech/training/deepspeech_training/train.py', line 605, in train
train_loss, _ = run_set('train', epoch, train_init_op)
File '/home/ubuntu/DeepSpeech/training/deepspeech_training/train.py', line 571, in run_set
exception_box.raise_if_set()
File '/home/ubuntu/DeepSpeech/training/deepspeech_training/util/helpers.py', line 123, in raise_if_set
raise exception # pylint: disable = raising-bad-type
File '/home/ubuntu/DeepSpeech/training/deepspeech_training/util/helpers.py', line 131, in do_iterate
yield from iterable()
File '/home/ubuntu/DeepSpeech/training/deepspeech_training/util/feeding.py', line 114, in generate_values
for sample_index, sample in enumerate(samples):
File '/home/ubuntu/DeepSpeech/training/deepspeech_training/util/augmentations.py', line 221, in apply_sample_augmentations
yield from pool.imap(_augment_sample, timed_samples())
File '/home/ubuntu/DeepSpeech/training/deepspeech_training/util/helpers.py', line 102, in imap
for obj in self.pool.imap(fun, self._limit(it)):
File '/home/ubuntu/anaconda3/lib/python3.7/multiprocessing/pool.py', line 748, in next
raise value
EOFError
This is the exception I get when, after having run the same command successfully on two data sets, I try to improve the training by adding one more data set. All .wav files have been converted to mono and 16 kHz.
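A quick way to double-check that is to scan the clips and flag anything that is not 16 kHz mono; a minimal sketch along those lines, with the clips directory as a placeholder path:

import wave
from pathlib import Path

# Placeholder directory; point this at wherever the converted clips live
clips_dir = Path("/home/ubuntu/Uploads/clips")

for wav_path in sorted(clips_dir.glob("*.wav")):
    with wave.open(str(wav_path), "rb") as w:
        if w.getnchannels() != 1 or w.getframerate() != 16000:
            print(wav_path, "-", w.getnchannels(), "channel(s),", w.getframerate(), "Hz")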
I have also used the csv_combiner from
https://github.com/dabinat/deepspeech-tools thinking that my code
wasn't combining them correctly.
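The merge in question is essentially a concatenation of per-source CSVs into one file; a minimal sketch of that, assuming pandas is available and using placeholder file names for the per-source CSVs (the real files are the ones attached below):

import pandas as pd

# Placeholder per-source training CSVs; dev and test files would be merged the same way
sources = ["trainbusiness.csv", "trainnews.csv", "trainsports.csv"]

combined = pd.concat([pd.read_csv(p) for p in sources], ignore_index=True)

# Drop exact duplicates and rows with an empty transcript before writing the merged file
combined = combined.drop_duplicates()
combined = combined[combined["transcript"].astype(str).str.strip() != ""]
combined.to_csv("traincombined.csv", index=False)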
I have shared the csv files for both the combined run and the separate runs here:
combinedrun.zip (100.4 KB)
separaterun.zip (86.1 KB)
Can someone please point me in the right direction?
Thank you!
[This is an archived discussion thread from discourse.mozilla.org/t/eoferror-when-training-multiple-files]