Replies: 9 comments
>>> rajpuneet.sandhu
[December 10, 2020, 10:24pm]
I ran the following command and it seems to be some cuDNN issue, which is
strange since I used the provided Dockerfile.train as is. Am I missing
something here?
python3 DeepSpeech.py \
/datasets/deepspeech_wakeword_dataset/wakeword-train-other-accents.csv, \
/datasets/deepspeech_wakeword_dataset/wakeword-train.csv, \
/datasets/india_portal_2may2019-train.csv, \
/datasets/india_portal_2to9may2019-train.csv, \
/datasets/india_portal_9to19may2019-train.csv, \
/datasets/india_portal_19to24may2019-train.csv, \
/datasets/brazil_portal_20to26june2019-wakeword-train.csv, \
/datasets/brazil_portal_26juneto3july2019-wakeword-train.csv, \
/datasets/japan_portal_3july2019-wakeword-train.csv, \
/datasets/mixed_portal_backups_14_16_17_18_19_visteon_wakeword_dataset-train.csv, \
/datasets/alexa-train.csv, \
/datasets/alexa-polly-train.csv, \
/datasets/alexa-sns.csv, \
/datasets/india_portal_ww_data_04282020/custom_train.csv, \
/datasets/india_portal_ww_data_05042020/custom_train.csv, \
/datasets/india_portal_ww_data_05222020/custom_train.csv, \
/datasets/india_portal_ww_data_augmented_04282020/custom_train.csv, \
/datasets/india_portal_ww_data_augmented_04282020/custom_test.csv, \
/datasets/india_portal_ww_data_augmented_05042020/custom_train.csv, \
/datasets/india_portal_ww_data_augmented_05042020/custom_test.csv, \
/datasets/ww_gtts_data_google_siri/custom_train.csv, \
/datasets/ww_gtts_data_google_siri/custom_dev.csv, \
/datasets/ww_polly_data_google_siri/custom_train.csv, \
/datasets/ww_polly_data_google_siri/custom_test.csv \
/datasets/india_portal_2may2019-dev.csv, \
/datasets/india_portal_2to9may2019-dev.csv, \
/datasets/india_portal_9to19may2019-dev.csv, \
/datasets/india_portal_19to24may2019-dev.csv, \
/datasets/brazil_portal_20to26june2019-wakeword-dev.csv, \
/datasets/brazil_portal_26juneto3july2019-wakeword-dev.csv, \
/datasets/mixed_portal_backups_14_16_17_18_19_visteon_wakeword_dataset-dev.csv, \
/datasets/alexa-dev.csv, \
/datasets/india_portal_ww_data_augmented_04282020/custom_dev.csv, \
/datasets/india_portal_ww_data_augmented_05042020/custom_dev.csv, \
/datasets/india_portal_ww_data_05222020/custom_dev.csv, \
/datasets/ww_gtts_data_google_siri/custom_dev.csv, \
/datasets/ww_polly_data_google_siri/custom_dev.csv, \
/datasets/india_portal_ww_data_augmented_04282020/custom_dev.csv, \
/datasets/india_portal_ww_data_augmented_05042020/custom_dev.csv \
/datasets/alexa-train.csv, \
/datasets/alexa-polly-train.csv, \
/datasets/alexa-sns.csv, \
/datasets/alexa-dev.csv, \
/datasets/india_portal_ww_data_04282020/custom_train.csv, \
/datasets/india_portal_ww_data_05042020/custom_train.csv, \
/datasets/india_portal_ww_data_04282020/custom_dev.csv, \
/datasets/india_portal_ww_data_05042020/custom_dev.csv, \
/datasets/india_portal_ww_data_04282020/custom_test.csv, \
/datasets/india_portal_ww_data_05042020/custom_test.csv, \
/datasets/india_portal_ww_data_augmented_04282020/custom_train.csv, \
/datasets/india_portal_ww_data_augmented_04282020/custom_dev.csv, \
/datasets/india_portal_ww_data_augmented_04282020/custom_test.csv, \
/datasets/india_portal_ww_data_augmented_05042020/custom_train.csv, \
/datasets/india_portal_ww_data_augmented_05042020/custom_dev.csv, \
/datasets/india_portal_ww_data_augmented_05042020/custom_test.csv \
checkpoints
I Could not find best validating checkpoint.
I Could not find most recent checkpoint.
I Initializing all variables.
I STARTING Optimization
I Training epoch 0...
Traceback (most recent call last):
File '/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py', line 1365, in _do_call
return fn(*args)
File '/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py', line 1350, in _run_fn
target_list, run_metadata)
File '/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py', line 1443, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
(0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node tower_0/conv1d}}]]
[[concat/concat/_99]]
(1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node tower_0/conv1d}}]]
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File 'DeepSpeech.py', line 12, in <module>
ds_train.run_script()
File '/DeepSpeech/training/deepspeech_training/train.py', line 976, in run_script
absl.app.run(main)
File '/usr/local/lib/python3.6/dist-packages/absl/app.py', line 300, in run
_run_main(main, args)
File '/usr/local/lib/python3.6/dist-packages/absl/app.py', line 251, in _run_main
sys.exit(main(argv))
File '/DeepSpeech/training/deepspeech_training/train.py', line 948, in main
train()
File '/DeepSpeech/training/deepspeech_training/train.py', line 605, in train
train_loss, _ = run_set('train', epoch, train_init_op)
File '/DeepSpeech/training/deepspeech_training/train.py', line 570, in run_set
feed_dict=feed_dict)
File '/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py', line 956, in run
run_metadata_ptr)
File '/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py', line 1180, in _run
feed_dict_tensor, options, run_metadata)
File '/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py', line 1359, in _do_run
run_metadata)
File '/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py', line 1384, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
(0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node tower_0/conv1d (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]
[[concat/concat/_99]]
(1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node tower_0/conv1d (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]
0 successful operations.
0 derived errors ignored.
Original stack trace for 'tower_0/conv1d':
File 'DeepSpeech.py', line 12, in <module>
ds_train.run_script()
File '/DeepSpeech/training/deepspeech_training/train.py', line 976, in run_script
absl.app.run(main)
File '/usr/local/lib/python3.6/dist-packages/absl/app.py', line 300, in run
_run_main(main, args)
File '/usr/local/lib/python3.6/dist-packages/absl/app.py', line 251, in _run_main
sys.exit(main(argv))
File '/DeepSpeech/training/deepspeech_training/train.py', line 948, in main
train()
File '/DeepSpeech/training/deepspeech_training/train.py', line 483, in train
gradients, loss, non_finite_files = get_tower_results(iterator, optimizer, dropout_rates)
File '/DeepSpeech/training/deepspeech_training/train.py', line 316, in get_tower_results
avg_loss, non_finite_files = calculate_mean_edit_distance_and_loss(iterator, dropout_rates, reuse=i > 0)
File '/DeepSpeech/training/deepspeech_training/train.py', line 243, in calculate_mean_edit_distance_and_loss
logits, _ = create_model(batch_x, batch_seq_len, dropout, reuse=reuse, rnn_impl=rnn_impl)
File '/DeepSpeech/training/deepspeech_training/train.py', line 171, in create_model
batch_x = create_overlapping_windows(batch_x)
File '/DeepSpeech/training/deepspeech_training/train.py', line 69, in create_overlapping_windows
batch_x = tf.nn.conv1d(input=batch_x, filters=eye_filter, stride=1, padding='SAME')
File '/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/util/deprecation.py', line 574, in new_func
return func(*args, **kwargs)
File '/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/util/deprecation.py', line 574, in new_func
return func(*args, **kwargs)
File '/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/nn_ops.py', line 1681, in conv1d
name=name)
File '/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/gen_nn_ops.py', line 1071, in conv2d
data_format=data_format, dilations=dilations, name=name)
File '/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/op_def_library.py', line 794, in _apply_op_helper
op_def=op_def)
File '/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/util/deprecation.py', line 507, in new_func
return func(*args, **kwargs)
File '/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py', line 3357, in create_op
attrs, op_def, compute_device)
File '/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py', line 3426, in _create_op_internal
op_def=op_def)
File '/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py', line 1748, in __init__
self._traceback = tf_stack.extract_stack()
[This is an archived discussion thread from discourse.mozilla.org/t/error-on-starting-training-inside-docker-container-for-deepspeech-0-9-1using-gpu]
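The "Failed to get convolution algorithm ... cuDNN failed to initialize" error is raised the first time TensorFlow asks cuDNN for a workspace, so it usually points to the container not seeing the GPU at all, to a host driver/CUDA mismatch, or to the GPU memory already being fully reserved before cuDNN can allocate its own scratch space. A minimal way to isolate this outside DeepSpeech is to run a tiny conv1d in the same container. The sketch below assumes TensorFlow 1.x as shipped in the DeepSpeech 0.9.x training image; the file name check_cudnn.py and the tensor shapes are only illustrative.

# check_cudnn.py -- run inside the same container that fails,
# e.g. docker run --gpus all <image> python3 check_cudnn.py
import tensorflow as tf  # TF 1.15 in the DeepSpeech 0.9.x training image

# First check that TensorFlow can see a CUDA device at all.
print('GPU visible to TF:', tf.test.is_gpu_available(cuda_only=True))

# The failing op in the traceback is tf.nn.conv1d, so a tiny conv1d is
# enough to force cuDNN to initialize.
x = tf.random.normal([1, 100, 26])    # [batch, time, features] -- illustrative shape
w = tf.random.normal([19, 26, 494])   # [filter_width, in_channels, out_channels]
y = tf.nn.conv1d(input=x, filters=w, stride=1, padding='SAME')

config = tf.ConfigProto()
# Growing GPU memory on demand often avoids this exact error when another
# process (or TensorFlow itself) has already reserved the whole GPU.
config.gpu_options.allow_growth = True

with tf.Session(config=config) as sess:
    print('conv1d output shape:', sess.run(y).shape)

If this small script fails with the same message, the problem is in the container or driver setup rather than in the training data: make sure the container is started with GPU access (docker run --gpus all, or --runtime=nvidia on older Docker) and that the host driver supports the CUDA 10.0 / cuDNN 7.x combination TensorFlow 1.15 expects. Exporting TF_FORCE_GPU_ALLOW_GROWTH=true before launching training is another common workaround for the memory-reservation case.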