
OutOfRangeError in tf.train.batch #4

Open · deepyury opened this issue Sep 3, 2018 · 1 comment
deepyury commented Sep 3, 2018

I'm trying to train srresnet-mse on my own dataset, and I sometimes get an error. The first time it occurred between iterations 0 and 100, then between 100 and 200, then after iteration 600. My dataset contains about one hundred thousand images, and I suspect the dataset itself is the cause. Can you help me understand what the problem is?

/opt/ds/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Logging results for this session in folder "results/srresnet-mse".
2018-09-03 12:22:04.150374: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-09-03 12:22:07.392768: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:c1:00.0
totalMemory: 10.92GiB freeMemory: 10.76GiB
2018-09-03 12:22:07.392867: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1312] Adding visible gpu devices: 0
2018-09-03 12:22:07.811512: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10415 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:c1:00.0, compute capability: 6.1)
[0] Test: 0.4038988, Train: 0.5311046 [Set5] PSNR: 11.46, SSIM: 0.1051 [Set14] PSNR: 12.50, SSIM: 0.0841 [BSD100] PSNR: 13.13, SSIM: 0.1036
[100] Test: 0.2326521, Train: 0.3028869 [Set5] PSNR: 13.47, SSIM: 0.4380 [Set14] PSNR: 14.62, SSIM: 0.4203 [BSD100] PSNR: 15.39, SSIM: 0.4153
2018-09-03 12:23:46.013589: W tensorflow/core/kernels/queue_base.cc:277] _0_input_producer: Skipping cancelled enqueue attempt with queue not closed
2018-09-03 12:23:46.026015: W tensorflow/core/kernels/queue_base.cc:277] _2_input_producer_1: Skipping cancelled enqueue attempt with queue not closed
2018-09-03 12:23:46.026804: W tensorflow/core/kernels/queue_base.cc:277] _5_batch_2/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2018-09-03 12:23:46.027426: W tensorflow/core/kernels/queue_base.cc:277] _3_batch_1/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2018-09-03 12:23:46.027743: W tensorflow/core/kernels/queue_base.cc:277] _4_input_producer_2: Skipping cancelled enqueue attempt with queue not closed
Traceback (most recent call last):
  File "/opt/ds/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1361, in _do_call
    return fn(*args)
  File "/opt/ds/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1340, in _run_fn
    target_list, status, run_metadata)
  File "/opt/ds/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.OutOfRangeError: FIFOQueue '_1_batch/fifo_queue' is closed and has insufficient elements (requested 14, current size 0)
         [[Node: batch = QueueDequeueManyV2[component_types=[DT_UINT8], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](batch/fifo_queue, batch/n)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 134, in <module>
    main()
  File "train.py", line 121, in main
    batch_hr = sess.run(get_train_batch)
  File "/opt/ds/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 905, in run
    run_metadata_ptr)
  File "/opt/ds/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1137, in _run
    feed_dict_tensor, options, run_metadata)
  File "/opt/ds/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1355, in _do_run
    options, run_metadata)
  File "/opt/ds/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1374, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: FIFOQueue '_1_batch/fifo_queue' is closed and has insufficient elements (requested 14, current size 0)
         [[Node: batch = QueueDequeueManyV2[component_types=[DT_UINT8], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](batch/fifo_queue, batch/n)]]

Caused by op 'batch', defined at:
  File "train.py", line 134, in <module>
    main()
  File "train.py", line 68, in main
    get_train_batch, get_val_batch, get_eval_batch = build_inputs(args, sess)
  File "/home/ds/ykochnev/SRGAN-orig/utilities.py", line 55, in build_inputs
    get_train_batch = build_input_pipeline(train_filenames, batch_size=args.batch_size, img_size=args.image_size, random_crop=True)
  File "/home/ds/ykochnev/SRGAN-orig/utilities.py", line 36, in build_input_pipeline
    image_batch = tf.train.batch([image], batch_size=batch_size, num_threads=num_threads, capacity=10 * batch_size)
  File "/opt/ds/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/input.py", line 989, in batch
    name=name)
  File "/opt/ds/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/input.py", line 763, in _batch
    dequeued = queue.dequeue_many(batch_size, name=name)
  File "/opt/ds/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/data_flow_ops.py", line 483, in dequeue_many
    self._queue_ref, n=n, component_types=self._dtypes, name=name)
  File "/opt/ds/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 2430, in _queue_dequeue_many_v2
    component_types=component_types, timeout_ms=timeout_ms, name=name)
  File "/opt/ds/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/opt/ds/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3271, in create_op
    op_def=op_def)
  File "/opt/ds/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1650, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

OutOfRangeError (see above for traceback): FIFOQueue '_1_batch/fifo_queue' is closed and has insufficient elements (requested 14, current size 0)
         [[Node: batch = QueueDequeueManyV2[component_types=[DT_UINT8], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](batch/fifo_queue, batch/n)]]

trevor-m (Owner) commented Sep 8, 2018

This error happened to me when there was a corrupted or invalid image in the dataset. I would write a simple script that iterates over your dataset and tries to load every image, to find out which file is causing the problem.
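A minimal checker along these lines might look like this (a sketch, not part of this repo; the glob pattern and extension are placeholders for your dataset layout). It decodes each file with TensorFlow's own decoder, so failures should match what the input pipeline's reader threads hit:

```python
import glob
import tensorflow as tf

# Placeholder path/pattern -- point this at your training images.
filenames = glob.glob('path/to/your/dataset/*.png')

# Build the decode op once, feed each path through it.
path_ph = tf.placeholder(tf.string)
image = tf.image.decode_image(tf.read_file(path_ph))

with tf.Session() as sess:
    for filename in filenames:
        try:
            sess.run(image, feed_dict={path_ph: filename})
        except tf.errors.OpError as e:
            # Corrupted or non-image files end up here.
            print('Failed to decode %s: %s' % (filename, e.message))
```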

I haven't found a way to have TensorFlow simply ignore images that fail to load; if anyone knows one, that would be great.
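One possible direction (untested against this repo, and it would mean porting the input pipeline from the queue-based tf.train.batch to tf.data) is the ignore_errors transformation that TF 1.x ships in tf.contrib.data. A sketch, loosely mirroring utilities.build_input_pipeline; the function signature and the assumption that images are PNGs are mine:

```python
import tensorflow as tf

def build_input_pipeline(filenames, batch_size, img_size):
    def load_and_crop(path):
        image = tf.image.decode_png(tf.read_file(path), channels=3)
        return tf.random_crop(image, [img_size, img_size, 3])

    dataset = tf.data.Dataset.from_tensor_slices(filenames)
    dataset = dataset.shuffle(len(filenames)).map(load_and_crop)
    # Silently drop any element whose read/decode/crop op raised an error,
    # instead of propagating the error and closing the pipeline.
    dataset = dataset.apply(tf.contrib.data.ignore_errors())
    dataset = dataset.repeat().batch(batch_size)
    return dataset.make_one_shot_iterator().get_next()
```

With tf.data a decode failure is skipped per element, whereas with queue runners a single failed enqueue closes the queue, and later dequeues surface as the OutOfRangeError shown above.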
