
OutOfRangeError in tf.train.batch #4

Open · deepyury opened this issue Sep 3, 2018 · 1 comment
deepyury commented Sep 3, 2018

I'm trying to train srresnet-mse on my own dataset, and I sometimes get an error. The first time it occurred between iterations 0 and 100, then between 100 and 200, then after iteration 600. My dataset contains about one hundred thousand images, and I suspect the dataset itself is the cause. Can you help me understand what the problem is?

/opt/ds/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Logging results for this session in folder "results/srresnet-mse".
2018-09-03 12:22:04.150374: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-09-03 12:22:07.392768: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:c1:00.0
totalMemory: 10.92GiB freeMemory: 10.76GiB
2018-09-03 12:22:07.392867: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1312] Adding visible gpu devices: 0
2018-09-03 12:22:07.811512: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10415 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:c1:00.0, compute capability: 6.1)
[0] Test: 0.4038988, Train: 0.5311046 [Set5] PSNR: 11.46, SSIM: 0.1051 [Set14] PSNR: 12.50, SSIM: 0.0841 [BSD100] PSNR: 13.13, SSIM: 0.1036
[100] Test: 0.2326521, Train: 0.3028869 [Set5] PSNR: 13.47, SSIM: 0.4380 [Set14] PSNR: 14.62, SSIM: 0.4203 [BSD100] PSNR: 15.39, SSIM: 0.4153
2018-09-03 12:23:46.013589: W tensorflow/core/kernels/queue_base.cc:277] _0_input_producer: Skipping cancelled enqueue attempt with queue not closed
2018-09-03 12:23:46.026015: W tensorflow/core/kernels/queue_base.cc:277] _2_input_producer_1: Skipping cancelled enqueue attempt with queue not closed
2018-09-03 12:23:46.026804: W tensorflow/core/kernels/queue_base.cc:277] _5_batch_2/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2018-09-03 12:23:46.027426: W tensorflow/core/kernels/queue_base.cc:277] _3_batch_1/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2018-09-03 12:23:46.027743: W tensorflow/core/kernels/queue_base.cc:277] _4_input_producer_2: Skipping cancelled enqueue attempt with queue not closed
Traceback (most recent call last):
  File "/opt/ds/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1361, in _do_call
    return fn(*args)
  File "/opt/ds/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1340, in _run_fn
    target_list, status, run_metadata)
  File "/opt/ds/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.OutOfRangeError: FIFOQueue '_1_batch/fifo_queue' is closed and has insufficient elements (requested 14, current size 0)
         [[Node: batch = QueueDequeueManyV2[component_types=[DT_UINT8], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](batch/fifo_queue, batch/n)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 134, in <module>
    main()
  File "train.py", line 121, in main
    batch_hr = sess.run(get_train_batch)
  File "/opt/ds/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 905, in run
    run_metadata_ptr)
  File "/opt/ds/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1137, in _run
    feed_dict_tensor, options, run_metadata)
  File "/opt/ds/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1355, in _do_run
    options, run_metadata)
  File "/opt/ds/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1374, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: FIFOQueue '_1_batch/fifo_queue' is closed and has insufficient elements (requested 14, current size 0)
         [[Node: batch = QueueDequeueManyV2[component_types=[DT_UINT8], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](batch/fifo_queue, batch/n)]]

Caused by op 'batch', defined at:
  File "train.py", line 134, in <module>
    main()
  File "train.py", line 68, in main
    get_train_batch, get_val_batch, get_eval_batch = build_inputs(args, sess)
  File "/home/ds/ykochnev/SRGAN-orig/utilities.py", line 55, in build_inputs
    get_train_batch = build_input_pipeline(train_filenames, batch_size=args.batch_size, img_size=args.image_size, random_crop=True)
  File "/home/ds/ykochnev/SRGAN-orig/utilities.py", line 36, in build_input_pipeline
    image_batch = tf.train.batch([image], batch_size=batch_size, num_threads=num_threads, capacity=10 * batch_size)
  File "/opt/ds/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/input.py", line 989, in batch
    name=name)
  File "/opt/ds/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/input.py", line 763, in _batch
    dequeued = queue.dequeue_many(batch_size, name=name)
  File "/opt/ds/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/data_flow_ops.py", line 483, in dequeue_many
    self._queue_ref, n=n, component_types=self._dtypes, name=name)
  File "/opt/ds/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 2430, in _queue_dequeue_many_v2
    component_types=component_types, timeout_ms=timeout_ms, name=name)
  File "/opt/ds/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/opt/ds/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3271, in create_op
    op_def=op_def)
  File "/opt/ds/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1650, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

OutOfRangeError (see above for traceback): FIFOQueue '_1_batch/fifo_queue' is closed and has insufficient elements (requested 14, current size 0)
         [[Node: batch = QueueDequeueManyV2[component_types=[DT_UINT8], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](batch/fifo_queue, batch/n)]]

trevor-m (Owner) commented Sep 8, 2018

This error happened to me when there was a corrupted or invalid image in the dataset. I would write a simple script that iterates over your dataset and tries to load every image, to find out which file is causing the problem.
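A minimal checker along these lines might look like this (a sketch, not part of this repo; the glob pattern and extension are placeholders for your dataset layout). It decodes each file with TensorFlow's own decoder, so failures should match what the input pipeline's reader threads hit:

```python
import glob
import tensorflow as tf

# Placeholder path/pattern -- point this at your training images.
filenames = glob.glob('path/to/your/dataset/*.png')

# Build the decode op once, feed each path through it.
path_ph = tf.placeholder(tf.string)
image = tf.image.decode_image(tf.read_file(path_ph))

with tf.Session() as sess:
    for filename in filenames:
        try:
            sess.run(image, feed_dict={path_ph: filename})
        except tf.errors.OpError as e:
            # Corrupted or non-image files end up here.
            print('Failed to decode %s: %s' % (filename, e.message))
```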

I haven't found a way to have TensorFlow simply ignore images that fail to load; if anyone knows one, that would be great.
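One possible direction (untested against this repo, and it would mean porting the input pipeline from the queue-based tf.train.batch to tf.data) is the ignore_errors transformation that TF 1.x ships in tf.contrib.data. A sketch, loosely mirroring utilities.build_input_pipeline; the function signature and the assumption that images are PNGs are mine:

```python
import tensorflow as tf

def build_input_pipeline(filenames, batch_size, img_size):
    def load_and_crop(path):
        image = tf.image.decode_png(tf.read_file(path), channels=3)
        return tf.random_crop(image, [img_size, img_size, 3])

    dataset = tf.data.Dataset.from_tensor_slices(filenames)
    dataset = dataset.shuffle(len(filenames)).map(load_and_crop)
    # Silently drop any element whose read/decode/crop op raised an error,
    # instead of propagating the error and closing the pipeline.
    dataset = dataset.apply(tf.contrib.data.ignore_errors())
    dataset = dataset.repeat().batch(batch_size)
    return dataset.make_one_shot_iterator().get_next()
```

With tf.data a decode failure is skipped per element, whereas with queue runners a single failed enqueue closes the queue, and later dequeues surface as the OutOfRangeError shown above.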
