Describe the bug
When loading a large dataset (>1000 GB) from S3, I run into the following error:
Traceback (most recent call last):
File "/home/alp/.local/lib/python3.10/site-packages/s3fs/core.py", line 113, in _error_wrapper
return await func(*args, **kwargs)
File "/home/alp/.local/lib/python3.10/site-packages/aiobotocore/client.py", line 383, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (RequestTimeTooSkewed) when calling the GetObject operation: The difference between the request time and the current time is too large.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/alp/phoneme-classification.monorepo/aws_sagemaker/data_processing/inspect_final_dataset.py", line 13, in <module>
dataset = load_from_disk("s3://speech-recognition-processed-data/whisper/de/train_data/", storage_options=storage_options)
File "/home/alp/.local/lib/python3.10/site-packages/datasets/load.py", line 1902, in load_from_disk
return Dataset.load_from_disk(dataset_path, keep_in_memory=keep_in_memory, storage_options=storage_options)
File "/home/alp/.local/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 1686, in load_from_disk
fs.download(src_dataset_path, dest_dataset_path.as_posix(), recursive=True)
File "/home/alp/.local/lib/python3.10/site-packages/fsspec/spec.py", line 1480, in download
return self.get(rpath, lpath, recursive=recursive, **kwargs)
File "/home/alp/.local/lib/python3.10/site-packages/fsspec/asyn.py", line 121, in wrapper
return sync(self.loop, func, *args, **kwargs)
File "/home/alp/.local/lib/python3.10/site-packages/fsspec/asyn.py", line 106, in sync
raise return_result
File "/home/alp/.local/lib/python3.10/site-packages/fsspec/asyn.py", line 61, in _runner
result[0] = await coro
File "/home/alp/.local/lib/python3.10/site-packages/fsspec/asyn.py", line 604, in _get
return await _run_coros_in_chunks(
File "/home/alp/.local/lib/python3.10/site-packages/fsspec/asyn.py", line 257, in _run_coros_in_chunks
await asyncio.gather(*chunk, return_exceptions=return_exceptions),
File "/usr/lib/python3.10/asyncio/tasks.py", line 408, in wait_for
return await fut
File "/home/alp/.local/lib/python3.10/site-packages/s3fs/core.py", line 1193, in _get_file
body, content_length = await _open_file(range=0)
File "/home/alp/.local/lib/python3.10/site-packages/s3fs/core.py", line 1184, in _open_file
resp = await self._call_s3(
File "/home/alp/.local/lib/python3.10/site-packages/s3fs/core.py", line 348, in _call_s3
return await _error_wrapper(
File "/home/alp/.local/lib/python3.10/site-packages/s3fs/core.py", line 140, in _error_wrapper
raise err
PermissionError: The difference between the request time and the current time is too large.
The usual cause of this error is that the clock on my local machine is out of sync with the current time. However, that is not the case here: I checked the time and even reset it, with no success (a minimal skew check is sketched at the end of this description). See resources here:
The error does not appear when loading a smaller dataset (e.g., our test set) from the same S3 path.
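For reference, clock skew can be checked directly against the S3 endpoint. The snippet below is a minimal sketch; the public endpoint URL and the roughly 15-minute tolerance reflect general AWS behaviour, not anything specific to this issue.

# Minimal sketch, assuming the public S3 endpoint is reachable: compare the
# local UTC clock against the Date header returned by S3. AWS typically
# rejects requests whose timestamp is more than ~15 minutes off
# (RequestTimeTooSkewed).
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

import requests

resp = requests.head("https://s3.amazonaws.com")
server_time = parsedate_to_datetime(resp.headers["Date"])
local_time = datetime.now(timezone.utc)
print(f"Clock skew vs. S3: {(local_time - server_time).total_seconds():.1f} s")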
Steps to reproduce the bug
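A minimal sketch of the call that triggers the error, reconstructed from the traceback above; the contents of storage_options are an assumption (standard s3fs key/secret credentials), only the load_from_disk call itself is taken from the report.

# Minimal reproduction sketch. The credential layout below is an assumption;
# any s3fs-compatible storage_options should behave the same way.
from datasets import load_from_disk

storage_options = {
    "key": "<AWS_ACCESS_KEY_ID>",        # assumed: plain key/secret credentials
    "secret": "<AWS_SECRET_ACCESS_KEY>",
}

# Loading a smaller split from the same path works; the large (~1 TB)
# train split fails with RequestTimeTooSkewed.
dataset = load_from_disk(
    "s3://speech-recognition-processed-data/whisper/de/train_data/",
    storage_options=storage_options,
)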
Expected behavior
The dataset loads without running into this error.
Environment info
datasets version: 2.13.1