Fix remote file system (get/put) #1955

Merged
merged 6 commits into from
Nov 14, 2023
11 changes: 9 additions & 2 deletions flytekit/core/data_persistence.py
@@ -248,7 +248,10 @@
                     self.strip_file_header(from_path), self.strip_file_header(to_path), dirs_exist_ok=True
                 )
             print(f"Getting {from_path} to {to_path}")
-            return file_system.get(from_path, to_path, recursive=recursive, **kwargs)
+            dst = file_system.get(from_path, to_path, recursive=recursive, **kwargs)

Codecov / codecov/patch warning on flytekit/core/data_persistence.py#L251: added line #L251 was not covered by tests.
Contributor

What does dst stand for?

Contributor

Maybe add a test like `class JsonTypeTransformer(TypeTransformer[T]):`, just making a dummy transformer, if that's easy to do. If not, just add the below as a comment in both places.

Collaborator

dst is short for destination, right?

Member Author

Yes, since fsspec uses dst as well.

+            if isinstance(dst, (str, pathlib.Path)):
Collaborator

IIUC, [None] is a guard value returned by fsspec, right? If that's the case, why don't we handle those explicitly?

Contributor

Oh really? Didn't see that. Link me?

Contributor

And yes, we should, but the action should still be to return the to_path.

Collaborator

I don't think it's part of the spec, and I fear this is specific to s3fs. Doing what we're doing in the PR is probably safer (as it might apply to other fsspec-compliant implementations).
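
One plausible reason a batch transfer would hand back `[None]`, offered as an assumption rather than as documented fsspec or s3fs behavior: an async filesystem commonly runs one coroutine per file and gathers the results into a list, so a per-file method that returns `None` surfaces as a list of `None` values. The sketch below uses only `asyncio`; `transfer_one`, `transfer_many`, and `transfer_many_sync` are hypothetical names, not library APIs.

```python
import asyncio


async def transfer_one(src: str, dst: str) -> None:
    """Hypothetical per-file transfer that, like the mocked _put_file, returns None."""
    await asyncio.sleep(0)  # stand-in for real network I/O
    return None


async def transfer_many(pairs):
    """Run one coroutine per (src, dst) pair and collect every result in a list."""
    return await asyncio.gather(*(transfer_one(s, d) for s, d in pairs))


def transfer_many_sync(pairs):
    """Synchronous entry point, similar in spirit to a sync wrapper over async code."""
    return asyncio.run(transfer_many(pairs))


single = transfer_many_sync([("test.py", "test:///tmp/test.py")])
double = transfer_many_sync([("a.py", "test:///tmp/a.py"), ("b.py", "test:///tmp/b.py")])
```

Under this assumption the result length tracks the number of files, which is one reason a type check on the return value may be more robust than comparing against a single guard value.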

+                return dst
+            return to_path

Codecov / codecov/patch warning on flytekit/core/data_persistence.py#L253-L254: added lines #L253 - L254 were not covered by tests.
         except OSError as oe:
             logger.debug(f"Error in getting {from_path} to {to_path} rec {recursive} {oe}")
             file_system = self.get_filesystem(get_protocol(from_path), anonymous=True)
@@ -271,7 +274,11 @@
                     self.strip_file_header(from_path), self.strip_file_header(to_path), dirs_exist_ok=True
                 )
             from_path, to_path = self.recursive_paths(from_path, to_path)
-        return file_system.put(from_path, to_path, recursive=recursive, **kwargs)
+        dst = file_system.put(from_path, to_path, recursive=recursive, **kwargs)

Codecov / codecov/patch warning on flytekit/core/data_persistence.py#L277: added line #L277 was not covered by tests.
+        if isinstance(dst, (str, pathlib.Path)):
+            return dst

Codecov / codecov/patch warning on flytekit/core/data_persistence.py#L279: added line #L279 was not covered by tests.
+        else:
+            return to_path

Codecov / codecov/patch warning on flytekit/core/data_persistence.py#L281: added line #L281 was not covered by tests.

     def put_raw_data(
         self,
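
The `get` and `put` hunks above apply the same rule: return what the filesystem gave back only if it is a usable path, otherwise return the requested `to_path`. A minimal sketch of that rule as a standalone helper follows, next to the explicit `[None]` check raised in review for comparison. Both function names are hypothetical and do not appear in flytekit.

```python
import pathlib
from typing import Any, Union

PathLike = Union[str, pathlib.Path]


def resolve_destination(dst: Any, to_path: PathLike) -> PathLike:
    """Type-based rule from the PR: keep dst only when it is a str or Path."""
    if isinstance(dst, (str, pathlib.Path)):
        return dst
    return to_path


def resolve_with_guard(dst: Any, to_path: PathLike) -> Any:
    """Alternative from the review thread: special-case the [None] value only."""
    if dst == [None]:
        return to_path
    return dst


# Return values a filesystem implementation might plausibly produce (assumed, not exhaustive).
candidates = [None, [None], [None, None], "s3://bucket/key", pathlib.Path("/tmp/out")]
by_type = [resolve_destination(c, "fallback") for c in candidates]
by_guard = [resolve_with_guard(c, "fallback") for c in candidates]
```

In this sketch the type-based helper always yields a path, while the guard-based one lets `None` and `[None, None]` leak through to the caller, which matches the reviewer's point that the isinstance check is the safer choice across implementations.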
37 changes: 37 additions & 0 deletions tests/flytekit/unit/core/test_data.py
@@ -7,6 +7,7 @@
 import fsspec
 import mock
 import pytest
+from s3fs import S3FileSystem

 from flytekit.configuration import Config, DataConfig, S3Config
 from flytekit.core.context_manager import FlyteContextManager
@@ -126,6 +127,42 @@ def test_local_provider(source_folder):
     assert len(files) == 2


+def test_async_file_system():
+    remote_path = "test:///tmp/test.py"
+    local_path = "test.py"
+
+    class MockAsyncFileSystem(S3FileSystem):
+        def __init__(self, *args, **kwargs):
+            super().__init__(args, kwargs)
+
+        async def _put_file(self, *args, **kwargs):
+            # s3fs._put_file returns None as well
+            return None
+
+        async def _get_file(self, *args, **kwargs):
+            # s3fs._get_file returns None as well
+            return None
+
+        async def _lsdir(
+            self,
+            path,
+            refresh=False,
+            max_items=None,
+            delimiter="/",
+            prefix="",
+            versions=False,
+        ):
+            return False
+
+    fsspec.register_implementation("test", MockAsyncFileSystem)
+
+    ctx = FlyteContextManager.current_context()
+    dst = ctx.file_access.put(local_path, remote_path)
+    assert dst == remote_path
+    dst = ctx.file_access.get(remote_path, local_path)
+    assert dst == local_path
+
+
 @pytest.mark.sandbox_test
 def test_s3_provider(source_folder):
     # Running mkdir on s3 filesystem doesn't do anything so leaving out for now
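
The fallback branches that Codecov flagged could also be exercised without s3fs. The sketch below is one way to do that with `unittest.mock`, assuming a simplified stand-in for the provider: the `FileAccess` class is hypothetical and only mirrors the return handling shown in the diff, not flytekit's actual class.

```python
import pathlib
from unittest import mock


class FileAccess:
    """Hypothetical stand-in that mirrors only the return handling from the diff."""

    def __init__(self, file_system):
        self._fs = file_system

    def get(self, from_path, to_path, recursive=False, **kwargs):
        dst = self._fs.get(from_path, to_path, recursive=recursive, **kwargs)
        if isinstance(dst, (str, pathlib.Path)):
            return dst
        return to_path

    def put(self, from_path, to_path, recursive=False, **kwargs):
        dst = self._fs.put(from_path, to_path, recursive=recursive, **kwargs)
        if isinstance(dst, (str, pathlib.Path)):
            return dst
        return to_path


def test_falls_back_to_requested_path():
    fs = mock.Mock()
    fs.get.return_value = [None]  # the value discussed in the review thread
    fs.put.return_value = None
    access = FileAccess(fs)
    assert access.get("test:///tmp/test.py", "test.py") == "test.py"
    assert access.put("test.py", "test:///tmp/test.py") == "test:///tmp/test.py"
    fs.put.assert_called_once_with("test.py", "test:///tmp/test.py", recursive=False)


def test_passes_through_real_path():
    fs = mock.Mock()
    fs.get.return_value = "/resolved/test.py"
    assert FileAccess(fs).get("test:///tmp/test.py", "test.py") == "/resolved/test.py"
```

A mock-based test like this covers both branches of the isinstance check. Separately, the merged test calls `super().__init__(args, kwargs)`, which passes the tuple and dict as two positional arguments; `super().__init__(*args, **kwargs)` is the usual way to forward them.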