-
Notifications
You must be signed in to change notification settings - Fork 304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix remote file system (get/put) #1955
Conversation
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #1955 +/- ##
==========================================
- Coverage 62.70% 62.58% -0.13%
==========================================
Files 313 310 -3
Lines 23181 23108 -73
Branches 3511 3513 +2
==========================================
- Hits 14536 14462 -74
- Misses 8223 8224 +1
Partials 422 422 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a unit test we can add?
@@ -248,7 +248,10 @@ def get(self, from_path: str, to_path: str, recursive: bool = False, **kwargs): | |||
self.strip_file_header(from_path), self.strip_file_header(to_path), dirs_exist_ok=True | |||
) | |||
print(f"Getting {from_path} to {to_path}") | |||
return file_system.get(from_path, to_path, recursive=recursive, **kwargs) | |||
dst = file_system.get(from_path, to_path, recursive=recursive, **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does dst stand for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add a test like
class JsonTypeTransformer(TypeTransformer[T]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dst
is short for destination, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, since fsspec uses dst as well
@eapolinario basically with the pr that went in we kinda added a new API to the api without being able to make it explicit. Basically, when an fsspec filesystem now handles get/put, it has the option of returning a value, and that value is now also used by flytekit as the uri in most places. This was useful because the flyte fs is not a real fs, so users don't actually pick the to_path that they want to write to. instead where it was actually written is returned by the filesystem. In this case however, the s3 filesystem was returning |
@@ -248,7 +248,10 @@ def get(self, from_path: str, to_path: str, recursive: bool = False, **kwargs): | |||
self.strip_file_header(from_path), self.strip_file_header(to_path), dirs_exist_ok=True | |||
) | |||
print(f"Getting {from_path} to {to_path}") | |||
return file_system.get(from_path, to_path, recursive=recursive, **kwargs) | |||
dst = file_system.get(from_path, to_path, recursive=recursive, **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dst
is short for destination, right?
@@ -248,7 +248,10 @@ def get(self, from_path: str, to_path: str, recursive: bool = False, **kwargs): | |||
self.strip_file_header(from_path), self.strip_file_header(to_path), dirs_exist_ok=True | |||
) | |||
print(f"Getting {from_path} to {to_path}") | |||
return file_system.get(from_path, to_path, recursive=recursive, **kwargs) | |||
dst = file_system.get(from_path, to_path, recursive=recursive, **kwargs) | |||
if isinstance(dst, (str, pathlib.Path)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, [None]
is a guard value returned by fsspec
, right? If that's the case, why don't we handle those explicitly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh really? didn't see that. link me?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and yes we should, but the action should still be to return the to_path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's part of the spec and I fear this is specific to s3fs
. Doing what we're doing in the PR is probably safer (as it might apply to other fsspec-compliant implementations).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AsyncFileSystem.put or AsyncFileSystem.get always return a list.
Signed-off-by: Kevin Su <[email protected]>
* test Signed-off-by: Kevin Su <[email protected]> * test Signed-off-by: Kevin Su <[email protected]> * test Signed-off-by: Kevin Su <[email protected]> * lint Signed-off-by: Kevin Su <[email protected]> * update-get Signed-off-by: Kevin Su <[email protected]> * add unit test Signed-off-by: Kevin Su <[email protected]> --------- Signed-off-by: Kevin Su <[email protected]>
* test Signed-off-by: Kevin Su <[email protected]> * test Signed-off-by: Kevin Su <[email protected]> * test Signed-off-by: Kevin Su <[email protected]> * lint Signed-off-by: Kevin Su <[email protected]> * update-get Signed-off-by: Kevin Su <[email protected]> * add unit test Signed-off-by: Kevin Su <[email protected]> --------- Signed-off-by: Kevin Su <[email protected]>
* test Signed-off-by: Kevin Su <[email protected]> * test Signed-off-by: Kevin Su <[email protected]> * test Signed-off-by: Kevin Su <[email protected]> * lint Signed-off-by: Kevin Su <[email protected]> * update-get Signed-off-by: Kevin Su <[email protected]> * add unit test Signed-off-by: Kevin Su <[email protected]> --------- Signed-off-by: Kevin Su <[email protected]> Signed-off-by: Rafael Raposo <[email protected]>
TL;DR
Failed to serialize the flytefile due to below error:
That's because asyncFileSystem (s3fs) return [None] when calling
file_system.put
hereType
Are all requirements met?
Complete description
Tracking Issue
NA
Follow-up issue
NA