-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[red-knot] Fix bug where module resolution would not be invalidated if an entire package was deleted #12378
Conversation
|
FilePathRef::System(path) => resolver.db.system().path_exists(&path.join("__init__.pyi")), | ||
FilePathRef::Vendored(path) => resolver.db.vendored().exists(path.join("__init__.pyi")), | ||
FilePathRef::System(path) => system_path_to_file(resolver.db.upcast(),path.join("__init__.pyi")).is_some(), | ||
FilePathRef::Vendored(path) => vendored_path_to_file(resolver.db.upcast(), path.join("__init__.pyi")).is_some(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: We can use vendored().exists
here because we know that the system is read only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wondered about this. In what situation should we use vendored_path_to_file()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In cases where we need a File
, for example because we have to pass it to source_text
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right -- but if we know it's immutable, maybe we should just be calling vendored.read-to_string()
instead of source_text()
to get the source text -- the same argument applies, no? (Not trying to be argumentative, just curious!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It depends. Not if it is a Python file, because calling source_text
then has the benefit that we read the vendored python file exactly once instead of multiple times (file.read_to_string
is not cached).
That's also not the point I made above. We need vendored_path_to_file
when calling parsed_module
because the function takes a File
argument (because it doesn't care if it is a vendored or non vendored file). That's when you need a file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
file.read_to_string
is not cached
Nor is vendored.exists()
-- each time we call that, we'll be querying the zip archive. Is the logic that we think that this will be inexpensive enough that it's not worth caching?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand. This seems to be unrelated to the original question.
it's not worth caching
Probably not. Caching comes at a high cost (memory, locking, lookup). We cache the resolved module, which should prevent from calling into the vendored system in the first place. But we have to wait for some real world usage to tell.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be unrelated to the original question.
It seems related to me, but perhaps this illustates confusion on my part :-)
I'm trying to understand when and why (in future PRs) I should go via the Files
APIs for reading information about files stored in the vendored zip archive, and when it makes sense to just get this information from the VendoredFileSystem
. If I understand correctly, the tradeoffs are as follows:
- Going via the
Files
API has the advantage that the query is automatically invalidated when the file changes -- but the file will never change for vendored files, so, unlike with filesystem files, that's not a good reason to go via theFiles
API. - Going via the
Files
API has the advantage that the call is automatically cached. This means that if we go via theFiles
API, we'll end up querying the zip archive less frequently; but that could also cost more than it saves us, due to higher memory allocation/consumption, and because locking and lookups in the cache are also costly. - Going via the
Files
API gives you aFile
object that you can pass directly to queries such asruff_db::parsed::parsed_module
. You wouldn't be able to call these queries if you just read the contents of the file as a string using theVendoredFileSystem
directly.
Is that an accurate summary? I feel like my question has been answered now, anyway!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's a good summary. I now see how it is related.
Regarding vendored_path_to_file(...).exists
. This is actually not cached for files that don't exist. We could, but we currently don't to avoid tracking File
s for vendored paths that don't exist. So vendored().exists
and vendored_path_to_file(path).exists
have the same cost for files that don't exist.
9430f04
to
031effb
Compare
Summary
This fixes a bug where module resolution would not be invalidated if an entire package was deleted
Test Plan
I added a test that fails on
main
, and passes with this PR.