Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[red-knot] Fix bug where module resolution would not be invalidated if an entire package was deleted #12378

Merged
merged 2 commits into from
Jul 19, 2024

Conversation

AlexWaygood
Copy link
Member

Summary

This fixes a bug where module resolution would not be invalidated if an entire package was deleted

Test Plan

I added a test that fails on main, and passes with this PR.

@AlexWaygood AlexWaygood added the red-knot Multi-file analysis & type inference label Jul 18, 2024
Copy link
Contributor

github-actions bot commented Jul 18, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

FilePathRef::System(path) => resolver.db.system().path_exists(&path.join("__init__.pyi")),
FilePathRef::Vendored(path) => resolver.db.vendored().exists(path.join("__init__.pyi")),
FilePathRef::System(path) => system_path_to_file(resolver.db.upcast(),path.join("__init__.pyi")).is_some(),
FilePathRef::Vendored(path) => vendored_path_to_file(resolver.db.upcast(), path.join("__init__.pyi")).is_some(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: We can use vendored().exists here because we know that the system is read only.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wondered about this. In what situation should we use vendored_path_to_file()?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In cases where we need a File, for example because we have to pass it to source_text

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right -- but if we know it's immutable, maybe we should just be calling vendored.read-to_string() instead of source_text() to get the source text -- the same argument applies, no? (Not trying to be argumentative, just curious!)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends. Not if it is a Python file, because calling source_text then has the benefit that we read the vendored python file exactly once instead of multiple times (file.read_to_string is not cached).

That's also not the point I made above. We need vendored_path_to_file when calling parsed_module because the function takes a File argument (because it doesn't care if it is a vendored or non vendored file). That's when you need a file.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

file.read_to_string is not cached

Nor is vendored.exists() -- each time we call that, we'll be querying the zip archive. Is the logic that we think that this will be inexpensive enough that it's not worth caching?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand. This seems to be unrelated to the original question.

it's not worth caching

Probably not. Caching comes at a high cost (memory, locking, lookup). We cache the resolved module, which should prevent from calling into the vendored system in the first place. But we have to wait for some real world usage to tell.

Copy link
Member Author

@AlexWaygood AlexWaygood Jul 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be unrelated to the original question.

It seems related to me, but perhaps this illustates confusion on my part :-)

I'm trying to understand when and why (in future PRs) I should go via the Files APIs for reading information about files stored in the vendored zip archive, and when it makes sense to just get this information from the VendoredFileSystem. If I understand correctly, the tradeoffs are as follows:

  1. Going via the Files API has the advantage that the query is automatically invalidated when the file changes -- but the file will never change for vendored files, so, unlike with filesystem files, that's not a good reason to go via the Files API.
  2. Going via the Files API has the advantage that the call is automatically cached. This means that if we go via the Files API, we'll end up querying the zip archive less frequently; but that could also cost more than it saves us, due to higher memory allocation/consumption, and because locking and lookups in the cache are also costly.
  3. Going via the Files API gives you a File object that you can pass directly to queries such as ruff_db::parsed::parsed_module. You wouldn't be able to call these queries if you just read the contents of the file as a string using the VendoredFileSystem directly.

Is that an accurate summary? I feel like my question has been answered now, anyway!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's a good summary. I now see how it is related.

Regarding vendored_path_to_file(...).exists. This is actually not cached for files that don't exist. We could, but we currently don't to avoid tracking Files for vendored paths that don't exist. So vendored().exists and vendored_path_to_file(path).exists have the same cost for files that don't exist.

crates/red_knot_module_resolver/src/resolver.rs Outdated Show resolved Hide resolved
@AlexWaygood AlexWaygood force-pushed the resolver-salsa-invalidation branch from 9430f04 to 031effb Compare July 19, 2024 12:43
@AlexWaygood AlexWaygood merged commit 5f96f69 into main Jul 19, 2024
20 checks passed
@AlexWaygood AlexWaygood deleted the resolver-salsa-invalidation branch July 19, 2024 12:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
red-knot Multi-file analysis & type inference
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants