Document that parallelizing pylint by giving it only a few files at once will create incorrect type inference leading to a worse experience #9341
Comments
Thank you for opening the issue. I'm hesitating to label this a false positive, because it seems to me that pylint being able to detect that sometime |
Ack that the behavior may be useful in some contexts, but I would definitely say this is a false positive. Additionally, this causes "spooky action at a distance" because it is not deterministic. If This comes up all the time in large projects which run pylint on subsets of the project in parallel (for example, using pre-commit). An almost daily interaction I am having right now at work is with someone less knowledgeable about pylint/pre-commit/etc asking why seemingly unrelated changes cause this (or similar) error to appear and fail their build. |
Thanks for the report. I'm hearing two requests. One is to distinguish:
    def function2():
        df = SomeType()
        return {k: k.lower() for k in df.columns}
from this, which is an error:
    def function2():
        df = function1()  # assume function1 returns `df`
        return {k: k.lower() for k in df.columns}
That's not really on pylint's roadmap, which I know might be frustrating, but pylint/astroid's inference capabilities advertise 'all the values your variables might take' -- in other words, we've inherited a system which collects all the possible values for
The other request I'm hearing is to make this more deterministic. That's totally reasonable, but I'm having trouble reproducing.
|
The issue is that when linted in isolation these files produce no diagnostic, but when linted together they do.
|
But my example shows that I tried that, no? I ran |
Are you defining the class |
The contents of
Your example shows only that you ran The problem I'm trying to convey is that you cannot reproduce the diagnostic when linting either file separately. That is what makes this inconsistent and causes us problems in our environment. If you have a large project that uses pre-commit you will get spooky action at a distance where simply adding or removing a completely unrelated file will change diagnostic output because different files are passed to a single invocation of pylint.
I guess the root question is whether you consider the diagnostic emitted by the following code to be a false positive:
    class SomeType:
        def __init__(self):
            self.columns: list[str] = []

    def function1():
        """
        some function declared anywhere else in the codebase.
        in another file and never called in my case.
        """
        df = SomeType()
        df.columns = [1, 2, 3]

    def function2():
        df = SomeType()
        # pylint incorrectly infers the type of `k` to be int
        return {k: k.lower() for k in df.columns}
I do, because the code in I understand fixing this may not be on the roadmap, or may not be feasible given the design of astroid, but can we all agree that it is a false positive? |
Got it, I think I ran one of the
It would be a false positive for a linter that's control-flow aware. What if Given that, I think you raise a really good usability point about pre-commit. I don't think we've documented anywhere that if you just run pre-commit on your git diff, you're going to get fewer pylint messages for those specific files than you'd get if you linted your whole project. Linting the whole project is impractical for pre-commit, so maybe we need some sort of "confidence" level that can allow users to silence diagnostics like the one you've illustrated here. I'm open to that, and would be eager to see someone explore the feasibility. If you don't mind, I'd like to retitle the issue to focus this for a contributor who might want to take this up. Thanks again for the issue. 👍 |
FWIW, I wasn't even thinking about pre-commit for the git diff subset, but as a CI step run against all files. pre-commit takes the full list of files and "deterministically" shuffles them before chunking them into command line length limit subsets and running pylint on each subset. If you add or remove any file from the list, the deterministic shuffling changes, and the group of files passed to pylint changes. That is what causes the most frustration for us. |
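To make the mechanism described above concrete, here is a minimal sketch of a fixed-seed shuffle followed by chunking. It is a simplified model for illustration only, not pre-commit's actual code: the seed, the chunk size, and the helper names `shuffled`, `chunked`, and `companions` are all assumptions, and the real tool chunks by command line length rather than by a fixed count.

```python
import random

# Hypothetical constants for illustration; not taken from pre-commit's source.
SEED = 1234
CHUNK_SIZE = 3


def shuffled(files):
    """Return the files in a pseudo-random order that depends only on the input set."""
    ordered = sorted(files)
    random.Random(SEED).shuffle(ordered)
    return ordered


def chunked(files, size=CHUNK_SIZE):
    """Split the shuffled list into consecutive groups, one per linter invocation."""
    ordered = shuffled(files)
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def companions(files, target):
    """Return the files that end up in the same invocation as `target`."""
    for group in chunked(files):
        if target in group:
            return set(group) - {target}
    return set()


if __name__ == "__main__":
    before = [f"pkg/mod{i}.py" for i in range(9)]
    after = before + ["pkg/unrelated.py"]
    # Same input, same grouping; a different file list can regroup everything.
    print(companions(before, "pkg/mod0.py"))
    print(companions(after, "pkg/mod0.py"))
```

The grouping is reproducible for a fixed file list, but because the shuffle runs over the whole list, adding one unrelated file can change which files are linted alongside any given module.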
I feel like not being able to specify that you don't want to parallelize is an issue with pre-commit and not pylint. That being said, I think more and more that pylint should not be used in pre-commit:
I'm considering adding a disclaimer in the doc to "use pylint in a continuous integration job or periodically and do not use pylint during pre-commit or inside your IDE on save; pylint is not meant to do that and will only frustrate you". And adding a caveat to the pre-commit doc for those who still want to use pylint in pre-commit. |
Unless we're ready to say that running pylint with a list of files is a second-class experience--which it sort of is, but should we bite that bullet?--I had in mind something relatively simple. When I looked at the root cause yesterday, it was that the information from both files is collated together in |
I'm a little afraid of the maintenance implications of that pre-commit confidence. (I don't think I understand what it entails and fear that we'll have to think about what happens in pre-commit for all messages, which is scary.)
But pylint analyzes all the calls in the code; if you don't analyze the files with a function call in them, you won't have all the possible values that can enter a function. It's kind of expected. I don't think it's a wrong assumption to make or keep. We only need to warn users that parallelizing pylint deterministically is not easy and they shouldn't try to do it themselves. Also, I searched a little and it seems that |
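The point about "all the possible values" can be illustrated with a toy model. The sketch below uses the standard library `ast` module, not astroid, and is not how pylint is implemented; the names `possible_element_types` and `reader_is_flagged` are hypothetical. It only shows how an analysis that pools assignments from every file in a batch, without tracking control flow, reaches different conclusions depending on which files are linted together.

```python
import ast

DEFINITION = """
class SomeType:
    def __init__(self):
        self.columns: list[str] = []
"""

WRITER = """
def function1():
    df = SomeType()
    df.columns = [1, 2, 3]
"""

READER = """
def function2():
    df = SomeType()
    return {k: k.lower() for k in df.columns}
"""


def possible_element_types(batch, attr):
    """Collect element types of every list literal assigned to `<obj>.attr`.

    Flow-insensitive on purpose: it ignores which function an assignment
    lives in and whether that function is ever called.
    """
    found = set()
    for source in batch:
        for node in ast.walk(ast.parse(source)):
            if not isinstance(node, ast.Assign) or not isinstance(node.value, ast.List):
                continue
            for target in node.targets:
                if isinstance(target, ast.Attribute) and target.attr == attr:
                    for element in node.value.elts:
                        if isinstance(element, ast.Constant):
                            found.add(type(element.value).__name__)
    return found


def reader_is_flagged(batch):
    """Toy check: flag READER if anything in the same batch stores non-str elements."""
    types = possible_element_types(batch, "columns")
    return READER in batch and any(name != "str" for name in types)


if __name__ == "__main__":
    print(reader_is_flagged([DEFINITION, READER]))          # reader linted without the writer
    print(reader_is_flagged([DEFINITION, WRITER, READER]))  # reader linted with the writer
```

In this model the reader is only flagged when the writer happens to be in the same batch, which mirrors the inconsistency reported in this thread.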
That's fair. Probably not worth putting on the roadmap. We should probably document that linting a subset of a project's files is a poorer experience than linting the whole project. |
…tiple processes Closes pylint-dev#9341
…tiple processes Refs microsoft/vscode-pylint#454 Closes pylint-dev#9341
…tiple processes Refs microsoft/vscode-pylint#454 Also add known caveats for custom parallelization. Closes pylint-dev#9341 Co-authored-by: Jacob Walls <[email protected]>
…tiple processes Refs microsoft/vscode-pylint#454 Also add known caveats for custom parallelization. Closes #9341 Co-authored-by: Jacob Walls <[email protected]>
Bug description
EDIT: a simpler repro without pandas
Configuration
No response
Command used
Pylint output
Expected behavior
No diagnostic
Pylint version
OS / Environment
No response
Additional dependencies