Document that parallelizing pylint by giving it only a few files at once will create incorrect type inference leading to a worse experience #9341

apmorton · 2024-01-02T21:56:15Z

Bug description

import pandas as pd


def function1():
    """
    some function declared anywhere else in the codebase.
    in another file and never called in my case.
    """
    df = pd.DataFrame()
    df.columns = [1, 2, 3]


def function2():
    """
    some function which sometimes operates on
    a default constructed DataFrame object
    """
    df = pd.DataFrame()
    # pylint incorrectly infers the type of `k` to be int
    df.rename(columns={k: k.lower() for k in df.columns})

EDIT: a simpler repro without pandas

class SomeType:
    def __init__(self):
        self.columns: list[str] = []

def function1():
    """
    some function declared anywhere else in the codebase.
    in another file and never called in my case.
    """
    df = SomeType()
    df.columns = [1, 2, 3]


def function2():
    df = SomeType()
    # pylint incorrectly infers the type of `k` to be int
    return {k: k.lower() for k in df.columns}

Configuration

No response

Command used

pylint repro.py

Pylint output

************* Module repro
repro.py:20:26: E1101: Instance of 'int' has no 'lower' member (no-member)

Expected behavior

No diagnostic

Pylint version

pylint 3.0.3
astroid 3.0.2
Python 3.12.1 | packaged by conda-forge | (main, Dec 23 2023, 08:03:24) [GCC 12.3.0]

OS / Environment

No response

Additional dependencies

pandas 2.1.4 or 1.5.3

Pierre-Sassoulas · 2024-01-04T14:19:18Z

Thank you for opening the issue. I'm hesitating to label this a false positive, because it seems to me that pylint being able to detect that sometime columns can be list[int] (and looking everywhere it can to find that info) bring a lot of value (especially in half typed / untyped code base).

apmorton · 2024-01-04T19:57:38Z

I'm hesitating to label this a false positive

Ack that the behavior may be useful in some contexts, but I would definitely say this is a false positive.
It is not possible for int to be observed in function2, and so the warning pylint emits is not helpful.

Additionally, this causes "spooky action at a distance" because it is not deterministic.

If function1 and function2 are in different files you will get different diagnostics when running:
pylint file1.py; pylint file2.py and pylint file1.py file2.py.

This comes up all the time in large projects which run pylint on subsets of the project in parallel (for example, using pre-commit).

An almost daily interaction I am having right now at work is with someone less knowledgeable about pylint/pre-commit/etc asking why seemingly unrelated changes cause this (or similar) error to appear and fail their build.

jacobtylerwalls · 2024-02-21T15:14:14Z

Thanks for the report. I'm hearing two requests. One is to distinguish:

def function2():
    df = SomeType()
    return {k: k.lower() for k in df.columns}

from this, which is an error:

def function2():
    df = function1()  # assume function1 returns `df`
    return {k: k.lower() for k in df.columns}

That's not really on pylint's roadmap, which I know might be frustrating, but pylint/astroid's inference capabilities advertise 'all the values your variables might take' -- in other words, we've inherited a system which collects all the possible values for SomeType.columns and doesn't know what functions you've called on them by the time we lint one of the instances you have in hand.

The other request I'm hearing is to make this more deterministic. That's totally reasonable, but I'm having trouble reproducing.

% cat a.py
class SomeType:
    def __init__(self):
        self.columns: list[str] = []

def function2():
    df = SomeType()
    return {k: k.lower() for k in df.columns}

% cat b.py
from .a import SomeType

def function1():
    de = SomeType()
    de.columns = [1, 2, 3]

% pylint a.py b.py
************* Module pylint.a
a.py:7:15: E1101: Instance of 'int' has no 'lower' member (no-member)

------------------------------------------------------------------
Your code has been rated at 5.00/10 (previous run: 5.00/10, +0.00)

apmorton · 2024-02-21T15:28:50Z

$ pylint a.py
$ pylint b.py
$ pylint a.py b.py
************* Module a
a.py:7:15: E1101: Instance of 'int' has no 'lower' member (no-member)

The issue is that when linted in isolation these files produce no diagnostic, but when linted together they do.

pylint 3.0.3
astroid 3.0.3
Python 3.11.6 | packaged by conda-forge | (main, Oct  3 2023, 10:40:35) [GCC 12.3.0]

jacobtylerwalls · 2024-02-21T15:35:56Z

But my example shows that I tried that, no? I ran pylint a.py b.py and got a no-member.

jacobtylerwalls · 2024-02-21T15:37:26Z

Are you defining the class SomeType anew in each file? That could explain the difference.

apmorton · 2024-02-21T15:52:29Z

Are you defining the class SomeType anew in each file? That could explain the difference.

The contents of a.py and b.py in my above post are identical to what you showed.

But my example shows that I tried that, no? I ran pylint a.py b.py and got a no-member.

Your example shows only that you ran pylint a.py b.py and got a diagnostic.

The problem I'm trying to convey is that you cannot reproduce the diagnostic when linting either file separately.

That is what makes this inconsistent and causes us problems in our environment.

If you have a large project that uses pre-commit you will get spooky action at a distance where simply adding or removing a completely unrelated file will change diagnostic output because different files are passed to a single invocation of pylint.

I guess the root question is whether or not you consider the diagnostic emitted by the following code to be a false positive or not:

class SomeType:
    def __init__(self):
        self.columns: list[str] = []

def function1():
    """
    some function declared anywhere else in the codebase.
    in another file and never called in my case.
    """
    df = SomeType()
    df.columns = [1, 2, 3]


def function2():
    df = SomeType()
    # pylint incorrectly infers the type of `k` to be int
    return {k: k.lower() for k in df.columns}

I do, because the code in function1 is not modifying anything about the SomeType type, it is changing an instance property - it can never be observed in function2.

I understand fixing this may not be on the roadmap, or may not be feasible given the design of astroid, but can we all agree that it is a false positive?

jacobtylerwalls · 2024-02-21T16:24:52Z

Got it, I think I ran one of the pylint a.py runs with the wrong contents and got mixed up. Thanks for bearing with 🙏

I understand fixing this may not be on the roadmap, or may not be feasible given the design of astroid, but can we all agree that it is a false positive?

It would be a false positive for a linter that's control-flow aware. What if SomeType.__init__() calls function1() (edited to return something) and assigns the result to .columns? Or does that conditionally? Then it's no longer a false positive and pylint got it right. I know I'm changing your hypothetical, so whether we call it a false positive or not, I'm just trying to illuminate the attitude behind the original design we inherited. I agree with you that in the example you supplied here, the message is not helpful.

Given that, I think you raise a really good usability point about pre-commit. I don't think we've documented anywhere that if you just run pre-commit on your git diff, you're going to get fewer pylint messages for those specific files than you'd get if you linted your whole project. Linting the whole project is impractical for pre-commit, so maybe we need some sort of "confidence" level that can allow users to silence diagnostics like the one you've illustrated here.

I'm open to that, and would be eager to see someone explore the feasibility. If you don't mind, I'd like to retitle the issue to focus this for a contributor who might want to take this up. Thanks again for the issue. 👍

apmorton · 2024-02-21T16:51:07Z

FWIW, I wasn't even thinking about pre-commit for the git diff subset, but as a CI step run against all files.

pre-commit takes the full list of files and "deterministically" shuffles them before chunking them into command line length limit subsets and running pylint on each subset.

If you add or remove any file from the list, the deterministic shuffling changes, and the group of files passed to pylint changes.

That is what causes the most frustration for us.

Pierre-Sassoulas · 2024-02-22T09:36:23Z

I feel like not being able to specify that you don't want to parallelize is an issue with pre-commit and not pylint.

That being said, I think more and more that pylint should not be used in pre-commit:

pylint require an installed environment so it can't be used with pre-commit.ci (https://github.com/pylint-dev/pylint/blob/main/.pre-commit-config.yaml#L2), so you need a CI job for it
pylint use multiple files analysis so pre-commit parallelization make it worse and less deterministic
above all, pylint is slow and will always be (duplicate-code is intrinsically a costly thing to check) and committing should be fast

I'm considering adding a disclaimer in the doc to "use pylint in a continous integration job or periodically and do not use pylint during pre-commit or inside your IDE on save, pylint is not meant to do that and will only frustrate you". And adding caveat to the pre-commit doc for those that still want to use pylint in pre-commit.

jacobtylerwalls · 2024-02-22T14:22:44Z

pylint use multiple files analysis so pre-commit parallelization make it worse and less deterministic

Unless we're ready to say that running pylint with a list of files is a second-class experience--which it sort of is, but should we bite that bullet?--I had in mind something relatively simple.

When I looked at the root cause yesterday, it was that the information from both files is collated together in ClassDef.instance_attrs. When building the ast's, we could collect slightly more information (what file created that instance attr?) and then in pylint, when going through all the filters to prevent false positives for no-member, we could, assuming the user is running with higher than pre-commit confidence, or with a flag --file-subset-mode or whatever, decide to be quiet when the heterogenous instance_attrs are coming from two different files. Something like that.

Pierre-Sassoulas · 2024-02-22T23:33:50Z

I'm a little afraid about the maintenance implication of that pre-commit confidence. (I don't think I understand what it entices and fear that we'll have to think about what happens in pre-commit for all messages, which is scary).

Unless we're ready to say that running pylint with a list of files is a second-class experience--which it sort of is, but should we bite that bullet?--I had in mind something relatively simple.

But pylint analyse all the calls in the code, if you don't analyses the files with a function call in it, you won't have all the possible values that can enter a function. It's kinda expected. I don't think it's a wrong assumption to make or keep. Only we need to warn user that parallelizing pylint deterministically is not easy and they shouldn't try to do it themselves. Also, I searched a little and it seems that pre-commit has a require_serial option that could be be set to true.

jacobtylerwalls · 2024-02-23T00:00:47Z

That's fair. Probably not worth putting on the roadmap.

We should probably document that linting project files is a poorer experience than linting a project.

…tiple processes Closes pylint-dev#9341

…tiple processes Refs microsoft/vscode-pylint#454 Closes pylint-dev#9341

…tiple processes Refs microsoft/vscode-pylint#454 Also add known caveats for custom parallization. Closes pylint-dev#9341 Co-authored-by: Jacob Walls <[email protected]>

…tiple processes Refs microsoft/vscode-pylint#454 Also add known caveats for custom parallization. Closes #9341 Co-authored-by: Jacob Walls <[email protected]>

apmorton added the Needs triage 📥 Just created, needs acknowledgment, triage, and proper labelling label Jan 2, 2024

Pierre-Sassoulas added inference Needs decision 🔒 Needs a decision before implemention or rejection and removed Needs triage 📥 Just created, needs acknowledgment, triage, and proper labelling labels Jan 4, 2024

jacobtylerwalls added Needs reproduction 🔍 Need a way to reproduce it locally on a maintainer's machine and removed Needs decision 🔒 Needs a decision before implemention or rejection labels Feb 21, 2024

jacobtylerwalls changed the title ~~Incorrect type inference leading to false positive~~ Create a pre-commit confidence level Feb 21, 2024

Pierre-Sassoulas added Needs PR This issue is accepted, sufficiently specified and now needs an implementation and removed Needs design proposal 🔒 This is a huge feature, some discussion should happen before a PR is proposed labels Feb 23, 2024

Pierre-Sassoulas changed the title ~~Create a pre-commit confidence level~~ Document that parallelizing pylint by giving it only a few file at once will create incorrect type inference leading to false positive and a worse experience Feb 23, 2024

Pierre-Sassoulas self-assigned this Feb 23, 2024

Pierre-Sassoulas added a commit to Pierre-Sassoulas/pylint that referenced this issue Feb 25, 2024

[doc pre-commit integration] Details about known caveats of using mul…

cba7038

…tiple processes Closes pylint-dev#9341

Pierre-Sassoulas mentioned this issue Feb 25, 2024

Details about known caveats of using multiple processes in pre-commit #9463

Merged

Pierre-Sassoulas added a commit to Pierre-Sassoulas/pylint that referenced this issue Feb 25, 2024

[doc pre-commit integration] Details about known caveats of using mul…

12d5b4a

…tiple processes Refs microsoft/vscode-pylint#454 Closes pylint-dev#9341

Pierre-Sassoulas added a commit to Pierre-Sassoulas/pylint that referenced this issue May 18, 2024

[doc pre-commit integration] Details about known caveats of using mul…

c8e70c9

…tiple processes Refs microsoft/vscode-pylint#454 Closes pylint-dev#9341

Pierre-Sassoulas added a commit to Pierre-Sassoulas/pylint that referenced this issue May 19, 2024

[doc pre-commit integration] Details about known caveats of using mul…

dedc46a

…tiple processes Refs microsoft/vscode-pylint#454 Closes pylint-dev#9341

Pierre-Sassoulas closed this as completed in #9463 May 19, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Document that parallelizing pylint by giving it only a few files at once will create incorrect type inference leading to a worse experience #9341

Document that parallelizing pylint by giving it only a few files at once will create incorrect type inference leading to a worse experience #9341

apmorton commented Jan 2, 2024 •

edited

Loading

Pierre-Sassoulas commented Jan 4, 2024 •

edited

Loading

apmorton commented Jan 4, 2024

jacobtylerwalls commented Feb 21, 2024 •

edited

Loading

apmorton commented Feb 21, 2024

jacobtylerwalls commented Feb 21, 2024

jacobtylerwalls commented Feb 21, 2024

apmorton commented Feb 21, 2024

jacobtylerwalls commented Feb 21, 2024 •

edited

Loading

apmorton commented Feb 21, 2024

Pierre-Sassoulas commented Feb 22, 2024 •

edited

Loading

jacobtylerwalls commented Feb 22, 2024

Pierre-Sassoulas commented Feb 22, 2024

jacobtylerwalls commented Feb 23, 2024

Document that parallelizing pylint by giving it only a few files at once will create incorrect type inference leading to a worse experience #9341

Document that parallelizing pylint by giving it only a few files at once will create incorrect type inference leading to a worse experience #9341

Comments

apmorton commented Jan 2, 2024 • edited Loading

Bug description

Configuration

Command used

Pylint output

Expected behavior

Pylint version

OS / Environment

Additional dependencies

Pierre-Sassoulas commented Jan 4, 2024 • edited Loading

apmorton commented Jan 4, 2024

jacobtylerwalls commented Feb 21, 2024 • edited Loading

apmorton commented Feb 21, 2024

jacobtylerwalls commented Feb 21, 2024

jacobtylerwalls commented Feb 21, 2024

apmorton commented Feb 21, 2024

jacobtylerwalls commented Feb 21, 2024 • edited Loading

apmorton commented Feb 21, 2024

Pierre-Sassoulas commented Feb 22, 2024 • edited Loading

jacobtylerwalls commented Feb 22, 2024

Pierre-Sassoulas commented Feb 22, 2024

jacobtylerwalls commented Feb 23, 2024

apmorton commented Jan 2, 2024 •

edited

Loading

Pierre-Sassoulas commented Jan 4, 2024 •

edited

Loading

jacobtylerwalls commented Feb 21, 2024 •

edited

Loading

jacobtylerwalls commented Feb 21, 2024 •

edited

Loading

Pierre-Sassoulas commented Feb 22, 2024 •

edited

Loading