Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Warning possibly-undefined-variable if a for loop variable that was not defined prior to the loop is accessed out of its intended scope #5146

Closed
DetachHead opened this issue Oct 12, 2021 · 15 comments
Labels
Enhancement ✨ Improvement to a component

Comments

@DetachHead
Copy link
Contributor

Current problem

an annoying feature in python is how variables are still in scope when you wouldn't expect them to be

for value in [1,2,3]:
    print(value)

for other_value in [2,3,4]:
    print(value) # no warning

Desired solution

a rule that warns when a variable is being accessed outside of the "scope" where it was defined, suggesting that you del the variable. in the example above, it could suggest the following fix:

for value in [1,2,3]:
    print(value)
del value

for other_value in [2,3,4]:
    print(value) # error

Additional context

No response

@DetachHead DetachHead added Enhancement ✨ Improvement to a component Needs triage 📥 Just created, needs acknowledgment, triage, and proper labelling labels Oct 12, 2021
@Pierre-Sassoulas
Copy link
Member

A check that warn when you're accessing a variable outside of its scope would be useful. But the fix is to not do that, adding a del make the code worse.

@Pierre-Sassoulas Pierre-Sassoulas changed the title warning to delete variables once they should be out of scope Warning if a variable is accessed out of its intended scope Oct 12, 2021
@DetachHead
Copy link
Contributor Author

adding a del make the code worse.

How so? I agree that code like this should be avoided altogether by splitting it into functions or something but is there a reason why using del in my example is worse than not using it?

@DudeNr33
Copy link
Collaborator

DudeNr33 commented Oct 12, 2021

In essence this would mean you always have to put a del behind a for loop.

Consider the following:
If you made the mistake to reference the loop variable again, pylint would complain (as soon as the new check is implemented, at least).

What would be the next step?
Adding del would not really help, the main problem is the fact that you accessed the loop variable. You have to change that (even more so because with the del your program would crash otherwise).

Now that you changed it, the del no longer has any effect beside preventing this mistake from happening again.

But if you are afraid of that, you might also be afraid that accessing loop variables after the loop finished could happen anywhere in your code base.
So, should you del all loop variables everytime and everywhere?
I don't think so.

@Pierre-Sassoulas
Copy link
Member

To add to that, it's just the way python works: adding a del after every for loop would clutter the code (possibly have performance implication too) when the easy fix is to not use variables out of their scope. .

Asking user to do that would be a colossal waste of resources too. How many for loops where made in Python in 30 years? If applied blindly on every Python code ever made this check would suggest adding probably trillions of line of del statement to perfectly fine code. So, if something need to change regarding the possibility to shoot oneself in the foot by accessing for loop variable out of scope like this, it should be in Python design itself.

@KotlinIsland
Copy link
Contributor

So more like

for name in names:
  print(name)
send_message(name)  # sus alert! Reused variable from loop

@Pierre-Sassoulas
Copy link
Member

Here's an article I found interesting which point out that there's also possible complications to using del (exceptions handling), provide historical context supposedly from Guido but without actual source or I would have linked that, and show some useful pattern that can be used because of this, like:

for i in somegenerator():
    if isinteresing(i):
        break
dostuffwith(i)

Or

for i, item in enumerate(somegenerator()):
    dostuffwith(i, item)
print('The loop executed {0} times!'.format(i+1))

@Pierre-Sassoulas Pierre-Sassoulas changed the title Warning if a variable is accessed out of its intended scope Warning reused-loop-variable-out-of-scope if a variable is accessed out of its intended scope Oct 14, 2021
@DanielNoord
Copy link
Collaborator

The examples you provide are why I would say a for loop doesn’t necessarily imply an “intended scope”. I think this warning will flag perfectly valid code and will therefore become quite controversial. As it also doesn’t provide any real performance benefit I think this should be an optional warning, if it should be a warning at all.

@DudeNr33
Copy link
Collaborator

In my first comment I initially had a sentence also saying that this would be perfectly valid code, but deleted it shortly after.
The reason is that your iterable could be empty, and in that case the loop variable is undefined.
IIRC PyCharm also flags this with something like "potentially undefined variable", and a warning from pylint would make sense.
The solution would be to intialize the variable before the loop to make sure that it will be defined even if the iterable is empty.

@DanielNoord
Copy link
Collaborator

I like that solution! Warning because of potential undefinition seems much more logical than for using “semi out of scope” variable names.

@Pierre-Sassoulas
Copy link
Member

Ok, so it seems a reasonable resolution for this issue would be to add an optional extension for the opinionated reused-loop-variable-out-of-scope check and add the possibly-undefined-variable check as a default warning ? (Accepting better suggestion for names 😄!)

With the following functional tests:

for i in somegenerator():
    if isinteresing(i):
        break
dostuffwith(i) # [possibly-undefined-variable]

for i, item in enumerate(somegenerator()):
    dostuffwith(i, item)
print('The loop executed {0} times!'.format(i+1))  # [possibly-undefined-variable]


j = 0
for j, item in enumerate(somegenerator()):
    dostuffwith(j, item)
print('The loop executed {0} times!'.format(j+1))  # [reused-loop-variable-out-of-scope]

for j, item in enumerate(somegenerator()):
    dostuffwith(j, item)
print('The loop executed {0} times!'.format(j+1))  # [reused-loop-variable-out-of-scope]

@DudeNr33
Copy link
Collaborator

I would only vote for the possibly-undefined-variable checker.

The first functional test for reused-loop-variable-out-of-scope is valid in my opinion, as j has been defined before the loop - which can (imo) only mean that it was intended to be used again afterwards.

The second functional test for reused-loop-variable-out-of-scope looks the same as possibly-undefined-variable for me. What would be the difference here?

@Pierre-Sassoulas
Copy link
Member

I like your argument.

j is also defined for the second reused-loop-variable-out-of-scope (only it's before the first loop). Maybe it's the only valid case of # [reused-loop-variable-out-of-scope], because reusing it after another for loop seems weird ?

@Zeckie
Copy link
Contributor

Zeckie commented Oct 15, 2021

If you want to be explicit that you want the value available outside the loop, you could use something like:

k = 0
for j, item in enumerate(somegenerator(), start=1):
    k = j
    dostuffwith(j, item)
print(f'The first loop executed {k} times!')


for j, item in enumerate(somegenerator(), start=1):
    k = j
    dostuffwith(j, item)
print(f'One of the loops executed {k} times!')

But that doesn't solve the issue around whether the second loop expected k to be initialized again. However, the same could happen without for loops (especially in long functions).

I agree with DudeNr33 (and pycharm) that the warning should only occur when the variable has not been assigned before the loop, so could be undefined after.

@Pierre-Sassoulas Pierre-Sassoulas changed the title Warning reused-loop-variable-out-of-scope if a variable is accessed out of its intended scope Warning possibly-undefined-variable if a for loop variable that was not defined prior to the loop is accessed out of its intended scope Oct 15, 2021
@Pierre-Sassoulas Pierre-Sassoulas removed the Needs triage 📥 Just created, needs acknowledgment, triage, and proper labelling label Oct 15, 2021
@DudeNr33
Copy link
Collaborator

j is also defined for the second reused-loop-variable-out-of-scope (only it's before the first loop). Maybe it's the only valid case of # [reused-loop-variable-out-of-scope], because reusing it after another for loop seems weird ?

Ah, I thought those functional tests were "standalone" examples.
I think even this case could cause a lot of false positives. Take the following example:

def find_suitable_candidate():
    j = None
    for j in list_of_most_probable_candidates:
        if is_suitable(j):
            break
    if j:
        return j
    for j in list_of_less_probable_candidates:
        if is_suitable(j) and some_other_criterion(j):
            break
    return j

While this code is far from perfect and could be simplified, this is a case where accessing j after the second loop is as valid as the initial examples.
Even for an optional checker I fear that it would be far too sensitive and error prone.

As for the possibly-undefined-variable check:
I just fired up PyCharm again as I haven't used in a few months. PyCharm is clever enough to not complain if it can easily tell that the iterable is not empty, as in:

for i in [1, 2, 3]:
    print(i)
print("The last number was:", i). # won't complain

As soon as the iterable is not directly defined in the for statement (e.g. returned from a function instead), it will issue the warning.
I think pylint should be able to do the same without too much effort, and we should add this example to the functional tests.

@Pierre-Sassoulas
Copy link
Member

Closing as pylint complain if the list is empty or can be empty.

for value in [1, 2, 3]:
    print(value)

for other_value in [2, 3, 4]:
    print(value)  # no warning

for value_2 in []:
    print(value)

for other_value in [2, 3, 4]:
    print(value_2)  # [undefined-loop-variable]


def print_names(names):
    for name in names:
        print(name)
    send_message(name)  # [undefined-loop-variable]


def send_message(i):
    print(i)

Result:

************* Module a
a.py:11:10: W0631: Using possibly undefined loop variable 'value_2' (undefined-loop-variable)
a.py:17:17: W0631: Using possibly undefined loop variable 'name' (undefined-loop-variable)

------------------------------------------------------------------
Your code has been rated at 8.57/10 (previous run: 9.29/10, -0.71)

I think this is very close to the optimal in term of false positive/false negative. Thanks to @jacobtylerwalls for making this check very good in countless MR !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement ✨ Improvement to a component
Projects
None yet
Development

No branches or pull requests

6 participants