Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

workaround for a Windows 'already removed' rmtree problem #4911

Conversation

jurko-gospodnetic
Copy link

@jurko-gospodnetic jurko-gospodnetic commented Dec 7, 2017

This is a fix for issue #4910.

On Windows, we can get an Access Denied error as a result of our rmtree() operation, but with the original operation actually removing the target folder. This can cause problems with the repeated operation failing due to the target folder (or some part of it) not existing.

Workaround is to treat any 'does not exist` errors as success.

(edit by @pradyunsg: Fixes #4910)

@jurko-gospodnetic jurko-gospodnetic force-pushed the windows-rmtree-workaround branch 2 times, most recently from 3744fae to 463cf1a Compare December 7, 2017 11:02
@pradyunsg pradyunsg requested a review from pfmoore December 7, 2017 17:28
@pradyunsg pradyunsg added the type: enhancement Improvements to functionality label Dec 7, 2017
@jurko-gospodnetic
Copy link
Author

Failed test seems to be related to something getting stuck during pypy3 tests.

@jurko-gospodnetic jurko-gospodnetic force-pushed the windows-rmtree-workaround branch from 463cf1a to d6f454a Compare December 9, 2017 17:43
@xavfernandez
Copy link
Member

I've restarted the failing build.
I think a docstring of rmtree should be added to reflect the fact that it won't error on non-existing file.
Or maybe:

 def rmtree(dir, ignore_errors=False):
    if not os.path.exists(dir):
        raise FileNotFoundError
    _rmtree(dir, ignore_errors)

 # Retry every half second for up to 3 seconds
 @retry(stop_max_delay=3000, wait_fixed=500)
 def _rmtree(dir, ignore_errors=False):
     shutil.rmtree(dir, ignore_errors=ignore_errors,
                   onerror=rmtree_errorhandler)

@jurko-gospodnetic
Copy link
Author

I'd rather update the docs, as just adding that check there is faking a solution to the problem.

As I mentioned earlier, there is no real way for the caller to really find out from our return value or exception that the folder has not existed or does not exist. Someone could remove that folder just as our code passes that check or could delete it before the check and create it back afterwards. It seems a cleaner interface for the caller to make the check, if they are really interested whether that folder exists at the time they probe for it than to give them an answer they can not really trust.

Copy link
Member

@pradyunsg pradyunsg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but @pfmoore would be the person I want looking at this.

news/4910.bugfix Outdated
failures on repeated operations due to the target folder not existing. Now such
use-cases are treated as successful folder removals.

One externally facing change is that the ``rmtree()`` operation now no longer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't matter since pip doesn't actually have an API as such.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pradyunsg - Would it be better to just extract that last paragraph into a separate .misc changelog entry then?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it can just be dropped entirely.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, dropped that last sentence.

@pradyunsg pradyunsg added the OS: windows Windows specific label Dec 16, 2017
func(path)
return
else:
raise
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems over-complex. Why re=-raise the error that caused the failure if the file is not read-only? What's the point of that? Also, I don't see why we're worrying about catching and ignoring a "File not found" error here - surely if the file wasn't there, we'd have hit the initial "if already removed" test? Nothing's going to remove the file between that test and the stat or chmod calls... (OK, there's a remote chance of a race condition, but this code is never going to be 100% robust, so let's not sacrifice maintainability by adding confusing checks that have little benefit...)

tl; dr; can we either simplify this code, or include a more complete explanation of what we're trying to protect against here, please?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Umm, actually, is this even valid? Since, it's not in an except block, this'll cause a RuntimeError.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I didn't notice this before because I was on my phone)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I wasn't sure either :-(

Probably means we need better test coverage for this change as well. Which will be pretty hard as there's no real explanation here as to how this could ever happen in practice. How do we end up in a situation that we try to remove a file, and find it's disappeared while we're doing so? Sure, a race condition where someone's deleting data in the directory we're trying to delete would do it, but I don't think that's something we should be trying to protect against.

So I guess the question here is, what's the use case for this? I've never seen the behaviour the OP reports, and I don't know of anyone else who has - it sounds to me more like an OS issue than normal behaviour that pip should be catering for.

Copy link
Author

@jurko-gospodnetic jurko-gospodnetic Dec 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pradyunsg - about the re-raise - that part of the code was there before my change so I kept it in, but as this is an error handler for rmtree(), I would guess that the idea was that whatever exception happened to trigger the error handler should be passed through unless it is one we know how to handle.

As for the code being valid - I think a rmtree() error handler itself is expected to be called inside an except block and that's why just calling raise works correctly.

As in:

C:\Users\Jurko>py3
Python 3.6.3 (v3.6.3:2c5fed8, Oct  3 2017, 18:11:49) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> def f():
...     print('haha')
...     raise
...     
... try:
...     raise Exception('catch me if you can...')
... except:
...     f()
...
haha
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "<stdin>", line 2, in <module>
Exception: catch me if you can...
>>>

And as for what my change aims to catch - I've found that on Windows, especially with Windows 10 (but I've definitely seen it with Windows 8 as well), even on a default installation with full Windows Updates applied, if you attempt delete a complex folder structure with lots of tiny files in it, it occasionally/often fails with an access denied or similar error and then when you repeat the operation it succeeds (possibly first failing a few more times). And in such cases it seems to often be the case that even when you do repeat the operation you find out that the target folder has actually already been deleted.

What the actual underlying cause for this strange behaviour is - I can't say. Could be Windows's search file indexing, could be its default anti-virus scanner software, or something completely unrelated. It happens on my development machine more often than on a default installation so I guess some development tools like TortoiseGit or Visual Studio and their folder content scanning might have something to do with it as well.

And since I've been seeing this behaviour for a long time now and it causes pip installations to fail badly (e.g. pip self-upgrade hits this very often as it has quite an extensive folder structure to remove, and then leaves the environment without any pip installation at all which is a pain, especially for someone not inclined to dig into the pip source code), I think this is a good change. It makes pip handle the issue nicely and just work in every one of my runs with the change applied.

I tried to make the code comment & the issue and pull request descriptions explain all this, but let me know if you think then can/should be worded better.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood - but nevertheless I do think we should be cautious about deciding which bugs in the underlying libraries we work around. You've made the case for fixing the basic "the file disappeared" error, I just don't want to extend that argument too far.

As for the cause of this issue, I really would like a better feel of what's going on here, but I doubt either you or I know enough about what Windows is doing to say much more than we already have. I'd be interested in what @zooba thinks of this issue.

By the way, in case it isn't clear, I'm OK in principle with the key part of the fix, but it's bugging me why this seems to happen significantly more for you than for others (you're obviously encountering this often enough for it to me a significant issue for you, I've never seen it, and the number of reports of issues is pretty low - so what's the significant difference between your environment and mine?) I'm not going to reject the PR because we don't know what's causing the problem, though.

Copy link
Author

@jurko-gospodnetic jurko-gospodnetic Dec 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... I'd like to know what happens there as well :-(

And the issue happens 'occasionally', but when it starts happening on a specific machine, it happens often. For instance, before the last reboot I could get it on most pip reinstall attempts, while I have not had it occur after the reboot. But I have the pip reinstall running in a looong loop now while working on other stuff so I should see when it happens next. :-) That's why I had a thought that it could be related to that Windows spawning up multiple explorer.exe processes in parallel as that usually occurs after running for a while.

This reboot for the first one after over a month - I just kept the machine in hibernate or sleep when not using it, with several Windows updates accumulating in the mean time, so God only knows what was the main trigger there. :-)

The other machine I've seen this happen on today also has not been rebooted in a while, and I'll try playing around there some more when I get access to it again.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what's happening here is something funky in the way Windows's disk caching works. I see it frequently in PowerShell, where removing a directory will fail with access errors, I suspect because the files are reported deleted but they aren't yet, and so the directory cannot be removed because it isn't empty. But then the write-through occurs and now the directory is empty and can be deleted on a retry. I nearly always end up with a retry loop on directory removal (typically with a few ms delay after each failure) regardless of language.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, if that's what's going on, then why isn't the retry at https://github.com/pypa/pip/blob/master/src/pip/_internal/utils/misc.py#L105 working to address this?

Copy link

@mhsmith mhsmith May 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The retry decorator applies to the top-level rmtree operation, not to each individual directory in the tree. For large trees, it's possible that the total number of retries exceeds the limit, even if no directory ever required more than 2 attempts.

So perhaps the decorator should be moved to the error handler, and the handler changed so that it always retries, not just when it changed the file mode.

@jurko-gospodnetic jurko-gospodnetic force-pushed the windows-rmtree-workaround branch 2 times, most recently from 60efd28 to 4fc137e Compare December 17, 2017 22:00
On Windows, we can get an Access Denied error as a result of our rmtree()
operation, but with the original operation actually removing the target
folder. This could cause problems with the repeated operation failing due
to the target folder (or some part of it) not existing.

Workaround is to treat any 'does not exist` errors as success.
@jurko-gospodnetic jurko-gospodnetic force-pushed the windows-rmtree-workaround branch from 4fc137e to 1d81121 Compare December 17, 2017 22:12
@jurko-gospodnetic
Copy link
Author

  • removed the changelog part about rmtree() changing its behaviour on already removed data
  • rebased on top of the current master branch

@@ -0,0 +1,6 @@
Workaround for a Windows issue (encountered for example on Windows 10 machines,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rewrite this fragment as per the documentation: https://pip.pypa.io/en/latest/development/#adding-a-news-entry

In order to maintain a consistent style in the NEWS.rst file, it is preferred to keep the news entry to the point, in sentence case, shorter than 80 characters and in an imperative tone -- an entry should complete the sentence "This change will ...". In rare cases, where one line is not enough, use a summary line in an imperative tone followed by a blank line separating it from a description of the feature/change in one or more paragraphs, each wrapped at 80 characters. Remember that a news entry is meant for end users and should only contain details relevant to an end user.

@pradyunsg
Copy link
Member

Closing due to lack of activity.

@pradyunsg pradyunsg closed this Jan 8, 2019
@lock
Copy link

lock bot commented May 31, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot added the auto-locked Outdated issues that have been locked by automation label May 31, 2019
@lock lock bot locked as resolved and limited conversation to collaborators May 31, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
auto-locked Outdated issues that have been locked by automation OS: windows Windows specific type: enhancement Improvements to functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

rmtree() failing sporadically on Windows
6 participants