Synchronous flushing of bind-mount caches on docker stop #6512

Closed
1 of 2 tasks
rfay opened this issue Apr 30, 2020 · 5 comments

Comments

@rfay
Contributor

rfay commented Apr 30, 2020

Spin-off from #5530 (comment)

  • I have tried with the latest version of my channel (Stable or Edge)
  • I have uploaded Diagnostics
  • Diagnostics ID: (not possible to grab this, as the failure happens unpredictably in a CI environment)

Expected behavior

When a container is destroyed, one should expect that its bind-mount state is destroyed as well, and that it will not bleed into other containers started later.

Actual behavior

#5530 (comment) explains the sequence of events (a minimal code sketch follows the list):

These are sequential tests in a Golang test environment.

  1. There is a bind-mounted directory that has nginx configuration in it.
  2. Test 1 adds a configuration file into the mounted directory, and tests to make sure that the additional configuration is detected.
  3. Test 1 then stops the container and removes the extra configuration from the bind-mounted directory. It does this in a defer, so I can't see any way that cleanup could fail to run.
  4. Test 2 then starts a completely new container, which bind-mounts the same directory.
  5. Test 2 nginx startup tries to read the extra configuration (because it was apparently found in the directory) but fails to open it.
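For context, the pattern looks roughly like the Go sketch below; the directory path, container names, and nginx wiring are illustrative stand-ins, not the actual ddev test code.

```go
// Sketch of the two sequential tests described above (hypothetical names).
package bindmounts

import (
	"os"
	"os/exec"
	"path/filepath"
	"testing"
)

const confDir = "/tmp/nginx-conf" // hypothetical bind-mount source on the host

func runDocker(t *testing.T, args ...string) {
	t.Helper()
	if out, err := exec.Command("docker", args...).CombinedOutput(); err != nil {
		t.Fatalf("docker %v failed: %v\n%s", args, err, out)
	}
}

// Test 1: add an extra nginx config into the bind-mounted directory, verify it
// is detected, then stop the container and remove the file again in a defer.
func TestExtraConfigDetected(t *testing.T) {
	extra := filepath.Join(confDir, "extra.conf")
	if err := os.WriteFile(extra, []byte("# extra config\n"), 0o644); err != nil {
		t.Fatal(err)
	}
	runDocker(t, "run", "-d", "--name", "web1", "-v", confDir+":/etc/nginx/conf.d", "nginx")
	defer func() {
		runDocker(t, "rm", "-f", "web1") // stop/remove the container
		_ = os.Remove(extra)             // then remove the extra config
	}()
	// ... assert that nginx picked up extra.conf ...
}

// Test 2: a brand-new container bind-mounts the same directory. With the
// delayed cache flush, its readdir may still list extra.conf even though the
// file is gone on the host, so nginx fails to open it (ENOENT).
func TestFreshContainerSeesCleanDir(t *testing.T) {
	runDocker(t, "run", "-d", "--name", "web2", "-v", confDir+":/etc/nginx/conf.d", "nginx")
	defer runDocker(t, "rm", "-f", "web2")
	// ... assert that nginx started without the extra config ...
}
```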

@djs55 replied there that

I think your guess is probably right: currently, when a container exits, we flush the caches in the VM, but unfortunately this is a background task (triggered by an event from the docker engine). So at step (4), when test 2 starts a completely new container, the cache might not have been flushed yet. If this happens, the readdir will still show the old config file, but of course the file has actually been deleted, so the open will fail with ENOENT. Although your tests are sequential and doing things in the right order, the delayed cache flush makes it look like the tests have overlapped.

Although flushing caches in the background when we receive events handles the case of containers stopping by themselves reasonably well, I think we should investigate adding a synchronous cache flush on the docker stop codepath as well.
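To make the proposal concrete, here is a conceptual Go sketch of the difference; every name in it (flushMountCache, engineStop, the event handling) is a hypothetical placeholder, not Docker Desktop's real code.

```go
// Illustrative sketch only: placeholders, not Docker Desktop internals.
package flushsketch

// Event stands in for a docker engine event notification.
type Event struct {
	Type        string
	ContainerID string
}

// flushMountCache stands in for the VM-side invalidation of bind-mount caches.
func flushMountCache(containerID string) error { return nil }

// engineStop stands in for asking the docker engine to stop a container.
func engineStop(containerID string) error { return nil }

// Current behaviour (as described above): the flush is kicked off in the
// background when a container-exit event arrives, so the next container
// start can race with it.
func onContainerEvent(ev Event) {
	if ev.Type == "die" { // container-exit event; exact event name is illustrative
		go flushMountCache(ev.ContainerID) // background task; may finish "too late"
	}
}

// Proposed behaviour: make the docker stop codepath wait for the flush, so a
// container started afterwards sees a consistent view of the bind mount.
func stopContainer(containerID string) error {
	if err := engineStop(containerID); err != nil {
		return err
	}
	return flushMountCache(containerID) // synchronous flush before stop returns
}
```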

Information

  • This has generally happened on an older, slower test runner (a 2011 machine): Docker Desktop 2.2.0.5, 8 GB RAM, default Docker settings.

Steps to reproduce the behavior

So far this is intermittent and not reproducible on demand, although it happens often enough to be a real problem.

@rfay
Contributor Author

rfay commented May 15, 2020

This continues to be a real problem; in ddev/ddev#2261 I'm starting to skip tests that use this pattern because they fail too often (sketched in code after the list):

  • Start container with bind mount
  • Stop container with bind mount
  • Rename a subdirectory of the bind mount (on the host)
  • Wait/sleep, or even try to invalidate the cache using @djs55's special tool
  • Start container with bind mount again.
  • Windows denies the rename, because the host subdirectory still appears to be held by the Docker container (which was already stopped)
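A rough Go sketch of that pattern, with hypothetical paths and container names standing in for the real ddev test code:

```go
// Sketch of the failing stop/rename/restart pattern on Windows (hypothetical names).
package renametest

import (
	"os"
	"os/exec"
	"path/filepath"
	"testing"
	"time"
)

const mountDir = `C:\temp\project` // hypothetical bind-mount source; assumes mountDir\sub exists

func docker(t *testing.T, args ...string) {
	t.Helper()
	if out, err := exec.Command("docker", args...).CombinedOutput(); err != nil {
		t.Fatalf("docker %v: %v\n%s", args, err, out)
	}
}

func TestRenameAfterStop(t *testing.T) {
	docker(t, "run", "-d", "--name", "web", "-v", mountDir+":/var/www/html", "nginx")
	docker(t, "rm", "-f", "web") // stop and remove the container

	// Even with a wait (or a cache-invalidation attempt) here, the rename can
	// still be refused on Windows.
	time.Sleep(5 * time.Second)

	oldPath := filepath.Join(mountDir, "sub")
	newPath := filepath.Join(mountDir, "sub-renamed")
	if err := os.Rename(oldPath, newPath); err != nil {
		t.Fatalf("rename refused even though the container is gone: %v", err)
	}

	// Start a fresh container on the same bind mount.
	docker(t, "run", "-d", "--name", "web2", "-v", mountDir+":/var/www/html", "nginx")
	docker(t, "rm", "-f", "web2")
}
```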

@mat007
Member

mat007 commented May 16, 2020

Thanks for the report.
It’s quite difficult to dig into this without diagnostics. It’s possible to gather diagnostics from the command line using:

# "C:\Program Files\Docker\Docker\resources\com.docker.diagnose.exe"
Please specify a command.

USAGE: com.docker.diagnose.exe [options] COMMAND [options]

Gather and upload diagnostics bundles for Docker Desktop.
Commands:
  gather
  upload

Options:
  -path string
        Local path prefix where to store the diagnostics bundle.

Run 'com.docker.diagnose.exe COMMAND help' for more information on the command

Common use cases:
- Generate a diagnostics bundle and upload:
    com.docker.diagnose.exe gather -upload
  This will print a diagnostics ID you can supply for further troubleshooting.

- Generate a local diagnostics bundle to upload later:
    com.docker.diagnose.exe gather
  This will print a diagnostics ID you can use to upload with.
    com.docker.diagnose.exe submit <ID>

- Generate a local diagnostics bundle:
    com.docker.diagnose.exe gather <file>
  This will generate a local bundle in <file>.zip.

If -path is specified, place the diagnostics bundle in that directory. Otherwise the system default temporary directory will be used.

Maybe it could be run when a test fails?
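If it helps, one way to hook that into the Go tests might look like the sketch below (a guess at the wiring, not ddev's actual code): register a t.Cleanup that runs com.docker.diagnose.exe gather -upload only when a test has failed, and log the diagnostics ID it prints.

```go
// Sketch: gather and upload Docker Desktop diagnostics when a test fails.
package diagtest

import (
	"os/exec"
	"testing"
)

const diagnoseExe = `C:\Program Files\Docker\Docker\resources\com.docker.diagnose.exe`

// gatherDiagnosticsOnFailure registers a cleanup that uploads a diagnostics
// bundle if the test has failed by the time it finishes.
func gatherDiagnosticsOnFailure(t *testing.T) {
	t.Helper()
	t.Cleanup(func() {
		if !t.Failed() {
			return
		}
		out, err := exec.Command(diagnoseExe, "gather", "-upload").CombinedOutput()
		if err != nil {
			t.Logf("failed to gather diagnostics: %v", err)
		}
		t.Logf("docker diagnostics output:\n%s", out) // includes the diagnostics ID
	})
}

func TestSomethingFlaky(t *testing.T) {
	gatherDiagnosticsOnFailure(t)
	// ... the flaky bind-mount test body ...
}
```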

@rfay
Contributor Author

rfay commented May 16, 2020

Since I had already implemented @djs55's workaround and struggled with intermittent test failures for a couple of weeks, I just disabled the tests on Windows, not really willing to put more time into it. As indicated in the OP, @djs55 seemed to think this was a logical outcome of the current architecture, where

when a container exits we flush the caches in the VM but unfortunately this is a background task (triggered by an event from the docker engine)

@docker-robott
Collaborator

Issues go stale after 90 days of inactivity.
Mark the issue as fresh with a /remove-lifecycle stale comment.
Stale issues will be closed after an additional 30 days of inactivity.

Prevent issues from auto-closing with a /lifecycle frozen comment.

If this issue is safe to close now please do so.

Send feedback to Docker Community Slack channels #docker-for-mac or #docker-for-windows.
/lifecycle stale

@docker-robott
Collaborator

Closed issues are locked after 30 days of inactivity.
This helps our team focus on active issues.

If you have found a problem that seems similar to this, please open a new issue.

Send feedback to Docker Community Slack channels #docker-for-mac or #docker-for-windows.
/lifecycle locked

@docker docker locked and limited conversation to collaborators Oct 13, 2020