Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deadlock by doing create/start/getContainer via varlink connection #3572

Closed
towe75 opened this issue Jul 13, 2019 · 16 comments · Fixed by #3813 or #11316
Closed

Deadlock by doing create/start/getContainer via varlink connection #3572

towe75 opened this issue Jul 13, 2019 · 16 comments · Fixed by #3813 or #11316
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

@towe75
Copy link
Contributor

towe75 commented Jul 13, 2019

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

I'm working on #3387 and discovered a deadlock problem when doing a series of CreateContainer/StartContainer/GetContainer multiple times via systemd varlink connection.

GetContainer will then block until i restart the io.podman systemd service. Also podman on the cli is blocked until i restart the service.

Steps to reproduce the issue:

  1. Compile this test program
package main

import (
	"fmt"

	"github.com/pascomnet/podman-varlink-test/iopodman"
	"github.com/varlink/go/varlink"
)

func main() {

	varlinkConnection, err := varlink.NewConnection("unix://run/podman/io.podman")
	if err != nil {
		fmt.Printf("failt to start task, could not connect to podman: %v\n", err)
		return
	}
	defer varlinkConnection.Close()

	args := []string{"redis"}

	createOpts := iopodman.Create{
		Args: args,
	}

	containerID, err := iopodman.CreateContainer().Call(varlinkConnection, createOpts)
	if err != nil {
		fmt.Printf("failt to start task, could not create container: %v\n", err)
		return
	}
	fmt.Printf("Created %s\n", containerID)

	_, err = iopodman.StartContainer().Call(varlinkConnection, containerID)
	if err != nil {
		fmt.Printf("failt to start task, could not start container: %v\n", err)
		return
	}
	fmt.Printf("Started %s\n", containerID)

	info, err := iopodman.GetContainer().Call(varlinkConnection, containerID)
	if err != nil {
		fmt.Printf("failt to get container: %v\n", err)
		return
	}
	fmt.Printf("ok. got info %v\n", info)
}
  1. Run it:
Created 9662752041be04564db2c6c53e5ba1092b5d3bab073dcf78de7927490c4fc813
Started 9662752041be04564db2c6c53e5ba1092b5d3bab073dcf78de7927490c4fc813
ok. got info {9662752041be04564db2c6c53e5ba1092b5d3bab073dcf78de7927490c4fc813 docker.io/library/redis:latest .......
  1. Run it repeatedly until it locks up
Created b11cf12254306e7463b534c81b0ba4144d3e1a15b761353670f95446c3e19c8d
Started b11cf12254306e7463b534c81b0ba4144d3e1a15b761353670f95446c3e19c8d

Additional information you deem important (e.g. issue happens only occasionally):

The time of repeations varies, sometimes it locks up on 2nd try, sometimes it takes a while.

Output of podman version:

Version:            1.4.2
RemoteAPI Version:  1
Go Version:         go1.12.5
OS/Arch:            linux/amd64

Output of podman info --debug:

debug:
  compiler: gc
  git commit: ""
  go version: go1.12.5
  podman version: 1.4.2
host:
  BuildahVersion: 1.9.0
  Conmon:
    package: podman-1.4.2-1.fc30.x86_64
    path: /usr/libexec/podman/conmon
    version: 'conmon version 0.2.0, commit: d7234dc01ae2ef08c42e3591e876723ad1c914c9'
  Distribution:
    distribution: fedora
    version: "30"
  MemFree: 3809669120
  MemTotal: 4143538176
  OCIRuntime:
    package: runc-1.0.0-93.dev.gitb9b6cc6.fc30.x86_64
    path: /usr/bin/runc
    version: |-
      runc version 1.0.0-rc8+dev
      commit: e3b4c1108f7d1bf0d09ab612ea09927d9b59b4e3
      spec: 1.0.1-dev
  SwapFree: 4790001664
  SwapTotal: 4949274624
  arch: amd64
  cpus: 2
  hostname: run-fedora
  kernel: 4.4.0-116-generic
  os: linux
  rootless: false
  uptime: 109h 43m 20s (Approximately 4.54 days)
registries:
  blocked: null
  insecure: null
  search:
  - docker.io
  - registry.fedoraproject.org
  - quay.io
  - registry.access.redhat.com
  - registry.centos.org
store:
  ConfigFile: /etc/containers/storage.conf
  ContainerStore:
    number: 5
  GraphDriverName: overlay
  GraphOptions:
  - overlay.mountopt=nodev
  GraphRoot: /var/lib/containers/storage
  GraphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  ImageStore:
    number: 2
  RunRoot: /var/run/containers/storage
  VolumePath: /var/lib/containers/storage/volumes

Additional environment details (AWS, VirtualBox, physical, etc.):

I am running this stuff in a nested Fedora 30 LXC container on a ubuntu 18.04 host (Linux 4.4.0-116-generic)

Also i tried other varlink calls instead of GetContainer():

  • InspectContainer() : it locks up very rarly
  • Ps() : it never locks up

Also i tried sleeping before GetContainer() but it did not change the behavior.

@openshift-ci-robot openshift-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Jul 13, 2019
@towe75
Copy link
Contributor Author

towe75 commented Jul 15, 2019

Some more info:
I did a fresh fedora 30 setup, installed latest podman, enabled varlink systemd socket and ran the test binary.
It is reproducible there. (podman 1.4.4, linux 5.0.9-301.fc30.x86_64 )

@mheon
Copy link
Member

mheon commented Jul 18, 2019

So this is somewhere between StartContainer and GetContainer. Both of those do need to grab the container lock, but both of them give it up quickly once their work is done.

I wonder if we're not seeing some sort of issue with Varlink trying to order requests here, leading to a sort of priority inversion?

@baude
Copy link
Member

baude commented Jul 30, 2019

@towe75 i am back from PTO and ready to look at this .. this is very similar to another reported issue I was debugging today but you are doing it in the context of a sample program ... i'd like to use it. it looks like when you did this, you had a special environment (imports). any chance you peek at this quickly and modify your example to be built from from github.com/containers/libpod/towe75 or some such?

@towe75
Copy link
Contributor Author

towe75 commented Jul 30, 2019

@baude thank you for checking. Find my example attached to this post: podman-issue3572.zip

It contains:

├── go.mod
├── go.sum
├── iopodman
│   └── iopodman.go
└── main.go

Just unpack it anywhere an run go build. It's module enable, so no need to place it into GOPATH.
I used go1.12.7 on fedora 30 to test it.

@baude
Copy link
Member

baude commented Jul 30, 2019

@towe75 do you hang out on #podman on freenode?

@baude
Copy link
Member

baude commented Jul 30, 2019

@towe75 can you confirm that if you run podman varlink outside of systemd, you see no deadlock?

@towe75
Copy link
Contributor Author

towe75 commented Jul 30, 2019

@baude sorry could not catch you in freenode, was quite busy.
Anyhow, i tried this:

systemctl disable io.podman.socket
/usr/bin/podman varlink --timeout 50000 unix:/run/podman/io.podman

in another shell i ran my test binary in a loop. I was able to produce 30 containers without a deadlock whereas with systemd it usually stops working after 2-3 iterations. I'll do a longer test later.

So looks like systemd socket has some effect.

@mheon
Copy link
Member

mheon commented Aug 5, 2019

@baude is working on this one still, I believe. We've tracked the deadlock into c/storage.

@haraldh
Copy link
Contributor

haraldh commented Aug 13, 2019

So, I finally managed to debug this and find the culprit.

So, here is what is happening:

  1. vendor/varlink/go/varlink/socketactivation.go
        /*
        parse LISTEN_FDS LISTEN_PID and LISTEN_FDNAMES
        and get FD 3 as our listen socket
        fd := 3
        */

https://github.com/containers/libpod/blob/031437bf5b19b5121224cdec301dd9945f38e610/vendor/github.com/varlink/go/varlink/socketactivation.go#L54-L57

net.FileListener does a dup(3) and gets a new fd. varlink listens on this new fd.
file gets garbage collected and at some point of garbage collection time close(3) is happening.

  1. At some point in the future in the varlink code accept() gets a new connection, which has fd=3 because this is the lowest number.

Now in the podman codebase there are several places also handling LISTEN_FDS:

pkg/rootless/rootless_linux.c:

https://github.com/containers/libpod/blob/031437bf5b19b5121224cdec301dd9945f38e610/pkg/rootless/rootless_linux.c#L529-L536

https://github.com/containers/libpod/blob/031437bf5b19b5121224cdec301dd9945f38e610/pkg/rootless/rootless_linux.c#L721-L735

libpod/oci_internal_linux.c:

https://github.com/containers/libpod/blob/031437bf5b19b5121224cdec301dd9945f38e610/libpod/oci_internal_linux.go#L250-L254

  1. In our case the last one causes a close(3)
6386  dup2(3, 19)                       = 19
6386  fcntl(19, F_SETFD, FD_CLOEXEC)    = 0
6386  fcntl(0, F_SETFD, 0)              = 0
6386  fcntl(1, F_SETFD, 0)              = 0
6386  fcntl(2, F_SETFD, 0)              = 0
6386  fcntl(3, F_SETFD, 0)              = 0
6386  dup2(18, 4)                       = 4
6386  dup2(19, 5)                       = 5
6386  execve("/usr/libexec/podman/conmon",  ...
  • 3 dupped to 19
  • CLOEXEC on 19
  • 3 is set to 0
  • 19 dupped to 5
  • execve closes 19 because CLOEXEC

Boom, the varlink connection fd=3 is closed.

This code block closed the varlink connection.

The whole LISTEN_FDS handling makes no sense anyway, because LISTEN_FDS begins at 3 and 3 now is 5 and nothing adjusted LISTEN_FDNAMES ???

Anyway commenting out this code block and this issue is gone:

https://github.com/containers/libpod/blob/031437bf5b19b5121224cdec301dd9945f38e610/libpod/oci_internal_linux.go#L250-L254

I don't know what the use case of this code block is and how it would work without adjusting LISTEN_FDNAMES.

Also one has to review the correct handling of the code in pkg/rootless/rootless_linux.c.

You really have to take care of the fact, that some code used one of the LISTEN_FDS and the fd has been closed and the number is reused. One marker could be:
if ((fcntl(f, F_GETFD) & FD_CLOEXEC) == 0)

So, if the CLOEXEC flag is set, don't close the fd on execve. But that is only guesswork also.

@baude @mheon @rhatdan

@mheon
Copy link
Member

mheon commented Aug 13, 2019

I'll tag in @giuseppe here - I'm pretty sure he wrote this for SD_NOTIFY

@giuseppe
Copy link
Member

I am not really familiar with this part, but some things I've observed:

  • if I do rm -fa from the CLI, the test program unblocks
  • if I open a new varlink connection before each command (i.e. reopen the connection before Start and before GetContainer), then the hang doesn't happen

@giuseppe
Copy link
Member

I don't know what the use case of this code block is and how it would work without adjusting LISTEN_FDNAMES.

I also don't understand that code block. Why do we need to propagate the LISTEN_FDs down to conmon? I think it should be safe to remove it. @mheon @baude do you know why it was added?

@mheon
Copy link
Member

mheon commented Aug 13, 2019

Hmmm.

My original assumption was that was your code - it's definitely not mine... I think it might well end up being @rhatdan

@haraldh
Copy link
Contributor

haraldh commented Aug 14, 2019

commit 989f5e3
Author: Daniel J Walsh [email protected]
Date: Sun Feb 4 07:29:47 2018 -0500

@haraldh
Copy link
Contributor

haraldh commented Aug 14, 2019

I don't think it makes sense to use LISTEN_FDS and NOTIFY_SOCKET, if podman is in varlink server mode. This should really be skipped here.

@rhatdan
Copy link
Member

rhatdan commented Aug 15, 2019

I agree these should only be handled for not 'remote' podman.

baude added a commit to baude/podman that referenced this issue Aug 15, 2019
add ability to not activate sd_notify when running under varlink as it
causes deadlocks and hangs.

Fixes: containers#3572

Signed-off-by: baude <[email protected]>
vrothberg added a commit to vrothberg/libpod that referenced this issue Aug 31, 2021
Make sure that Podman passes the LISTEN_* environment into containers.
Similar to runc, LISTEN_PID is set to 1.

Also remove conditionally passing the LISTEN_FDS as extra files.
The condition was wrong (inverted) and introduced to fix containers#3572 which
related to running under varlink which has been dropped entirely
with Podman 3.0.  Note that the NOTIFY_SOCKET and LISTEN_* variables
are cleared when running `system service`.

Fixes: containers#10443
Signed-off-by: Daniel J Walsh <[email protected]>
Signed-off-by: Valentin Rothberg <[email protected]>
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 23, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.
Projects
None yet
7 participants