Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

regression in go_test generation breaks unix domain sockets created in the temp directory #2776

Closed
mikedanese opened this issue Jan 7, 2021 · 10 comments · Fixed by #2777
Closed
Labels

Comments

@mikedanese
Copy link
Contributor

mikedanese commented Jan 7, 2021

The culprit is 9c1568a
PR: #2696
Intended to fix: #1918

With that commit, os.TempDir() in test invocations moved (anecdotally) from /tmp/ to /usr/local/google/home/mikedanese/.cache/bazel/_bazel_mikedanese/6c65a147ac17188f348dcacc7c554ab3/execroot/io_k8s_kubernetes/_tmp/ce4edd4143b8293ae66371d731514b6b/device_plugin658723152/.

The max length of a unix path is 108 as defined by UNIX_PATH_MAX in <linux/un.h>.

I didn't look closely enough at the issue this change fixed to understand whether we should work around it in kubernetes or whether this should be reverted. This likely affects some number of projects that use grpc-go where UDS servers are fairly common.

What version of Bazel are you using?

Build label: 3.7.1
Build target: bazel-out/k8-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Tue Nov 24 17:38:30 2020 (1606239510)
Build timestamp: 1606239510
Build timestamp as int: 1606239510

Does this issue reproduce with the latest releases of all the above?

Yes.

What operating system and processor architecture are you using?

Linux.

What did you do?

Test code that repros:

package foo

import (
        "io/ioutil"
        "net"
        "path/filepath"
        "testing"
)

func TestFoo(t *testing.T) {
        socketDir, err := ioutil.TempDir("", "device_plugin")
        if err != nil {
                t.Fatal(err)
        }

        socketPath := filepath.Join(socketDir, "test.sock")
        t.Logf("socket path: %s", socketPath)

        l, err := net.Listen("unix", socketPath)
        if err != nil {
                t.Fatalf("listen error: %s", err)
        }
        defer l.Close()
}

In gist form if you want to clone and run yourself:

https://gist.github.com/mikedanese/81c14e6e202084f3370034a2f5649191

Run with:

bazel run //:go_default_test -- -test.v

What did you expect to see?

https://github.com/kubernetes/kubernetes tests continue to pass after upgrading to new rules_go version.

What did you see instead?

https://github.com/kubernetes/kubernetes tests broke after upgrading to new rules_go version.

cc @jayconrod @dragonsinth

mikedanese added a commit to mikedanese/rules_go that referenced this issue Jan 7, 2021
This is a breaking change.

Partially-Reverts: 9c1568a
Fixes: bazel-contrib#2776
@mikedanese
Copy link
Contributor Author

Here's a proposed fix that partially reverts the behavior change in the culprit: #2777

@dragonsinth
Copy link
Contributor

dragonsinth commented Jan 7, 2021

I don't necessarily object, I just found it weird and surprising that Go tests weren't respecting Bazel's idea of a temp directory; it seemed to break the idea of bazel containerization. The issue I ran into personally was that value of TMPDIR in the shell where bazel is being invoked has basically no relationship to the value of TMPDIR inside Bazel. In fact, I found that TMPDIR was always blank (instead of something like /var/folders/r6/44yk5d111rn3ddpqfmv1v4540000gp/T/ -- I'm on Mac), so the Go default lib would always use /tmp.

I assume long paths are not generally a problem, but just for unix domain sockets specifically? I might suggest that those tests ought to be written to use /tmp explicitly rather than relying on os.TempDir() to always return a short path. You can't really assume that os.TempDir() is always going to return a short path.

@mikedanese
Copy link
Contributor Author

My concern with fixing these in k8s is that we still have a behavior diff between go test and bazel test. While the CI system mostly uses bazel test, contributors (many not very familiar with bazel) mostly use go test. Behavior differences that causes tests to pass locally and fail during presubmit cause friction and are not obvious to debug even for folks that are familiar with bazel.

@dragonsinth
Copy link
Contributor

dragonsinth commented Jan 7, 2021

You're not wrong, but it's worth pointing out that you already have this problem without bazel being in the mix at all. Go's TempDir is only /tmp as a fallback, any user who tries to go test and happens to have a long TMPDIR env var is going to fail these tests for exactly the same reason.

I'm really only suggesting that you explicitly invoke ioutil.TempDir("/tmp", "device_plugin") in those specific unix domain socket tests.

@jayconrod
Copy link
Contributor

I don't think I agree this is a bug. Bazel sets TEST_TMPDIR with the intent that tests create temporary files there. It was a bug that we ignored that earlier.

Creating UNIX sockets seems like a narrow enough use case that I don't think it makes sense to revert the fix for that bug. For example, we don't do anything like this to avoid long path errors on Windows, which has a much lower path length limit.

Why not use an explicit path with io/ioutil.TempFile or TempDir? Or alternatively, set --test_tmpdir=/tmp in the CI configuration?

@mikedanese
Copy link
Contributor Author

mikedanese commented Jan 7, 2021

Do test runners in other language rules set TMPDIR to TEST_TMPDIR? If so, why is TEST_TMPDIR a separate env variable vs bazel managing TMPDIR in all test executions? The java implemenation of go_test does not have this behavior.

I think I would be happy with --test_tmpdir, but it looks like it's broken when used with --sandbox_tmpfs_path (see .bazelrc):

$ bazel test --nocache_test_results --sandbox_debug --test_output=errors --test_tmpdir=/tmp //:go_default_test
INFO: Analyzed target //:go_default_test (0 packages loaded, 0 targets configured).
INFO: Found 1 test target...
Target //:go_default_test up-to-date:
  bazel-bin/go_default_test_/go_default_test
INFO: Elapsed time: 0.159s, Critical Path: 0.05s
INFO: 2 processes: 1 internal, 1 linux-sandbox.
INFO: Build completed successfully, 2 total actions
//:go_default_test                                                       PASSED in 0.0s

Executed 1 out of 1 test: 1 test passes.
INFO: Build completed successfully, 2 total actions
$ bazel test --nocache_test_results --sandbox_debug --test_output=errors --test_tmpdir=/tmp --sandbox_tmpfs_path=/tmp //:go_default_test
INFO: Analyzed target //:go_default_test (0 packages loaded, 0 targets configured).
INFO: Found 1 test target...
FAIL: //:go_default_test (see /usr/local/google/home/mikedanese/.cache/bazel/_bazel_mikedanese/8fb9acd3ccec4c3b66f7a9bfaf0dcea8/execroot/__main__/bazel-out/k8-fastbuild/testlogs/go_default_test/test.log)
INFO: From Testing //:go_default_test:
==================== Test output for //:go_default_test:
1610048801.617565608: src/main/tools/linux-sandbox.cc:152: calling pipe(2)...
1610048801.617615820: src/main/tools/linux-sandbox.cc:171: calling clone(2)...
1610048801.617898290: src/main/tools/linux-sandbox.cc:180: linux-sandbox-pid1 has PID 464183
1610048801.617922642: src/main/tools/linux-sandbox-pid1.cc:434: Pid1Main started
1610048801.618017900: src/main/tools/linux-sandbox.cc:197: done manipulating pipes
1610048801.618106728: src/main/tools/linux-sandbox-pid1.cc:166: tmpfs: /tmp
1610048801.618154386: src/main/tools/linux-sandbox-pid1.cc:176: working dir: /usr/local/google/home/mikedanese/.cache/bazel/_bazel_mikedanese/8fb9acd3ccec4c3b66f7a9bfaf0dcea8/sandbox/linux-sandbox/10/execroot/__main__
1610048801.618191554: src/main/tools/linux-sandbox-pid1.cc:208: writable: /usr/local/google/home/mikedanese/.cache/bazel/_bazel_mikedanese/8fb9acd3ccec4c3b66f7a9bfaf0dcea8/sandbox/linux-sandbox/10/execroot/__main__
1610048801.618198774: src/main/tools/linux-sandbox-pid1.cc:208: writable: /tmp/_tmp/9c5fdcb54197827c83c3fa826829f6f2
src/main/tools/linux-sandbox-pid1.cc:212: "mount(/tmp/_tmp/9c5fdcb54197827c83c3fa826829f6f2, /tmp/_tmp/9c5fdcb54197827c83c3fa826829f6f2, nullptr, MS_BIND | MS_REC, nullptr)": No such file or directory
1610048801.643958495: src/main/tools/linux-sandbox.cc:233: child exited normally with code 1
================================================================================
Target //:go_default_test up-to-date:
  bazel-bin/go_default_test_/go_default_test
INFO: Elapsed time: 0.189s, Critical Path: 0.07s
INFO: 2 processes: 2 linux-sandbox.
INFO: Build completed, 1 test FAILED, 2 total actions
//:go_default_test                                                       FAILED in 0.0s
  /usr/local/google/home/mikedanese/.cache/bazel/_bazel_mikedanese/8fb9acd3ccec4c3b66f7a9bfaf0dcea8/execroot/__main__/bazel-out/k8-fastbuild/testlogs/go_default_test/test.log

INFO: Build completed, 1 test FAILED, 2 total actions

Error message looks vaguely similar to bazelbuild/bazel#7470.

@jayconrod
Copy link
Contributor

Sorry to be a bit slow on this. Kind of swamped right now on Go 1.16rc1 release blockers.

I'll go ahead and merge #2777.

The Bazel Test Encyclopedia says this (emphasis mine):

All file paths specified in test environment variables point to somewhere on the local filesystem, unless otherwise specified.

Tests should create files only within the directories specified by $TEST_TMPDIR and $TEST_UNDECLARED_OUTPUTS_DIR (if set).
These directories will be initially empty.
Tests must not attempt to remove, chmod, or otherwise alter these directories.
These directories may be a symbolic links.
The filesystem type of $TEST_TMPDIR/. remains unspecified.
Tests may also write .part files to the $TEST_UNDECLARED_OUTPUTS_ANNOTATIONS_DIR to annotate undeclared output files.

In rare cases, a test may be forced to create files in /tmp. For example, path length limits for Unix domain sockets typically require creating the socket under /tmp. Bazel will be unable to track such files; the test itself must take care to be hermetic, to use unique paths to avoid colliding with other, simultaneously running tests and non-test processes, and to clean up the files it creates in /tmp.

Some popular testing frameworks, such as JUnit4 TemporaryFolder or Go TempDir, have their own ways to create a temporary directory under /tmp. These testing frameworks include functionality that cleans up files in /tmp, so you may use them even though they create files outside of TEST_TMPDIR.

My interpretation is that the test is responsible for putting temporary files in TEST_TMPDIR or cleaning them up if they're put somewhere else. Since testing.TempDir does that automatically, it's not necessary for us to set TMPDIR, and as you said it creates an unnecessary difference with go test.

jayconrod pushed a commit that referenced this issue Jan 11, 2021
This is a breaking change.

Partially-Reverts: 9c1568a
Fixes: #2776
@mikedanese
Copy link
Contributor Author

Thanks @jayconrod and @dragonsinth. I'm not vehemently opposed to the previous behavior and definitely not an expert, so I appreciate the help with the interpretation.

@dragonsinth
Copy link
Contributor

dragonsinth commented Jan 13, 2021

I agree that this PR seems in line with what's defined in the Bazel build encyclopedia, but I also think it's unfortunate to create more space between go test and bazel test. I guess philosophically I don't really think it makes sense for most people to write go tests specifically with bazel in mind-- if I write go tests to rely on TEST_TMPDIR being set, those tests will only work under bazel.

I feel a language-specific implementation (like bazel Go)'s entire purpose to create as seamless a mapping as possible between natively developing in the language and doing so with within Bazel. The need to run binaries and tests from an IDE and use a debugger implies that as much as possible should "just work" in Bazel and non-Bazel at the same time. Maybe not everyone feels this way?

It is pretty ironic that this same section explicitly addresses @mikedanese 's exact use case-- unix domain sockets that need very short paths-- by using /tmp explicitly. :/

@linzhp
Copy link
Contributor

linzhp commented Mar 12, 2021

Sorry for being late to the party. I had this issue last year and opened bazelbuild/bazel#11089. The conclusion was it's logical to distinguish TEST_TMPDIR and TMPDIR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants