t_detects_not_running_in_testbed fails in some build environments #169
I can confirm that this error does not occur on x86_64 platforms.
No, to the contrary. It's a test which is known not to work under valgrind, so my CI calls the valgrind tests with … Indeed, Debian's and Ubuntu's riscv64 builds have been happy for a long time -- I figure they use QEMU emulation as well instead of actual hardware. But maybe they are just "fast enough" to make that test pass? There's most definitely a race condition somewhere; I just haven't tracked it down yet (as it's hard to reproduce locally). I'm happy to do something similar to commit 0cffcea; is there an Arch counterpart for …? Otherwise I could just skip the test on riscv64 in general, as pretty much every builder for it is still emulated these days.
Hi, the Arch Linux build system specifies … However, I think I'd prefer the second solution, i.e. skipping this test on RISC-V platforms, because I've actually attempted to build umockdev on real …
It is annoyingly difficult to determine the current CPU architecture in Vala (something like …). I tried to run this test case with really little CPU:
and even tried 0.1%, but it always passes. So I'll just mark it as a brittle test, so that it gets skipped in package builds.
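Vala compiles to C, and at the C level the running architecture can be asked of the kernel at run time. The sketch below is only an illustration, not umockdev's code: it calls POSIX `uname()` and checks `struct utsname.machine` for a `riscv` prefix. The helper names `machine_is_riscv` and `running_on_riscv` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/utsname.h>

/* Hypothetical helper: does a machine string such as "riscv64" name a RISC-V CPU? */
int machine_is_riscv(const char *machine)
{
    return strncmp(machine, "riscv", 5) == 0;
}

/* Hypothetical helper: ask the kernel which machine we are running on.
 * Returns 1 for RISC-V, 0 for anything else, -1 if uname() fails. */
int running_on_riscv(void)
{
    struct utsname u;
    if (uname(&u) != 0)
        return -1;
    return machine_is_riscv(u.machine);
}
```

A test could then return early (or report a skip) when `running_on_riscv()` returns 1. The compile-time alternative is the `__riscv` macro that GCC and Clang predefine for RISC-V targets.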
This also fails in some riscv64 builds (not in Debian's or Ubuntu's, though). This failure is hard to reproduce; it even passes with `sudo systemd-run -p CPUQuota=1% -dt --wait sh -ec 'LD_LIBRARY_PATH=. LD_PRELOAD=libumockdev-preload.so ./test-umockdev-vala -p /umockdev-testbed-vala/detects_running_outside_testbed'`. So mark it as brittle for now to skip it in normal package builds. Fixes #169
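Marking a test as brittle amounts to an opt-in gate: the test body only runs when the build explicitly asks for it. A minimal C sketch of such a gate, assuming brittle tests run only when the `BRITTLE_TESTS` environment variable is set; the function names and the skip message are hypothetical, not umockdev's actual implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

/* Assumption: brittle tests are opt-in and run only when BRITTLE_TESTS is
 * set in the environment, so normal package builds skip them. */
int brittle_tests_enabled(void)
{
    return getenv("BRITTLE_TESTS") != NULL;
}

/* Hypothetical guard for the start of a brittle test.
 * Returns 1 if the test body should run, 0 if it was skipped. */
int run_brittle_test(const char *name)
{
    if (!brittle_tests_enabled()) {
        printf("[SKIP: brittle test] %s\n", name);
        return 0;
    }
    return 1;
}
```

Package builds leave the variable unset and skip the test; a developer can opt in by running the suite with `BRITTLE_TESTS=1`.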
Thanks for your quick response!
@martinpitt By the way, here at PLCT Lab, we are able to provide SSH access to SiFive Unmatched boards for debugging and CI use. If you are willing, I can apply for a test account, which should be granted within 5 working days.
@XieJiSS: That would actually be nice, to get to the bottom of this! I don't need root privileges, but I do need umockdev's build dependencies installed (in Debian/Ubuntu/Fedora etc. one would usually use schroot/mock or a podman container). My public SSH key. Thanks!
Released as 0.17.6
@martinpitt Hi, I've sent you an email with the SSH configuration needed to access the RISC-V board.
Reopening -- while this no longer blocks package builds, I'd still like to investigate what actually makes the test fail. So far I built umockdev on @XieJiSS's riscv64 machine, and the test works just fine, both in the default parallel mode and in …
I ran the test 1000 times in a row:
I then ran the same loop 5 times in parallel as well. I stared at the code, and I really can't see a race condition there -- the two processes are properly synchronized by writing the byte through the pipe. It simply delivers the wrong result. I am reasonably sure that this is not a race condition, but a failure that happens under some particular build conditions which are not met by my plain meson build or the Debian/Ubuntu packages. I guess I'll have to litter this thing with printfs and throw it at the Fedora Koji builders, which seem to exhibit the same behaviour (at least the last time I checked).
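The synchronization described here is the usual fork-and-pipe handshake: the parent blocks in `read()` until the child writes one byte. The minimal C sketch below (hypothetical helper `sync_via_pipe`, not the test's Vala code) also shows why a misbehaving child produces a wrong result rather than a hang or a race: if the child exits without writing, and the parent has closed its own copy of the write end, `read()` returns 0 (end of file) immediately.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that optionally writes one byte into a
 * pipe, while the parent blocks in read() waiting for it.
 * Returns what the parent's read() returned: 1 (got the byte),
 * 0 (EOF: the child exited without writing) or -1 (error). */
ssize_t sync_via_pipe(int child_writes, char *out)
{
    int fds[2];
    if (pipe(fds) != 0) {
        perror("pipe");
        return -1;
    }
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                      /* child */
        close(fds[0]);
        if (child_writes && write(fds[1], "x", 1) != 1)
            _exit(1);
        _exit(0);                        /* exiting closes the write end */
    }
    /* Parent: drop its own write end, otherwise read() could never see EOF. */
    close(fds[1]);
    ssize_t n = read(fds[0], out, 1);
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return n;
}
```

No sleep or timing is involved: the parent either gets the byte or gets EOF, which is consistent with the observation that the result is deterministic rather than racy.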
The failure is gone from Koji's s390x build, presumably because that architecture gets built on real hardware now. But armhfp is still emulated, and it still fails there. However, the failure is literally
That smells like a non-blocking read, or the pipe getting closed prematurely? But the man page clearly says
I added lots of debugging logs now. Locally, when it works:
This confirms that the pipe fds have flags "0", i.e. neither CLOEXEC nor NONBLOCK. On the failing Koji build:
So the exec'ing of itself just fails. Again, this is not a race condition -- if I add a sleep, it still works fine:

```diff
--- tests/test-umockdev-vala.vala
+++ tests/test-umockdev-vala.vala
@@ -804,6 +804,7 @@ t_detects_not_running_in_testbed ()
 GLib.Environment.unset_variable("LD_PRELOAD");
 string[] argv = { "--test-outside-testbed", pipefds[1].to_string() };
 debug ("XXX t_detects_not_running_in_testbed: child pid %u, execing myself with --test-outside-testbed %i", Posix.getpid(), pipefds[1]);
+ Posix.sleep (1);
 Posix.execv("/proc/self/exe", argv);
 error ("execv /proc/self/exe failed: %m");
 }
```

So let's validate what /proc/self/exe actually is -- possibly it's QEMU in the emulated case? Locally:
And on Koji:
So this is okay -- it's not QEMU as I suspected.
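The checks in this investigation map onto a few POSIX calls. The C sketch below uses hypothetical helpers, not the debugging code actually added to the test. `read_one_byte()` tells the two earlier suspicions apart: `read()` returns 0 at end of file, but -1 with `EAGAIN`/`EWOULDBLOCK` when the descriptor is non-blocking and the pipe is empty. `fd_flags()` reads `FD_CLOEXEC` with `fcntl(F_GETFD)` and `O_NONBLOCK` with `fcntl(F_GETFL)`. `self_exe_path()` resolves the Linux-specific `/proc/self/exe` link with `readlink()`, which does not NUL-terminate its output.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

enum read_outcome { GOT_BYTE, GOT_EOF, WOULD_BLOCK, READ_ERROR };

/* Read one byte and report why it did or did not arrive. */
enum read_outcome read_one_byte(int fd, char *out)
{
    for (;;) {
        ssize_t n = read(fd, out, 1);
        if (n == 1)
            return GOT_BYTE;
        if (n == 0)
            return GOT_EOF;              /* every write end has been closed */
        if (errno == EINTR)
            continue;                    /* interrupted by a signal: retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return WOULD_BLOCK;          /* O_NONBLOCK is set and the pipe is empty */
        return READ_ERROR;
    }
}

/* Report FD_CLOEXEC (descriptor flag) and O_NONBLOCK (status flag) for fd.
 * Returns 0 on success, -1 if fcntl() fails. */
int fd_flags(int fd, int *cloexec, int *nonblock)
{
    int fdflags = fcntl(fd, F_GETFD);
    int flflags = fcntl(fd, F_GETFL);    /* also contains the access mode bits */
    if (fdflags < 0 || flflags < 0)
        return -1;
    *cloexec = (fdflags & FD_CLOEXEC) != 0;
    *nonblock = (flflags & O_NONBLOCK) != 0;
    return 0;
}

/* Resolve the Linux-specific /proc/self/exe link into buf.
 * Returns the path length, or -1 on error or possible truncation. */
ssize_t self_exe_path(char *buf, size_t size)
{
    if (size < 2)
        return -1;
    ssize_t n = readlink("/proc/self/exe", buf, size - 1);  /* no NUL added */
    if (n < 0 || (size_t)n == size - 1)
        return -1;
    buf[n] = '\0';
    return n;
}
```

Because `F_GETFL` also returns the access mode (the write end of a pipe is `O_WRONLY`), the sketch masks out only the `O_NONBLOCK` bit instead of comparing the raw value with 0.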
@XieJiSS: How and where exactly are you building umockdev to trigger the failure? So far there is overwhelming evidence that this only happens in QEMU-emulated builds, and I've been unable to reproduce it on your board with the standard …
I added a 30 times/seconds retry loop, and it … But now I finally found a bug, and fixing it seems to solve the problem 🎉 See PR #172.
Supply a proper argv[0] to our own test program, to fix the argument parsing. This regularly failed in architecture-emulated builds. Also catch failures of execv(). Drop the `BRITTLE_TESTS` skipping; this should now be robust enough. Fixes #169
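By convention `argv[0]` carries the program name, and argument parsers start looking for options at `argv[1]`. An argv that starts directly with `--test-outside-testbed` is therefore fragile, because the real first argument sits in the slot that parsers treat as the program name. A C sketch of both sides of the corrected re-exec, using hypothetical names (`parse_outside_testbed`, `reexec_outside_testbed`) rather than umockdev's Vala code: the child-side parser skips `argv[0]`, and the parent passes a real program name and reports an `execv()` failure instead of silently continuing.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical child-side parser: argv[0] is the program name and is skipped;
 * expects exactly "PROG --test-outside-testbed FD". Returns FD, or -1. */
int parse_outside_testbed(int argc, char *const argv[])
{
    if (argc == 3 && strcmp(argv[1], "--test-outside-testbed") == 0)
        return atoi(argv[2]);
    return -1;
}

/* Hypothetical parent-side re-exec: pass a real argv[0] before the option.
 * execv() only returns on failure, so report that instead of ignoring it. */
int reexec_outside_testbed(const char *argv0, int write_fd)
{
    char fdstr[16];
    snprintf(fdstr, sizeof fdstr, "%d", write_fd);
    char *const argv[] = { (char *)argv0, "--test-outside-testbed", fdstr, NULL };
    execv("/proc/self/exe", argv);
    fprintf(stderr, "execv /proc/self/exe failed: %s\n", strerror(errno));
    return -1;
}
```

With the old argv, `parse_outside_testbed(2, (char *[]){ "--test-outside-testbed", "5", NULL })` finds no option and returns -1; with a program name prepended as `argv[0]`, the same arguments parse to 5. In the forked child, a failed `reexec_outside_testbed()` call would typically be followed by `_exit(1)`, so the parent sees EOF on the pipe rather than a silent success.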
pkgver=0.17.5
I notice that the failing test case is named `umockdev:fails-valgrind / umockdev-vala`. Does this indicate that the test case requires valgrind to succeed? Currently, valgrind is not usable on the RISC-V architecture. Logs attached:
Related full log: