Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test_in_toto_run_with_byproduct test fails sporadically #282

Closed
lukpueh opened this issue Apr 30, 2019 · 4 comments · Fixed by secure-systems-lab/securesystemslib#189 or #325
Closed

Comments

@lukpueh
Copy link
Member

lukpueh commented Apr 30, 2019

Description of issue or feature request:
I've see recently seen these appear sporadically on different Python versions.

======================================================================
FAIL: test_in_toto_run_with_byproduct (tests.test_runlib.TestInTotoRun)
Successfully run, verify recorded byproduct.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/lukpueh/in-toto/tests/test_runlib.py", line 495, in test_in_toto_run_with_byproduct
    "Python" in link.signed.byproducts.get("stderr"))
AssertionError: False is not true
Stderr:
in_toto.runlib:501:INFO:Running 'test_step'...
in_toto.runlib:524:INFO:Running command 'python --version'...
in_toto.runlib:538:INFO:Creating link metadata...

Current behavior:

  • Meaningless test output (i.e. AssertionError: False is not true)
  • Test fails sporadically

Expected behavior:

  • Enhance test output (i.e. add message to self.assertTrue(<expr>, <message>))
  • Fix test
@lukpueh
Copy link
Member Author

lukpueh commented Sep 30, 2019

Unsurprisingly this bug is due to a race condition in the while proc.poll() ...- loop in run_duplicate_streams:

in-toto/in_toto/process.py

Lines 194 to 212 in a481e47

while proc.poll() is None or stdout_part or stderr_part:
# Raise timeout error in they same manner as `subprocess` would do it
if (timeout is not None and
time.time() > proc_start_time + timeout):
proc.kill()
proc.wait()
raise subprocess.TimeoutExpired(cmd, timeout)
# Read from child process's redirected streams, write to parent
# process's standard streams and construct retuirn values
stdout_part = stdout_reader.read()
stderr_part = stderr_reader.read()
sys.stdout.write(stdout_part)
sys.stderr.write(stderr_part)
sys.stdout.flush()
sys.stderr.flush()
stdout_str += stdout_part
stderr_str += stderr_part

The subprocess might write something to its stdout/stderr files after the last read from the parent, but exit before the next read.

The original author of that feature had an additional post-loop read probably for that reason, which didn't survive the port, when the PR was taken over. (IIRC the ... or stdout_part or stderr_part in the loop condition were meant to serve a similar purpose. But looking at them now, they don't make much sense...)

So, assuming that the last write to a stdout/stderr file happens before the process exits, than a post-loop read (or maybe a do-until loop) should fix this issue. Otherwise, if the process can exit before the data becomes available, solving will be more complex.

Apart from above, we should probably also add back the comments from the original author regarding buffered outputs.
The function does not duplicate in realtime if the child process buffers its outputs, as e.g. python's print does unless you tell it not to. Note the difference between...

from in_toto import process as pr

pr.run_duplicate_streams("python -c 'from time import sleep; print(\"now\"); sleep(3); print(\"3 secs later\"); sleep(2)'", timeout=10)
pr.run_duplicate_streams("python -c 'from time import sleep; print(\"now\", flush=True); sleep(3); print(\"3 secs later\", flush=True); sleep(2)'", timeout=10)

@lukpueh lukpueh closed this as completed Sep 30, 2019
@lukpueh lukpueh reopened this Sep 30, 2019
@lukpueh
Copy link
Member Author

lukpueh commented Sep 30, 2019

Also, in_toto.process was moved to securesystemslib (see secure-systems-lab/securesystemslib#174). We should fix this issue there and start using securesystemslib.process here.

@lukpueh
Copy link
Member Author

lukpueh commented Sep 30, 2019

Confirming that one extra loop to read after the process exits does the trick. :)
I'll submit a patch to securesystemslib and will probably also make a downstream patch for the current Debian release (see #309).

@lukpueh
Copy link
Member Author

lukpueh commented Oct 1, 2019

This issue will be fixed in secure-systems-lab/securesystemslib#186. Instead of duplicating the fix, we should update in-toto to use securesystemslib.process (after 186 has been merged).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
1 participant