Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DSpot stops execution if number of iterations >= 4. Possible thread deadlock #711

Closed
jesus-gorronogoitia opened this issue Feb 15, 2019 · 14 comments
Labels

Comments

@jesus-gorronogoitia
Copy link
Collaborator

Characteristics

  • Issue Type: bug
  • Reproducibility: always
  • Severity: major
  • Tool/Service/Component: DSpot, v2.0.1
  • Execution Environment: Linux
  • Reporter: Jesús Gorroñogoitia [email protected]

Description

Profiling DSpot execution with Netbeans profiler, I've noticed that DSpot CPU usage drops to 0 after few seconds when the number of iterations is set to a value of 4 or above (it works well with values 1..3).
So, I runned DSpot standalone with same configuration, getting it stuck in the same log point:
dhell.HelloAppTest: @BeforeClass - runOnceAtTheBeginning dhell.HelloAppTest: @AfterClass - runOnceAtTheEnd File saved to the following path: /home/yosu/Projects/STAMP/Git/dhell/target/dspot/observations.ser File saved to the following path: /home/yosu/Projects/STAMP/Git/dhell/target/dspot/TestListener.ser
CPU statistics show that DSpot CPU consumption goes to zero.
Inspecting DSpot source code I noticed it is using Java8 Parallel Stream in certain parts of code. This uses the default ForkJoinThread, which when requires a number of threads higher than the number of physical CPU execution thread, could eventually stuck into a deadlock. So, in my laptop (4 CPU execution threads), DSpot gets stuck when the number of iterations are 4 or above.
However, I don't remember that DSpot is using parallel stream to iterate over the requested iteration.
I will further investigate and report.

Steps to reproduce

Execute DSpot on DHell project using below CLI command.

Properties file

Default Dhell properties file.

Command Line / Options

Other files and URLs

Relationships

Help on issue template

Preview to follow the link or open file .github/ISSUE_DOC.md

@jesus-gorronogoitia
Copy link
Collaborator Author

This problems seems happens only with default PIT selector. It doesn't happen with JacocoCoverageSelector.

@jesus-gorronogoitia
Copy link
Collaborator Author

It is not related to the JoinForkPool thread. Indeed, it is the thread pool-15-thread-1 the one is working while main thread is parked. I want to identify in code where this thread is invoked.

@danglotb
Copy link
Member

Hi @jesus-gorronogoitia

Thank you very much for investigating this.

Please, keep us update of your progress and do not hesitate if you have questions or need help :)

@jesus-gorronogoitia
Copy link
Collaborator Author

According with Netbeans profiling there is a deadlock between main and ForkJoinPool thread workers, possibly because a parallel stream. I will try to identify the causing code.

@jesus-gorronogoitia
Copy link
Collaborator Author

The problem lays in the TestRunner. I've got its code. There is a thread deadlock. Investigating

@jesus-gorronogoitia
Copy link
Collaborator Author

jesus-gorronogoitia commented Feb 26, 2019

Not clear if there is a deadlock, it just takes an enormous time to complete the tests when number of iterations is 4, and CPU is mostly idle. It tooks 42824149 ms to complete execution. I will further investigate. I will check that wait timeouts were not triggered.

@danglotb
Copy link
Member

Thank you @jesus-gorronogoitia !

@jesus-gorronogoitia
Copy link
Collaborator Author

Hi @danglotb This is a quite tricky issue. After spending long time debugging and profiling DSpot behavior over DHell, I have (I think) located the issue in Dhell, but I have no idea why this is happening. I am reporting you so you can have an idea what is going on.

  1. EntryPoint.runGivenCommandLine() forks (in another JVM) the execution of the test cases using the eu.stamp_project.testrunner.runner.JUnit4Runner (in this case), using the Runtime.getRuntime().exec method.
    BTW, I've read this way to fork a JVM is old-style (and discourage) so I will investigate modern ways to do the same. But this is not causing the issue.
  2. Next, using an executor taken from Executors.newSingleThreadExecutor() the RunnableProcess is forked to essentially capture JVM logs to be shown to the user. This should only be done if EntryPoint.verbose, but this check it done within the forked thread. I would suggest not to fork this new thread if verbose is not enabled.
  3. Then, the future returned by executor submit is used to wait to the forked JVM process to finish.

RunnableProcess waits for the process to be completed, and the main thread as well in submit.get()

What is the problem: the forked JVM process never finishes, so the submit.get() is not returned, so we are blocked until the timeout (which is really high: 6001000ms) triggers.

Why does the process never finish?
I've profiled this process (modifying DSpot code to inject the required JVM options). It happens that this process ends in few seconds when launched manually from CLI, but it gets blocked when launched from DSpot.
Profiler data shows that the process never finished because many executed tests (in parallel) get blocked trying to get access dhell.log in the line java.util.logging.StreamHandler.publish, which is invoked in
eu.stamp_project.examples.dhell.HelloApp.<init> line: 27, which is this line

 MyLogger.Instance.info("MyPrintCount = " + Integer.toString(MyPrintCount)
            + " - MyTracesName = " + MyTracesName);

Why does this happens only when executed from DSpot? I have not idea.
It looks all these tests executed in parallel are trying to get the lock to print the log, but it was acquired before by another thread, which is not releasing it. Using Java Mission Control I didn't detect any thread deadlock.

So the point is that this behavior seems to be caused by Dhell. However, I also observe the behavior that DSpot is being notified when all test cases have been executed (in EntryPoint) takes a lot of time, but it is not immediately notified when the tests execution is completed, as it should be.

I will keep investigating.

@jesus-gorronogoitia
Copy link
Collaborator Author

These thread are locked because another thread is executing the same HelloApp() code, it is in RUNNABLE state, but is not being executed, so it does not release the lock. I will investigate is not being executed despite it is in RUNNABLE state.

@jesus-gorronogoitia
Copy link
Collaborator Author

Hi @danglotb I solved it!!!!
Using a ProcessBuilder instead of Runtime.getRuntime().exec().
Code is much more clear. I will polished all my changes, do experiments to be sure results are the same, and I will create a PR.
Do you want me to keep your previous code that is not any longer needed or do you prefer to provide a completely clean code, removing all unneeded methods?.

@danglotb
Copy link
Member

Hello @jesus-gorronogoitia

That's great! Thank you!

Code is much more clear. I will polished all my changes, do experiments to be sure results are the same, and I will create a PR.Code is much more clear. I will polished all my changes, do experiments to be sure results are the same, and I will create a PR.

Cannot wait to see this pull request.

Do you want me to keep your previous code that is not any longer needed or do you prefer to provide a completely clean code, removing all unneeded methods?.

Remove everything that is not used anymore. In the worst case, we have the versioning in GitHub.

Will you create a pull request on STAMP-project/testrunner?

@jesus-gorronogoitia
Copy link
Collaborator Author

Hi @danglotb Proposed changes are in TestRunner component, not in DSpot, but to adopt them in DSpot I needed to update DSpot pom.xml to use the TestRunner 2.0.7-SNAPSHOT version. Hence, what should I proposed in my PR? Only changes in TestRunner and you promote them to Dspot updating the pom.xml?

@danglotb
Copy link
Member

You can open both pull requests.

However, maybe we can release testrunner-2.0.7 with the changes, and then merge the pull request that upgrades the test runner's version used in DSpot.

WDYT?

@jesus-gorronogoitia
Copy link
Collaborator Author

Hi @danglotb
I've just created a PR in TestRunner repository. I agree to promote TestRunner to version 2.0.7 once you accept the PR. Please, wait for me to conduct more experiments (tomorrow morning) in order to ensure the functional equivalence of these changes, before accepting the PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants