Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid AppVeyor stack overflow #2344

Closed
wants to merge 1 commit into from

Conversation

scottschurr
Copy link
Collaborator

The json_value::test_nest_limits unit test was failing on Appveyor, presumably because the recursion exceeded the available stack. With this change (suggested by @miguelportilla) I've successfully passed unit tests on Appveyor twice.

Reviewers: @miguelportilla and @mellery451

#ifdef _MSC_VER
#pragma comment(linker, "/STACK:4194304")
#endif

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, that's interesting, because the stack option is not mentioned in this list of settings that are available via this method:

https://msdn.microsoft.com/en-us/library/7f0aews7.aspx

Doesn't the stack size apply to the entire executable? If yes, then that would be rather pernicious to allow one compilation unit to change it. What would you do if different CUs set different values?

In any event, if the /STACK size does indeed apply to the executable as a whole, then I would strongly suggest it go here instead:

https://github.com/ripple/rippled/blob/eaff9a0e6aec0ad077f118501791c7684debcfd5/Builds/CMake/CMakeFuncs.cmake#L682

strictly speaking, it should also go in some similar place in the SConstruct I guess.

If I'm wrong and somehow stack size can apply to only one compilation unit, then I guess this pragma is appropriate.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I suspect the setting applies to the entire executable. It also feels weird to me to set the stack size to support one specific unit test. I'm not even confident this is a real fix. I just know it was failing Appveyor before and has Appveyor has passed unit tests twice with this change.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having slept on it over the weekend, I'm getting less comfortable with this proposed solution. The reasons are two fold:

  1. For coroutines we only require a 1 Meg (not 4 Meg) stack: https://github.com/ripple/rippled/blob/develop/src/ripple/core/Coro.ipp#L43. If we're going to require a minimum stack size we should at least be consistent.
  2. The reason for a recursion limit is so the user sees a reasonable error message rather than crashing the application. Inflating the stack size only for the benefit of the test seems like the tail wagging the dog. The test (and the allowed recursion depth) should suit itself to the environment we expect to be running in.

So, at this point, setting a minimum stack size seems reasonable since coroutines require one. But we should be using the same minimum stack size everywhere (unless someone can provide a documentable reason for an exception). Since coroutines are running with a 1 Meg stack, it seems like that should be the minimum for all stacks.

I also agree with @mellery451 that the right place to be enforcing a minimum stack size is in the various makefiles. Then, if we have recursion depth crashes with a minimum stack size of 1 Meg we should either increase all minimum stack sizes or reduce the allowed recursion depth. Just my feeling at the moment.

Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe this is a permanent solution, it was only meant for testing with Appveyor. A better temporary solution is lowering the nest limit as we discussed. I've written to the Appveyor support team and hopefully, they will provide some guidance on determining the difference we are seeing between their platform and the Windows desktops.

Copy link
Contributor

@miguelportilla miguelportilla Jan 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AppVeyor support replied and think the problem may be due to the Hyper-V VM we are using, which is configured for dynamic memory. They enabled GCE VMs which use standard memory and suggested we give that a whirl. @scottschurr You can try that by editing the YAML file and adding appveyor_build_worker_cloud: gce under the environment: section

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @miguelportilla. I'll give that a whirl and report back. At the same time, I'm also reducing the json_reader recursion depth to 25, since that approach was agreed up on earlier. I figure anything we can do to stabilize Appveyor until we complete the transition to Jenkins will be good for now.

@ripplelabs-jenkins
Copy link
Collaborator

ripplelabs-jenkins commented Jan 20, 2018

Jenkins Build Summary

Built from this commit

Built at 20180123 - 23:40:04

Test Results

Build Type Result Status
clang.debug.unity 985 cases, 0 failed, t: 391s PASS ✅
coverage 985 cases, 0 failed, t: 623s PASS ✅
clang.debug.nounity 983 cases, 0 failed, t: 343s PASS ✅
gcc.debug.unity 985 cases, 0 failed, t: 443s PASS ✅
gcc.debug.nounity 983 cases, 0 failed, t: 414s PASS ✅
clang.release.unity 984 cases, 0 failed, t: 477s PASS ✅
gcc.release.unity 984 cases, 0 failed, t: 499s PASS ✅

@codecov-io
Copy link

codecov-io commented Jan 20, 2018

Codecov Report

Merging #2344 into develop will decrease coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #2344      +/-   ##
===========================================
- Coverage    70.04%   70.03%   -0.01%     
===========================================
  Files          704      704              
  Lines        53342    53342              
===========================================
- Hits         37361    37360       -1     
- Misses       15981    15982       +1
Impacted Files Coverage Δ
src/ripple/json/json_reader.h 100% <ø> (ø) ⬆️
src/ripple/server/impl/BaseWSPeer.h 70.51% <0%> (-0.65%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update eaff9a0...620229d. Read the comment docs.

o Reduce json_reader max recursion, and
o Use a GCE VM for AppVeyor
@scottschurr scottschurr changed the title Force Windows stack size for json_value::test_nest_limits Avoid AppVeyor stack overflow Jan 23, 2018
@scottschurr
Copy link
Collaborator Author

I did a forced push which replaces the previous approach with two independent changes:

  • Reduce the allowed son_reader recursion depth to 25, and
  • Use a GCE VM with AppVeyor as suggested by AppVeyor support.

Copy link
Collaborator

@seelabs seelabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@scottschurr scottschurr requested a review from ximinez January 27, 2018 00:38
@scottschurr scottschurr requested review from seelabs and removed request for miguelportilla January 27, 2018 00:39
Copy link
Collaborator

@ximinez ximinez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good, plus tests pass consistently on my Windows box, which was also having problems. 👍

@ximinez ximinez added the Passed Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required. label Jan 27, 2018
@seelabs
Copy link
Collaborator

seelabs commented Jan 30, 2018

In 0.90.0-b5

@seelabs seelabs closed this Jan 30, 2018
@scottschurr scottschurr deleted the appveyor-fail-2 branch February 17, 2018 00:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Passed Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants