Avoid AppVeyor stack overflow #2344

scottschurr · 2018-01-20T00:31:43Z

The json_value::test_nest_limits unit test was failing on Appveyor, presumably because the recursion exceeded the available stack. With this change (suggested by @miguelportilla) I've successfully passed unit tests on Appveyor twice.

Reviewers: @miguelportilla and @mellery451

mellery451 · 2018-01-20T00:45:33Z

src/test/json/json_value_test.cpp

+#ifdef _MSC_VER
+#pragma comment(linker, "/STACK:4194304")
+#endif
+


well, that's interesting, because the stack option is not mentioned in this list of settings that are available via this method:

https://msdn.microsoft.com/en-us/library/7f0aews7.aspx

Doesn't the stack size apply to the entire executable? If yes, then that would be rather pernicious to allow one compilation unit to change it. What would you do if different CUs set different values?

In any event, if the /STACK size does indeed apply to the executable as a whole, then I would strongly suggest it go here instead:

https://github.com/ripple/rippled/blob/eaff9a0e6aec0ad077f118501791c7684debcfd5/Builds/CMake/CMakeFuncs.cmake#L682

strictly speaking, it should also go in some similar place in the SConstruct I guess.

If I'm wrong and somehow stack size can apply to only one compilation unit, then I guess this pragma is appropriate.

scottschurr · 2018-01-20

Yeah, I suspect the setting applies to the entire executable. It also feels weird to me to set the stack size to support one specific unit test. I'm not even confident this is a real fix. I just know it was failing on AppVeyor before, and AppVeyor has passed unit tests twice with this change.

scottschurr · 2018-01-22

Having slept on it over the weekend, I'm getting less comfortable with this proposed solution. The reasons are twofold:

For coroutines we only require a 1 Meg (not 4 Meg) stack: https://github.com/ripple/rippled/blob/develop/src/ripple/core/Coro.ipp#L43. If we're going to require a minimum stack size we should at least be consistent.

The reason for a recursion limit is so the user sees a reasonable error message rather than crashing the application. Inflating the stack size only for the benefit of the test seems like the tail wagging the dog. The test (and the allowed recursion depth) should suit itself to the environment we expect to be running in.

So, at this point, setting a minimum stack size seems reasonable since coroutines require one. But we should be using the same minimum stack size everywhere (unless someone can provide a documentable reason for an exception). Since coroutines are running with a 1 Meg stack, it seems like that should be the minimum for all stacks.

I also agree with @mellery451 that the right place to be enforcing a minimum stack size is in the various makefiles. Then, if we have recursion depth crashes with a minimum stack size of 1 Meg we should either increase all minimum stack sizes or reduce the allowed recursion depth. Just my feeling at the moment.

Thoughts?
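For reference, the two sizes being compared above can be checked with a few lines of C++. This is only a sketch: `4194304` is the value from the `/STACK` pragma in this PR, while treating "1 Meg" as exactly 1024 * 1024 bytes is an assumption, and all constant names are made up for illustration.

```cpp
#include <cstddef>

// Hypothetical names, used only for this illustration.
constexpr std::size_t bytesPerMiB = 1024 * 1024;

// Value passed to the linker by the pragma in this PR.
constexpr std::size_t pragmaStackBytes = 4194304;

// Assumption: the "1 Meg" coroutine stack means exactly 1 MiB.
constexpr std::size_t coroStackBytes = 1 * bytesPerMiB;

// The pragma asks for 4 MiB, four times the assumed coroutine stack size.
static_assert(pragmaStackBytes == 4 * bytesPerMiB, "pragma value is 4 MiB");
static_assert(pragmaStackBytes / coroStackBytes == 4, "4x the coroutine stack");
```

If a single minimum were adopted everywhere, both constants would collapse into one shared value.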

miguelportilla · 2018-01-23

I don't believe this is a permanent solution; it was only meant for testing with AppVeyor. A better temporary solution is lowering the nest limit, as we discussed. I've written to the AppVeyor support team and hopefully they will provide some guidance on what causes the difference we are seeing between their platform and the Windows desktops.

miguelportilla · 2018-01-23

AppVeyor support replied and thinks the problem may be due to the Hyper-V VM we are using, which is configured for dynamic memory. They enabled GCE VMs, which use standard memory, and suggested we give that a whirl. @scottschurr You can try that by editing the YAML file and adding appveyor_build_worker_cloud: gce under the environment: section.

scottschurr · 2018-01-23

Thanks @miguelportilla. I'll give that a whirl and report back. At the same time, I'm also reducing the json_reader recursion depth to 25, since that approach was agreed upon earlier. I figure anything we can do to stabilize AppVeyor until we complete the transition to Jenkins will be good for now.
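To illustrate why a recursion limit lets the caller see an error instead of a crash, here is a minimal sketch of a depth-limited recursive parser. It is not the json_reader code: it only understands nested `[` and `]`, the names `maxNestDepth`, `parseArray`, and `checkNesting` are hypothetical, and 25 is simply the depth mentioned in this thread.

```cpp
#include <cstddef>
#include <string>

// Hypothetical limit for this sketch; 25 is the depth discussed in this PR.
constexpr std::size_t maxNestDepth = 25;

// Recursively consumes one array "[ ... ]" starting at text[pos].
// Returns false and fills `error` if the input is malformed or nests
// deeper than maxNestDepth, so the recursion (and stack use) stays bounded.
bool parseArray(std::string const& text, std::size_t& pos,
                std::size_t depth, std::string& error)
{
    if (depth > maxNestDepth)
    {
        error = "nesting exceeds limit of " + std::to_string(maxNestDepth);
        return false;
    }
    if (pos >= text.size() || text[pos] != '[')
    {
        error = "expected '['";
        return false;
    }
    ++pos;  // consume '['

    // Each nested array costs one more stack frame.
    while (pos < text.size() && text[pos] == '[')
    {
        if (!parseArray(text, pos, depth + 1, error))
            return false;
    }

    if (pos >= text.size() || text[pos] != ']')
    {
        error = "expected ']'";
        return false;
    }
    ++pos;  // consume ']'
    return true;
}

// Entry point: the outermost array counts as depth 1.
bool checkNesting(std::string const& text, std::string& error)
{
    std::size_t pos = 0;
    if (!parseArray(text, pos, 1, error))
        return false;
    if (pos != text.size())
    {
        error = "trailing characters";
        return false;
    }
    return true;
}
```

With this sketch, 25 levels of nesting are accepted and 26 levels are rejected with a message, regardless of how much stack the platform provides. The limit still has to be small enough that the deepest allowed recursion fits in the smallest stack the code is expected to run on.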

ripplelabs-jenkins · 2018-01-20T05:06:07Z

Jenkins Build Summary

Built from this commit

Built at 20180123 - 23:40:04

Test Results

Build Type	Result	Status
clang.debug.unity	985 cases, 0 failed, t: 391s	PASS ✅
coverage	985 cases, 0 failed, t: 623s	PASS ✅
clang.debug.nounity	983 cases, 0 failed, t: 343s	PASS ✅
gcc.debug.unity	985 cases, 0 failed, t: 443s	PASS ✅
gcc.debug.nounity	983 cases, 0 failed, t: 414s	PASS ✅
clang.release.unity	984 cases, 0 failed, t: 477s	PASS ✅
gcc.release.unity	984 cases, 0 failed, t: 499s	PASS ✅

codecov-io · 2018-01-20T17:06:41Z

Codecov Report

Merging #2344 into develop will decrease coverage by <.01%.
The diff coverage is n/a.

@@             Coverage Diff             @@
##           develop    #2344      +/-   ##
===========================================
- Coverage    70.04%   70.03%   -0.01%     
===========================================
  Files          704      704              
  Lines        53342    53342              
===========================================
- Hits         37361    37360       -1     
- Misses       15981    15982       +1

Impacted Files	Coverage Δ
src/ripple/json/json_reader.h	`100% <ø> (ø)`	⬆️
src/ripple/server/impl/BaseWSPeer.h	`70.51% <0%> (-0.65%)`	⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update eaff9a0...620229d.

scottschurr · 2018-01-23T23:18:30Z

I did a forced push which replaces the previous approach with two independent changes:

Reduce the allowed json_reader recursion depth to 25, and
Use a GCE VM with AppVeyor as suggested by AppVeyor support.

seelabs

👍

ximinez

Changes look good, plus tests pass consistently on my Windows box, which was also having problems. 👍

seelabs · 2018-01-30T02:54:07Z

In 0.90.0-b5

scottschurr assigned mellery451 and miguelportilla Jan 20, 2018

scottschurr requested review from mellery451 and miguelportilla January 20, 2018 00:31

mellery451 reviewed Jan 20, 2018

Avoid stack overflow on AppVeyor:

620229d

o Reduce json_reader max recursion, and
o Use a GCE VM for AppVeyor

scottschurr force-pushed the appveyor-fail-2 branch from 1344d99 to 620229d Compare January 23, 2018 23:11

scottschurr changed the title ~~Force Windows stack size for json_value::test_nest_limits~~ Avoid AppVeyor stack overflow Jan 23, 2018

seelabs approved these changes Jan 27, 2018

scottschurr assigned seelabs Jan 27, 2018

scottschurr requested a review from ximinez January 27, 2018 00:38

scottschurr assigned ximinez and unassigned mellery451 and miguelportilla Jan 27, 2018

scottschurr requested review from seelabs and removed request for miguelportilla January 27, 2018 00:39

ximinez approved these changes Jan 27, 2018

ximinez added the Passed label (Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required.) Jan 27, 2018

seelabs closed this Jan 30, 2018

scottschurr deleted the appveyor-fail-2 branch February 17, 2018 00:41
