Caching 2.0 #108
Conversation
@jamestalmage give this a shot:
I found at least part of the problem. See #109
var nyc = new NYC({cwd: fixtures})
nyc.clearCache()

var args = isWindows ? [bin] : [bin, process.execPath]
This makes me think we are making a mistake in mungeArgs when it comes to Windows.
@novemberborn - This test suite now makes it all the way through to source-maps. Some of the tests you merged into my branch are broken on Windows. It looks like it may just be a matter of different path separators being used.
@jamestalmage make sure you wrap all the appropriate objects with this helper: https://github.com/bcoe/any-path YOLO.
I am OK with that as a temporary solution. I think it would be better if the istanbul spec were changed to insist on posix paths.
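For context, here is a minimal sketch of what wrapping an object with any-path buys us, based on my reading of that module's README; the coverage shape below is made up for illustration:

var ap = require('any-path')

// wrap a plain object whose keys are file paths
var coverage = ap({
  '/fixtures/coverage.js': { lines: 42 }
})

// the same entry is now reachable with either path separator,
// so Windows-style lookups in the tests stop failing
coverage['\\fixtures\\coverage.js'] // => { lines: 42 }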
})

var coverage = ap(require('../fixtures/coverage'))
var getReport = function () {
  return _.cloneDeep(require('../fixtures/report'))
I think if you switch this line to:
return ap(_.cloneDeep(require('../fixtures/report')))

I wouldn't complain if istanbul only stored posix-style paths; it's a bit of an edge case though -- both \ and / work on Windows, it's just biting us in the lookups we're doing for tests.
Looks like another place I screwed up the rebase.
posix-style paths
Yeah, it might not even make sense to request that change. Since the spec demands absolute paths anyway, that would make Windows paths look really weird: C:/foo/bar
@jamestalmage I can figure out these annoying path issues in my Windows VM, if you want to work on the logic for toggling caching via a CLI flag.
I would be incredibly happy to punt that one to you!!
@jamestalmage if you cherry-pick this change: I think it will get your tests passing.
woot! green tests
OK, just pushed an
How do we want to enable/disable the cache?
@jamestalmage my preference would be a flag like this: https://github.com/bcoe/nyc/blob/master/bin/nyc.js#L72
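Something along these lines is presumably what that link points at: a yargs boolean flag. This is only a sketch; the option name, default, and description here are assumptions rather than the final implementation:

var yargs = require('yargs')
  .option('cache', {
    default: false,
    type: 'boolean',
    describe: 'cache instrumentation results to speed up repeated runs'
  })

var argv = yargs.argv
// argv.cache would then be threaded through to something like new NYC({enableCache: argv.cache})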
  return report
}

Object.keys(report).forEach(function (relFile) {
This logic is fairly complicated, I'd be tempted to pull it into a logically named "private" method (prefix with _).
An alternative thought: perhaps we try to pull the caching logic into /lib/cache.js, like we did for source-map-cache.js. Keep index.js as thin as possible.
👍
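As a rough illustration of the shape being suggested (not the actual implementation), a thin lib/cache.js might look something like the sketch below, with index.js only calling get/put; all names and details here are assumptions:

var crypto = require('crypto')
var fs = require('fs')
var path = require('path')

function Cache (cacheDir) {
  this.cacheDir = cacheDir
}

// key cache entries by a hash of the original source
Cache.prototype.key = function (source) {
  return crypto.createHash('sha1').update(source).digest('hex')
}

Cache.prototype.get = function (source) {
  try {
    return fs.readFileSync(path.join(this.cacheDir, this.key(source) + '.js'), 'utf8')
  } catch (err) {
    return null // cache miss
  }
}

Cache.prototype.put = function (source, instrumented) {
  fs.writeFileSync(path.join(this.cacheDir, this.key(source) + '.js'), instrumented)
}

module.exports = Cache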
@jamestalmage since caching is defaulting to false now, I think we suddenly stopped covering quite a bit of the caching logic -- hence coveralls yelling. It would probably be good to add a couple explicit tests here: https://github.com/bcoe/nyc/pull/108/files#diff-7332f11ea438d83b6d316f2064202ebbR442 creating the new NYC({enableCache: true}) instance.

Another nit: there's quite a bit of refactoring work on this branch that I think isn't directly related to the cache logic? https://github.com/bcoe/nyc/pull/108/files#diff-168726dbe96b3ce427e7fedce31bb0bcR107 This makes it hard for me to differentiate the logic related to the new cache vs. other optimizations -- making my code review less directed and making me more gun-shy about merging. In a perfect world, we'd land these unrelated changes in separate patches -- this also helps communicate, in the git history, the reasoning behind each change (they all seem logical from what I'm reading).

I should add this is ⭐ ⭐ amazing ⭐ ⭐ work.
Yeah, I let it get away from me. I could try to squash into a more logical series of commits, then cherry-pick into multiple PRs. It might be a pain though. The Windows changes broke a number of things on this branch, and my rebase was less than "smooth".
@jamestalmage perhaps, since there are so many commits from our Windows changes, etc., just amend the git history at some point on this branch and squash it into a few logical steps?
I'd say, let's shoot for this:
I opted to just run all the tests twice, both ways. There are probably a number of tests where it doesn't make sense to do the duplication (i.e.
@jamestalmage running twice seems reasonable to me. Good way to thoroughly test the cache logic 👍 Looks like we just need to figure out a good way to toggle the value -- could you simply generate two copies of each test during the build phase? Maybe I'm blanking on an easier solution for Windows, since I'm tired :)
working on that now.
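For what it's worth, the "run everything twice" idea can be sketched like this in a mocha test file; the wrapper function and wiring below are assumptions, not the actual tests:

var path = require('path')
var NYC = require('../')
var fixtures = path.resolve(__dirname, './fixtures')

// wrap the whole suite in a function and invoke it once per cache setting
function runTests (enableCache) {
  describe('nyc (enableCache: ' + enableCache + ')', function () {
    it('collects coverage', function () {
      var nyc = new NYC({cwd: fixtures, enableCache: enableCache})
      nyc.clearCache()
      // ... same assertions as before ...
    })
  })
}

[false, true].forEach(runTests)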
@@ -405,7 +421,7 @@ describe('nyc', function () {
   })
   var report = reports[0]['./test/fixtures/not-loaded.js']

-  reports.length.should.equal(1)
+  reports.length.should.equal(enableCache ? 2 : 1)
I had to add this because the tests don't clean up after themselves between runs. They really should.
After we merge this, we should just add rimraf-ing of all the output dirs to the top of the test file. That acts like a beforeEach and will get us a clean state for every test.
If that breaks a test because it needed a specific output from a previous test, that test should instead explicitly spawn the CLI. That makes our tests much more explicit.
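A minimal sketch of that cleanup, assuming rimraf and guessing at the output directories involved:

var path = require('path')
var rimraf = require('rimraf')

// at the very top of the test file, before any tests run --
// effectively a global beforeEach that wipes previous output
rimraf.sync(path.resolve(__dirname, '../.nyc_output'))
rimraf.sync(path.resolve(__dirname, './fixtures/.nyc_output'))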
@bcoe - OK, I think I've squashed it down as much as I can without things getting painful. It's still not the cleanest history, but reducing further would require a lot of work. I should still go back and add some more detailed commit messages (it is currently just the default squash result - a concatenation of all the squashed messages).

@novemberborn - I just squashed your contributions into a single commit.
@bcoe - I guess I am a little unclear on how I should use spawn-wrap to propagate the cache setting to child processes. Set a... bah - nevermind - just found it.
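(For anyone following along: one common way to hand a setting like this to spawn-wrapped children is an environment variable. The sketch below is my guess at the mechanism, and the NYC_CACHE variable name is an assumption, not necessarily what the branch actually does.)

// in the parent process (e.g. bin/nyc.js), after parsing flags:
var argv = require('yargs').boolean('cache').argv
process.env.NYC_CACHE = argv.cache ? 'enable' : 'disable'

// in the wrapped child process, when constructing NYC:
var enableCache = process.env.NYC_CACHE === 'enable'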
OK, I also made the recommended change to
It exits with a
Assuming CI passes, I think we are good to go.
Thanks.
It writes the
Gotcha. I figured out that it wrote
Chalk this one up to "aren't file systems fun?"

If you have a dozen processes but it's always reading/writing the same file, then the OS's heuristics will see that the same file is being read and written repeatedly, and keep it in the file system's cache. (I.e., it won't even get written to disk.) This means that the operations take a lot less time, and race conditions magically disappear! Lovely! Except, when you have dozens or hundreds of files, especially if they're pretty big, and even a small number of processes, what happens is that the caching mechanisms can't hold all the data, and the race conditions become relevant again. So, "works on my machine", "tests all pass", but it repeatedly breaks in production. File systems are liars. Well-intended liars that usually make programs faster and more reliable, but... still liars.

Not to mention, often stuff will work fine on a Linux machine, but not on Windows. Or work on a Linux VM on a Windows machine, but not on an actual Linux machine. Or it'll work on OS X, but not if you use a specific FS option when you set up your machine, or with SSDs but not with spinning-platter drives. And here I thought server-side JS would mean getting away from all that browser incompatibility BS. ;P
Great work on this @jamestalmage; I'll do a final pass of tests on the usual suspects, and some testing on Windows -- I'm pretty excited about this branch.
This will automatically invalidate the cache anytime `nyc` or `istanbul` change versions.
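A hedged sketch of how that kind of invalidation can work: fold the nyc and istanbul versions into the hash that keys the cache, so any version bump produces new cache entries. The details below are illustrative, not the actual diff:

var crypto = require('crypto')
var nycVersion = require('./package.json').version
var istanbulVersion = require('istanbul/package.json').version

function cacheKey (source) {
  // a version bump in either package changes every key, invalidating old entries
  return crypto.createHash('sha1')
    .update(nycVersion)
    .update(istanbulVersion)
    .update(source)
    .digest('hex')
}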
@jamestalmage did another smoke test with the current branch. I tested the following libraries with
I get the following error:
Everything else worked like a charm.
Even if the `--cache` flag is not set. If the report has a `contentHash` property set, then we know caching was enabled when they generated it. Users shouldn't have to specify `--cache` when running the report command anyways, especially since it's possible for us to auto-detect this.
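The auto-detection described there could look roughly like this (a sketch, assuming the report object is keyed by file path as in the tests above):

// if any entry in the report carries a contentHash, caching must have been
// enabled when the report was generated, regardless of the --cache flag
function reportUsedCache (report) {
  return Object.keys(report).some(function (file) {
    return Boolean(report[file].contentHash)
  })
}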
@jamestalmage unfortunately, I'm still getting this behavior:
I've changed the testing line to the following:
nyc --reporter=lcov --cache --require babel-core/register mocha
So it doesn't appear to be an issue with running
Again, everything works if I don't pass --cache; seems like it must be something small.
Try deleting the cache. This is yargs, right? It all worked for me.
@jamestalmage I can't seem to get it to run. I'm testing using yarsay (which is my one library with a babel require hook, so I find it's good for smoke tests). It would probably be smart for us to create a script that tests against several libraries, rather than my painful manual step.
I used yarsay to debug the problem, and made the same script modifications you did. I'm away from my computer most of the day - I can verify later.
Previously, we were only recording the hash in `hashCache` if there was a cache miss. This meant that the `contentHash` property was only properly attached for files not in the cache, and that source maps would not be applied correctly to reports generated from cached instrumentations. The solution is to always store the hash, but selectively attach `contentHash` to the reports based on `options.enableCache`.
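Paraphrasing that fix as code (names follow the commit message; the surrounding structure is an assumption):

var crypto = require('crypto')

function recordHash (source, relFile, report, hashCache, options) {
  var hash = crypto.createHash('md5').update(source).digest('hex')
  hashCache[relFile] = hash             // always store the hash, cache hit or miss
  if (options.enableCache) {
    report[relFile].contentHash = hash  // only attach it to reports when caching is on
  }
}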
@jamestalmage awesome, I will re-test today.
@jamestalmage I've confirmed that everything is working with my set of smoke-testing libraries 👍 rebase and merge at will?
I am on mobile ATM, but it appears mergeable now.
@novemberborn @jamestalmage give the
npm i nyc@next
@novemberborn @jamestalmage @sindresorhus
Awesome job guys! Just some numbers from a very small project to confirm the improvements (I took the most average result of 4 runs):
Test only:
nyc 4.0.0:
nyc 5.3.0:
So thank you :) This is great!
See #101 for the historical conversation.
First, An Apology:
This is PR #101 rebased onto master. The rebase was not entirely smooth, and getting to a perfect git history has proven fairly difficult. My apologies to reviewers, but I think this is about as good as it is going to get.
I have squashed the initial ~27 commits down to 14. The commit messages are now very detailed, so I encourage you to step through them one by one.
Most of the commits are logical units of work, but there are two unfortunate "fixup" commits that I would have preferred to squash into earlier commits; reordering the git history to accommodate that proved too difficult to be worth it.
A number of the commit messages contain a "Future Changes" section, where I explain what is eventually changed by the "fixup" commits. Not a perfect solution, but hopefully helpful to you as you review.
The meat of the work can be found in the 5 commits linked in the benchmarking section below. Much of that work is refactored in later commits, based on feedback received here and in PR #101.
Overview:
This PR greatly improves the speed of `nyc` via a number of strategies, the primary one being caching:
- Caching `istanbul` transforms.
- Dropping `lodash`, which was expensive to `require`.
- Not `require`ing istanbul until it is actually needed (i.e. there is a cache miss); `istanbul` is also expensive to require (see the sketch below).
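To illustrate the lazy-require strategy in the last bullet, here is a sketch; the cache object stands in for whatever cache layer is in use and is an assumption, while the istanbul Instrumenter API is the real one:

var _istanbul = null
function istanbul () {
  if (!_istanbul) _istanbul = require('istanbul') // pay the require cost only once, and only if needed
  return _istanbul
}

function instrument (source, filename, cache) {
  var hit = cache.get(source)
  if (hit) return hit // cache hit: istanbul is never loaded
  var instrumenter = new (istanbul().Instrumenter)()
  var output = instrumenter.instrumentSync(source, filename)
  cache.put(source, output)
  return output
}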
Benchmarks

I decided to use the AVA test suite as my primary benchmark. The AVA test suite forks 50-some processes, nearly all of which collect coverage information. This makes it an ideal candidate for benchmarking `nyc`'s performance. I have tested on much simpler projects, and the improvements are noticeable even in projects that only fork one additional process (though I suspect that has more to do with simplifying our dependency graph than caching).

I use the term "coverage penalty" to mean how much additional time the test suite takes to execute when `nyc` is in use. This provides a better indication of the significance of each optimization.