Faster find #287
Conversation
I've been wanting to explore a watchman-powered file finder. Before I can do that, I want all file-finding to be async. This is good, because we can parallelize lookups. Instead of adding callbacks all over the place, I decided to use Promises. I believe this will make the code a little simpler to understand, as it provides a nice interface for everything async.

An annoying issue I ran into while switching to Promises was that errors weren't being surfaced when they happened. It turns out I was bitten by the auto-swallowing of errors that Promises have. I came across this article explaining things: http://jamesknelson.com/are-es6-promises-swallowing-your-errors/ In that article, a third-party solution to globally fix the issue is suggested. I decided against bringing in another dependency, and instead to bite the bullet and go down the explicit (and somewhat verbose) path, using `reject` and `catch`.

To make tests easier to work with, I decided to make the asyncness synchronous by adding a dependency on deasync [1]. I found that in this SO thread: http://stackoverflow.com/questions/16586640/node-js-async-to-sync In the long run, we might want to avoid having that dependency and instead make test assertions async as well. But for now it allowed me to move forward without having to rewrite the (quite massive) test suite that we have.

[1]: https://github.com/abbr/deasync
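Concretely, the explicit style looks roughly like this (a sketch rather than the actual import-js code; the `find` invocation and the filtering here are only illustrative):

```js
import childProcess from 'child_process';

// Reject with the original error so the caller decides how to handle it.
function findMatchingFiles(lookupPath) {
  return new Promise((resolve, reject) => {
    childProcess.exec(`find ${lookupPath} -name '*.js'`, (error, stdout) => {
      if (error) {
        reject(error);
        return;
      }
      resolve(stdout.split('\n').filter(Boolean));
    });
  });
}

findMatchingFiles('./lib')
  .then((files) => console.log(`Found ${files.length} files`))
  // Without an explicit catch, this error would be silently swallowed.
  .catch((error) => console.error(error));
```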
I kept getting an error in Travis from eslint:

import-js/lib/findJsModulesFor.js
  0:0  error  Parsing error: Object.defineProperty called on non-object

Initially, I wasn't getting this error locally. But after wiping node_modules and reinstalling, I saw it there too. After some googling, I came across this recent babel-eslint issue: babel/babel-eslint#321 In that thread, people suggest pinning babylon at 6.8.0 (a regression was introduced in 6.8.1). Doing that seems to do the trick.
This is in preparation for adding an importjs command that will run in the background, and allow commands from stdin as well as directly on the command-line.
This is so that I can reuse it in the daemon script that I'm currently writing.
It's not useful yet, but I'm using it as a POC that I can write to STDIN, react to input, then write to STDOUT. The current version simply prints the same output you provide. Here's how to test it on macOS:

1. Create a pipe that we can use to communicate with: `mkfifo IMPORTJS`
2. Start the daemon using the pipe we created as STDIN: `importjsd < IMPORTJS`
3. Write to the pipe: `echo 'galooshi' > IMPORTJS`
4. Watch for 'galooshi' to get written to the console where the daemon is running.
This brings us closer to having importjsd work as a real daemon for importjs.
This is still not great, but I wanted to avoid having to rename commands on the clients. So instead we match the name of the command to its corresponding Importer function.
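In rough terms, the dispatch is something like this (a sketch with illustrative names, not the exact import-js code):

```js
// Look up the Importer method that has the same name as the incoming command,
// so the daemon never has to translate command names coming from clients.
function runCommand(importer, commandName, args) {
  const method = importer[commandName];
  if (typeof method !== 'function') {
    return Promise.reject(new Error(`Unknown command: ${commandName}`));
  }
  return method.apply(importer, args);
}
```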
I'm playing around with using a vim channel to communicate with the daemon. It's a lot simpler if we can assume all results (even errors) get printed to stdout.
For now, this will always be on and will write to /tmp/importjsd.log. I intend to keep things this way only during development. After that we will probably need a more sophisticated logging setup.
This will help track down failures better.
This will help when looking at the logfile at a later time.
This is not yet completely finalized, but I wanted to get something simple out there to start with. The new file finder is only used in the daemon so far, since there's a startup cost involved in using the watchman file finder. https://facebook.github.io/watchman/
I need this function to normalize paths coming from watchman subscription events. That change will come in a later commit.
- Remove copy-paste comments in this file (from watchman tutorial)
- Use `log` (new module extracted from daemon.js) to log stuff
- Normalize paths before putting them in the cache (this fixes an issue with `excludes` config being ignored for findFilesWithWatchman).
The daemon is starting to stabilize a little, and I thought I'd document a little how it works.
During development, I haven't tested import-js without watchman being installed. I asked @lencioni to give it a spin without first installing watchman, and he reported seeing this error:

Watchman: Watchman was not found in PATH. See https://facebook.github.io/watchman/docs/install.html for installation instructions
events.js:154
  throw er; // Unhandled 'error' event
  ^
Error: Watchman was not found in PATH. See https://facebook.github.io/watchman/docs/install.html for installation instructions
    at exports._errnoException (util.js:890:11)
    at Process.ChildProcess._handle.onexit (internal/child_process.js:182:32)
    at onErrorNT (internal/child_process.js:348:16)
    at _combinedTickCallback (node.js:377:13)
    at process._tickCallback (node.js:401:11)
    at Function.Module.runMain (module.js:449:11)
    at startup (node.js:141:18)
    at node.js:933:3

It turns out that we have to subscribe to the `error` event from the watchman Client to catch this error. With the changes in this commit, you are now able to run the daemon without having watchman installed (in which case we fall back to using a different file-finding strategy).
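The gist of the fix is to attach an `error` listener to the client (a sketch assuming fb-watchman's Client API; the fallback call is illustrative):

```js
import watchman from 'fb-watchman';

const client = new watchman.Client();

// Without this listener, a missing watchman binary crashes the node process
// with an unhandled 'error' event. With it, we can log the problem and switch
// to the non-watchman file-finding strategy instead.
client.on('error', (error) => {
  console.warn(`Watchman is unavailable (${error.message}), falling back`);
  // startFallbackFileFinder(); // illustrative, not the actual import-js call
});
```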
The watchman file finder is starting to stabilize, and I really need to add tests for it. I found a bug while testing it out locally. The `excludes` config option is meant to filter out files, but that wasn't working for the new finder because we didn't normalize the lookup path the same way the file paths were.
@lencioni was reporting seeing ENFILE errors when running importjsd locally. I think they come from the fact that we issue a `process.cwd()` command every time we normalize a path.

events.js:154
  throw er; // Unhandled 'error' event
  ^
Error: ENFILE: file table overflow, uv_cwd
    at Error (native)
    at normalizePath (/Users/joe_lencioni/import-js/build/normalizePath.js:18:79)
    at /Users/joe_lencioni/import-js/build/WatchmanFileCache.js:61:58
    at Array.forEach (native)
    at Client.<anonymous> (/Users/joe_lencioni/import-js/build/WatchmanFileCache.js:60:21)
    at emitOne (events.js:90:13)
    at Client.emit (events.js:182:7)
    at BunserBuf.<anonymous> (/Users/joe_lencioni/import-js/node_modules/fb-watchman/index.js:90:14)
    at emitOne (events.js:90:13)
    at BunserBuf.emit (events.js:182:7)

Even though cwd might change during the lifetime of the process (outside of our control), I can't see why memoizing it would be bad. It might in fact help prevent subtle bugs.
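The memoization amounts to something like this (a simplified sketch; the real normalizePath does more than what's shown here):

```js
import path from 'path';

// Memoized so that normalizing thousands of paths (e.g. from a burst of
// watchman events) doesn't call process.cwd() once per file.
let cachedCwd;
function workingDirectory() {
  if (!cachedCwd) {
    cachedCwd = process.cwd();
  }
  return cachedCwd;
}

// Simplified normalization, only to show where the memoized value is used.
function normalizePath(rawPath) {
  return `./${path.relative(workingDirectory(), path.resolve(rawPath))}`;
}
```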
I think this is causing issues for @lencioni. I plan on making logging a little more sophisticated. But before I do that, we can just temporarily disable it.
I was running into some weird issues with deasync when adding the watchman-powered file finder to the mix. It wasn't doing much good, so I decided to ditch it in favor of making async assertions using the `done` callback.
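The async assertions end up looking roughly like this (a Jasmine/Jest-style sketch; the arguments and expected paths are illustrative):

```js
it('resolves with the matching files', (done) => {
  findMatchingFiles('foo', '/project')
    .then((files) => {
      expect(files).toContain('/project/foo.js');
      done();
    })
    .catch(done.fail);
});
```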
This first version simply mocks the `getFiles` function of WatchmanFileCache. We're not really testing too much here, but at least we have a bit of test coverage for the conditional logic inside findMatchingFiles.
I want to get better control over logging. The previous hack (appending to file) was only meant to be a temporary solution. I considered Winston and Bunyan before deciding to go with the more lightweight loglevel [1]. Right now, we use `debug` level throughout. I'll make that configurable in a later commit. And I'll also pipe the output to a file (loglevel uses console internally). [1]: https://github.com/pimterry/loglevel
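Basic usage is about this simple (a sketch based on the loglevel README):

```js
import loglevel from 'loglevel';

// Everything uses the debug level for now; making this configurable comes later.
loglevel.setLevel('debug');

loglevel.debug('Daemon starting');
```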
We need to be careful with what we're logging to stdout, because clients will be depending on json content in a certain format ending up there. We log to a file called `importjs.log` in the os's default tmp directory. Since this folder isn't super easy to discover, I'll follow up by logging the path to the logfile on startup.
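One way to get loglevel's output into that file is to wrap its `methodFactory` (a sketch, not necessarily the exact import-js implementation):

```js
import fs from 'fs';
import os from 'os';
import path from 'path';
import loglevel from 'loglevel';

const logFilePath = path.join(os.tmpdir(), 'importjs.log');

// loglevel logs to the console by default. Wrapping methodFactory lets us
// append each message to the logfile and keep stdout clean for JSON output.
const originalFactory = loglevel.methodFactory;
loglevel.methodFactory = (methodName, level, loggerName) => {
  const rawMethod = originalFactory(methodName, level, loggerName);
  return (message) => {
    fs.appendFileSync(logFilePath, `${message}\n`);
    // rawMethod(message); // uncomment to also log to the console
  };
};
loglevel.setLevel(loglevel.getLevel()); // required to apply the new factory
```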
I'm starting to think that we need to support user-specific configuration. Having the loglevel specified in importjs.json is problematic, since you usually share this file with others. For now though, I think it's fine to keep it here.
This will help people who want to find out what's going on behind the scenes.
When importjs is run as a daemon, logging will be important so that possible issues can be detected. We want people to be able to control how much they see of this.
The loglevel-message-prefix [1] module takes care of that.

[1]: https://github.com/NatLibFi/loglevel-message-prefix
This bit of information should also be added to the README files of the plugin (once they get updated to use the daemon).
By having all editor plugins pass in their process PIDs, we can make sure that the importjsd process never turns into a zombie. There's a chance that the original parent has died and a new process has taken over the PID of the old parent. In that case, the importjsd process will become a zombie even with the code included in this change. That's a somewhat unlikely scenario, though, and I don't think we need to worry much about it. According to [1], PIDs rotate (up to 32768) on Linux systems, so a lot of processes would have to be started before the old PID is reused.

Also included in this commit: the first log message of the command is now a single line. This is important for editors so that they know when they can start expecting JSON output on the channel.

[1]: http://stackoverflow.com/questions/11323410/linux-pid-recycling
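Roughly, the zombie prevention works like this (a sketch; the interval and function name are illustrative, not the actual import-js code):

```js
const CHECK_INTERVAL_MS = 30000;

function exitWhenParentDies(parentPid) {
  setInterval(() => {
    try {
      // Signal 0 doesn't actually send a signal; it only checks that the
      // process with this PID still exists.
      process.kill(parentPid, 0);
    } catch (error) {
      // The editor that started us is gone, so there's no point in lingering.
      process.exit(0);
    }
  }, CHECK_INTERVAL_MS);
}
```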
function log(message) {
  fs.appendFile('/tmp/importjsd.log', `${message}\n`, (err) => {
    if (err) {
      throw new Error(err);
Isn't `err` an `Error`? Could this be `if (err) throw err;`?
Yeah probably. This goes away later anyway.
Now that we've made some performance improvements, I want to make sure that we don't regress too much. A benchmark test can help us better surface how import-js performs under some circumstances. To begin with, I'm just testing the case of importing a single word.
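The shape of the benchmark is roughly this (the helper function name and the time budget are illustrative, not the actual import-js code):

```js
it('imports a single word within the time budget', (done) => {
  const startTime = Date.now();
  importSingleWord('Daemon') // illustrative helper wrapping the import flow
    .then(() => {
      const elapsedMs = Date.now() - startTime;
      expect(elapsedMs).toBeLessThan(200); // illustrative budget
      done();
    })
    .catch(done.fail);
});
```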
  lookupPath: string,
  validFilesRegex: string
): Promise<Array<string>> {
  const tester = new RegExp(validFilesRegex, 'i');
`tester` is not a very good variable name.
`validFilesPattern`?
I think `validFilesRegex` should be `validFilesPattern`, and for this `RegExp` instance to be `validFileRegex`.
I use `fix` a lot more than importing a single word. I want the benchmark script to be more similar to what import-js does in the wild, so I'm switching to `fix`. This is good because it also tests the parallel file finding stuff that we do.
import WatchmanFileCache from './WatchmanFileCache';
import formattedToRegex from './formattedToRegex';
import normalizePath from './normalizePath';
Was this added here by accident?
Yeah, I think it snuck in early. I use it later.
👍
Instead of failing with something like "undefined is not a function", we can print a friendly error message if an unknown command is being used.
This allows us to get a little better control over what files are added to the cache. The only difference I noticed from before is that I no longer get flashes of

ADDING file foo.js~
REMOVING file foo.js~

in the log. Which is nice.
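The filtering happens in the subscription itself, roughly like this (a sketch using fb-watchman's subscribe command; the subscription name and expression are illustrative):

```js
// Restricting the subscription to real `.js` files keeps editor backup files
// like `foo.js~` from ever reaching the cache.
client.command(['subscribe', watchRoot, 'import-js-subscription', {
  expression: ['allof', ['type', 'f'], ['suffix', 'js']],
  fields: ['name', 'exists'],
}], (error) => {
  if (error) {
    console.error(`Failed to subscribe: ${error.message}`);
  }
});
```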
There was some confusion around these variables, so I made it clearer what is a compiled regexp and what is just a string pattern. Before making the rename, I went down the path of switching to passing in a compiled RegExp to all places. But I couldn't get the `egrep` call inside `findMatchingFilesWithFind` to work with the `toString()` output of RegExp. So I decided to keep the string throughout all finders (so that it's consistent).
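For illustration (not import-js code), this is the mismatch that made the compiled RegExp unusable with `egrep`:

```js
const validFilesPattern = 'foo(/package)?\\.js$'; // string pattern, fine for egrep
const validFileRegex = new RegExp(validFilesPattern, 'i'); // compiled, for in-process matching

// toString() adds surrounding slashes and flags, which egrep treats as
// literal characters rather than as part of the pattern.
console.log(validFileRegex.toString()); // something like "/foo(\/package)?\.js$/i"
console.log(validFileRegex.test('lib/foo.js')); // true
```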
@lencioni pointed out in code review that we could be more clear about when you need to know about the daemon. I added a note about it being mostly useful for developers of editor plugins. And I changed the wording for a few other sub sections as well.
This makes the daemon.js file a little cleaner. As suggested by @lencioni in code review.
Even though this is the default, having the prefixes listed will make the code less magic, and more straightforward. https://github.com/NatLibFi/loglevel-message-prefix#configuration
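Spelled out, the setup looks something like this (a sketch based on the loglevel-message-prefix README; the exact options may differ from what import-js uses):

```js
import loglevel from 'loglevel';
import loglevelMessagePrefix from 'loglevel-message-prefix';

// Listing the prefixes explicitly, even though they match the defaults, makes
// it obvious what each line in the logfile will start with.
loglevelMessagePrefix(loglevel, {
  prefixes: ['timestamp', 'level'],
});

loglevel.info('Daemon starting');
// => something like "[2016-07-01 12:34:56 UTC] INFO: Daemon starting"
```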
Since all processes of importjsd will log to the same file, keeping logs separated can be a little tricky. By adding the parentPid to the mix, you can follow logs for a specific session. E.g.: tail -f /var/folders/1l/_t6tm7195n00gn/T/importjs.log | grep PID:12345 As suggested by @lencioni: https://github.com/Galooshi/import-js/pull/287/files/ff2474b6553f4bbf396f824021a4052fb6672742..9b6a7ad3bb1bf6eab69e153d6f88355f403a2bcd#r68489982
This can come in handy when debugging for instance. As suggested by @lencioni: https://github.com/Galooshi/import-js/pull/287/files/ff2474b6553f4bbf396f824021a4052fb6672742..9b6a7ad3bb1bf6eab69e153d6f88355f403a2bcd#r68489976
I think I've addressed all of the comments that you had @lencioni. I'm going to go ahead and merge. My plan is to publish a new version (1.0.0) with the changes. And then I'll let that marinate for a bit before I go ahead and push new versions of plugins.
This PR does a few things to make the process of importing faster:

- `find`ing files is now handled asynchronously (and in parallel). This speeds up the `fix` command when you are importing more than one variable.
- There's a new `importjsd` daemon process that you can communicate with using stdin and stdout. This speeds up every command, since we no longer have to shell out and spin up a node environment for every command.

I've been preparing the vim and sublime plugins (vim:faster-find, sublime:faster-find) so that they're ready for the changes. I was thinking we can release this and let it marinate in the wild a little first, though, before we merge the branches on the editor plugins. The old command-line tool will continue to work the same way as before, so we can merge this separately if we want to.