Faster find #287

Merged
merged 41 commits into master from faster-find on Jun 30, 2016

Conversation

@trotzig (Collaborator) commented Jun 25, 2016

This PR does a few things to make the process of importing faster:

  • Finding files is now handled asynchronously (and in parallel). This speeds up the fix command when you are importing more than one variable.
  • Introduce an importjsd daemon process that you can communicate with over stdin and stdout. This speeds up every command, since we no longer have to shell out and spin up a Node environment each time.
  • Integrate with Watchman. This is only used in combination with the daemon (because there's a startup cost), and it speeds up finding files (by a lot!).

I've been preparing the vim and sublime plugins (vim:faster-find, sublime:faster-find) so that they're ready for the changes. I was thinking we could release this and let it marinate in the wild a little before we merge the branches on the editor plugins. The old command-line tool will continue to work the same way as before, so we can merge this separately if we want to.

trotzig added 30 commits June 13, 2016 22:58
I've been wanting to explore a watchman-powered file finder. Before I
can do that, I want all file-finding to be async. This is good, because
we can parallelize lookups.

Instead of adding callbacks all over the place, I decided to use
Promises. I believe this will make the code a little simpler to
understand as it provides a nice interface for everything async.

An annoying issue I ran into while switching to Promises was that I
wasn't getting errors surfaced when they happened. It turns out I was
bit by the auto-swallowing of errors that Promises have. I came across
this article explaining things:
http://jamesknelson.com/are-es6-promises-swallowing-your-errors/

In that article, a third-party solution to globally fix the issue is
suggested. I decided against bringing in another dependency; instead I
bit the bullet and went down the explicit (and somewhat verbose) path,
using `reject` and `catch`.

To make tests easier to work with, I decided to make the asyncness
synchronous by adding a dependency on deasync [1]. I found it in this
SO thread:
http://stackoverflow.com/questions/16586640/node-js-async-to-sync

In the long run, we might want to avoid having that dependency and
instead make test assertions async as well. But for now it allowed me to
move forward without having to rewrite the (quite massive) test suite
that we have.

[1]: https://github.com/abbr/deasync
I kept getting an error in Travis from eslint:

  import-js/lib/findJsModulesFor.js
  0:0  error  Parsing error: Object.defineProperty called on non-object

Initially, I wasn't getting this error locally. But after wiping
node_modules and reinstalling, I saw it there too. After some googling,
I came across this recent babel-eslint issue:

  babel/babel-eslint#321

In that thread, people suggest pinning babylon at 6.8.0 (a regression
was introduced in 6.8.1). Doing that seems to do the trick.
This is in preparation for adding an importjs command that will run in
the background and accept commands from stdin as well as directly on the
command line.
This is so that I can reuse it in the daemon script that I'm currently
writing.
It's not useful yet, but I'm using it as a POC that I can write to
STDIN, react to input, then write to STDOUT. The current version simply
prints the same output you provide.

Here's how to test it on macOS:

Create a pipe that we can use to communicate with:
 > mkfifo IMPORTJS

Start the daemon using the pipe we created as STDIN:
 > importjsd < IMPORTJS

Write to the pipe:
 > echo 'galooshi' > IMPORTJS

Watch for 'galooshi' to get written to the console where the daemon is
running.
This brings us closer to having importjsd work as a real daemon for
importjs.
This is still not great, but I wanted to avoid having to rename commands
on the clients. So instead we match the name of the command to its
corresponding Importer function.
I'm playing around with using a vim channel to communicate with the
daemon. It's a lot simpler if we can assume all results (even errors)
get printed to stdout.
For now, this will always be on and will write to /tmp/importjsd.log.
I intend to keep things this way only during development. After that we
will probably need a more sophisticated logging setup.
This will help track down failures better.
This will help when looking at the logfile at a later time.
This is not yet completely finalized, but I wanted to get something
simple out there to start with. The new file finder is only used in the
daemon so far, since there's a startup cost involved in using the
watchman file finder.

https://facebook.github.io/watchman/
I need this function to normalize paths coming from watchman
subscription events. That change will come in a later commit.
- Remove copy-paste comments in this file (from watchman tutorial)
- Use `log` (new module extracted from daemon.js) to log stuff
- Normalize paths before putting them in the cache (this fixes an issue
  with `excludes` config being ignored for findFilesWithWatchman).
The daemon is starting to stabilize a little, and I thought I'd document
a bit of how it works.
During development, I haven't tested import-js without watchman being
installed. I asked @lencioni to give it a spin without first installing
watchman, and he reported seeing this error:

Watchman:  Watchman was not found in PATH.  See https://facebook.github.io/watchman/docs/install.html for installation instructions
events.js:154
      throw er; // Unhandled 'error' event
      ^

Error: Watchman was not found in PATH.  See https://facebook.github.io/watchman/docs/install.html for installation instructions
    at exports._errnoException (util.js:890:11)
    at Process.ChildProcess._handle.onexit (internal/child_process.js:182:32)
    at onErrorNT (internal/child_process.js:348:16)
    at _combinedTickCallback (node.js:377:13)
    at process._tickCallback (node.js:401:11)
    at Function.Module.runMain (module.js:449:11)
    at startup (node.js:141:18)
    at node.js:933:3

It turns out that we have to subscribe to the `error` event from the
watchman Client to catch this error.

With the changes in this commit, you are now able to run the daemon
without having watchman installed (in which case we fall back to a
different file-finding strategy).
The watchman file finder is starting to stabilize, and I really need to
add tests for it. I found a bug while testing it out locally. The
`excludes` config option is meant to filter out files, but that wasn't
working for the new finder because we didn't normalize the lookup path
the same way we normalized the file paths.
@lencioni was reporting seeing ENFILE errors when running importjsd
locally. I think they come from the fact that we issue a `process.cwd()`
command every time we normalize a path.

events.js:154
      throw er; // Unhandled 'error' event
      ^

Error: ENFILE: file table overflow, uv_cwd
    at Error (native)
    at normalizePath (/Users/joe_lencioni/import-js/build/normalizePath.js:18:79)
    at /Users/joe_lencioni/import-js/build/WatchmanFileCache.js:61:58
    at Array.forEach (native)
    at Client.<anonymous> (/Users/joe_lencioni/import-js/build/WatchmanFileCache.js:60:21)
    at emitOne (events.js:90:13)
    at Client.emit (events.js:182:7)
    at BunserBuf.<anonymous> (/Users/joe_lencioni/import-js/node_modules/fb-watchman/index.js:90:14)
    at emitOne (events.js:90:13)
    at BunserBuf.emit (events.js:182:7)

Even though cwd might change during the lifetime of the process (outside
of our control), I can't see why memoizing it would be bad. It might in
fact help prevent subtle bugs.
I think this is causing issues for @lencioni. I plan on making logging a
little more sophisticated. But before I do that, we can just temporarily
disable it.
I was running into some weird issues with deasync when adding the
watchman-powered file finder to the mix. It wasn't doing much good, so I
decided to ditch it in favor of making async assertions using the `done`
callback.
This first version simply mocks the `getFiles` function of
WatchmanFileCache. We're not testing much here yet, but at least we have
a bit of test coverage for the conditional logic inside
findMatchingFiles.
I want to get better control over logging. The previous hack (appending
to file) was only meant to be a temporary solution.

I considered Winston and Bunyan before deciding to go with the more
lightweight loglevel [1]. Right now, we use `debug` level throughout.
I'll make that configurable in a later commit. And I'll also pipe the
output to a file (loglevel uses console internally).

[1]: https://github.com/pimterry/loglevel
We need to be careful with what we're logging to stdout, because clients
will be depending on JSON content in a certain format ending up there.

We log to a file called `importjs.log` in the OS's default tmp
directory. Since this folder isn't super easy to discover, I'll follow
up by logging the path to the logfile on startup.
I'm starting to think that we need to support user-specific
configuration. Having the loglevel specified in importjs.json is
problematic, since you usually share this file with others. For now
though, I think it's fine to keep it here.
This will help people who want to find out what's going on behind the
scenes.
When importjs is run as a daemon, logging will be important so that
possible issues can be detected. We want people to be able to control
how much they see of this.
This bit of information should also be added to the README files of the
plugin (once they get updated to use the daemon).
By having all editor plugins pass in their process PIDs, we can make
sure that the importjsd process never turns into a zombie.

There's a chance that the original parent has died and a new process
has taken over the PID of the old parent. In that case, the importjsd
process will become a zombie even with the code included in this
change. That's a somewhat unlikely scenario, though, and I don't think
we need to worry much about it. According to [1], PIDs rotate (up to
32768) on Linux systems, so a lot of processes would have to be started
before the old PID is reused.

Also included in this commit is changing the first log of the command to
be one line. This is important for editors so that they know when they
can start expecting JSON output on the channel.

[1]: http://stackoverflow.com/questions/11323410/linux-pid-recycling
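One way the parent-PID check could look (a sketch only; `isAlive` and `watchParent` are hypothetical names, not import-js's actual code):

```javascript
// process.kill(pid, 0) performs an existence check without actually
// sending a signal; it throws ESRCH if no such process exists.
function isAlive(pid) {
  try {
    process.kill(pid, 0);
    return true;
  } catch (e) {
    return false; // no such process (the parent editor has exited)
  }
}

// Poll the parent PID that the editor plugin passed in, and exit when
// it's gone so the daemon never lingers as an orphan.
function watchParent(parentPid, intervalMs) {
  return setInterval(() => {
    if (!isAlive(parentPid)) {
      process.exit(0);
    }
  }, intervalMs);
}
```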
function log(message) {
  fs.appendFile('/tmp/importjsd.log', `${message}\n`, (err) => {
    if (err) {
      throw new Error(err);
    }
  });
}

Collaborator:

Isn't err an Error? Could this be if (err) throw err;?

Collaborator (Author):

Yeah probably. This goes away later anyway.

Now that we've made some performance improvements, I want to make sure
that we don't regress too much. A benchmark test can help us better
surface how import-js performs under some circumstances.

To begin with, I'm just testing the case of importing a single word.
  lookupPath: string,
  validFilesRegex: string
): Promise<Array<string>> {
  const tester = new RegExp(validFilesRegex, 'i');
Collaborator:

tester is not a very good variable name.

Collaborator (Author):

validFilesPattern?

Collaborator:

I think validFilesRegex should be validFilesPattern, and for this RegExp instance to be validFileRegex

I use `fix` a lot more than importing a single word. I want the
benchmark script to be more similar to what import-js does in the wild,
so I'm switching to `fix`. This is good because it also tests the
parallel file finding stuff that we do.
import WatchmanFileCache from './WatchmanFileCache';
import formattedToRegex from './formattedToRegex';
import normalizePath from './normalizePath';
Collaborator:

Was this added here by accident?

Collaborator (Author):

Yeah, I think it snuck in early. I use it later.

@lencioni (Collaborator):
👍

trotzig added 9 commits June 26, 2016 21:09
Instead of failing with something like "undefined is not a function", we
can print a friendly error message if an unknown command is being used.
This allows us to get a little better control over what files are added
to the cache. The only difference I noticed from before is that I no
longer get flashes of

ADDING file foo.js~
REMOVING file foo.js~

from the log. Which is nice.
There was some confusion around these variables, so I made it clearer
which ones are compiled RegExps and which are just string patterns.

Before making the rename, I went down the path of switching to passing
in a compiled RegExp to all places. But I couldn't get
the `egrep` call inside `findMatchingFilesWithFind` to work with the
`toString()` output of RegExp. So I decided to keep the string
throughout all finders (so that it's consistent).
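The naming convention can be sketched as follows (the pattern itself is illustrative, not import-js's real one):

```javascript
// "<name>Pattern" is the raw string, safe to hand to `egrep` on the
// command line; "<name>Regex" is the RegExp compiled from it for
// JS-side matching.
const validFilesPattern = '\\.jsx?$';
const validFilesRegex = new RegExp(validFilesPattern, 'i');

// RegExp#toString() wraps the source in slashes and flags
// ('/\.jsx?$/i'), which egrep doesn't understand; that's why the plain
// string is what gets passed between the finders.
```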
@lencioni pointed out in code review that we could be clearer about
when you need to know about the daemon. I added a note about it being
mostly useful for developers of editor plugins, and I changed the
wording for a few other subsections as well.
This makes the daemon.js file a little cleaner. As suggested by
@lencioni in code review.
Even though this is the default, having the prefixes listed will make
the code less magical and more straightforward.

https://github.com/NatLibFi/loglevel-message-prefix#configuration
Since all processes of importjsd will log to the same file, keeping logs
separated can be a little tricky. By adding the parentPid to the mix,
you can follow logs for a specific session. E.g.:

tail -f /var/folders/1l/_t6tm7195n00gn/T/importjs.log | grep PID:12345

As suggested by @lencioni:
https://github.com/Galooshi/import-js/pull/287/files/ff2474b6553f4bbf396f824021a4052fb6672742..9b6a7ad3bb1bf6eab69e153d6f88355f403a2bcd#r68489982
@trotzig (Collaborator, Author) commented Jun 30, 2016

I think I've addressed all of the comments that you had @lencioni. I'm going to go ahead and merge. My plan is to publish a new version (1.0.0) with the changes. And then I'll let that marinate for a bit before I go ahead and push new versions of plugins.

@trotzig trotzig merged commit c9ddf7a into master Jun 30, 2016
@lencioni lencioni deleted the faster-find branch July 5, 2016 18:12