Faster find #287

Merged
merged 41 commits into master from faster-find on Jun 30, 2016

Conversation

@trotzig (Collaborator) commented Jun 25, 2016

This PR does a few things to make the process of importing faster:

  • Finding files is now handled asynchronously (and in parallel). This speeds up the fix command when you are importing more than one variable.
  • Introduce an importjsd daemon process that you can communicate with over stdin and stdout. This speeds up every command, since we no longer have to shell out and spin up a Node environment each time.
  • Integrate with Watchman. This is only used in combination with the daemon (because there's a startup cost), and it speeds up finding files (by a lot!).

I've been preparing the vim and sublime plugins (vim:faster-find, sublime:faster-find) so that they're ready for the changes. I was thinking we could release this and let it marinate in the wild a little before we merge the branches on the editor plugins. The old command-line tool will continue to work the same way as before, so we can merge this separately if we want to.

trotzig added 30 commits June 13, 2016 22:58
I've been wanting to explore a watchman-powered file finder. Before I
can do that, I want all file-finding to be async. This is good, because
we can parallelize lookups.

Instead of adding callbacks all over the place, I decided to use
Promises. I believe this will make the code a little simpler to
understand as it provides a nice interface for everything async.

An annoying issue I ran into while switching to Promises was that I
wasn't getting errors surfaced when they happened. It turns out I was
bit by the auto-swallowing of errors that Promises have. I came across
this article explaining things:
http://jamesknelson.com/are-es6-promises-swallowing-your-errors/

In that article, a third-party solution to globally fix the issue is
suggested. I decided against bringing in another dependency; instead I
bit the bullet and went down the explicit (and somewhat verbose) path,
using `reject` and `catch`.

To make tests easier to work with, I decided to make the asyncness
synchronous by adding a dependency on deasync [1]. I found it in this
SO thread:
http://stackoverflow.com/questions/16586640/node-js-async-to-sync

In the long run, we might want to avoid having that dependency and
instead make test assertions async as well. But for now it allowed me to
move forward without having to rewrite the (quite massive) test suite
that we have.

[1]: https://github.com/abbr/deasync
I kept getting an error in Travis from eslint:

  import-js/lib/findJsModulesFor.js
  0:0  error  Parsing error: Object.defineProperty called on non-object

Initially, I wasn't getting this error locally. But after wiping
node_modules and reinstalling, I saw it there too. After some googling,
I came across this recent babel-eslint issue:

  babel/babel-eslint#321

In that thread, people suggest pinning babylon at 6.8.0 (a regression
was introduced in 6.8.1). Doing that seems to do the trick.
This is in preparation for adding an importjs command that will run in
the background and accept commands from stdin as well as directly on the
command line.
This is so that I can reuse it in the daemon script that I'm currently
writing.
It's not useful yet, but I'm using it as a POC that I can write to
STDIN, react to input, then write to STDOUT. The current version simply
prints the same output you provide.

Here's how to test it on macOS:

Create a pipe that we can use to communicate with:
 > mkfifo IMPORTJS

Start the daemon using the pipe we created as STDIN:
 > importjsd < IMPORTJS

Write to the pipe:
 > echo 'galooshi' > IMPORTJS

Watch for 'galooshi' to get written to the console where the daemon is
running.
This brings us closer to having importjsd work as a real daemon for
importjs.
This is still not great, but I wanted to avoid having to rename commands
on the clients. So instead we match the name of the command to its
corresponding Importer function.
I'm playing around with using a vim channel to communicate with the
daemon. It's a lot simpler if we can assume all results (even errors)
get printed to stdout.
For now, this will always be on and will write to /tmp/importjsd.log.
I intend to keep things this way only during development. After that we
will probably need a more sophisticated logging setup.
This will help track down failures better.
This will help when looking at the logfile at a later time.
This is not yet completely finalized, but I wanted to get something
simple out there to start with. The new file finder is only used in the
daemon so far, since there's a startup cost involved in using the
watchman file finder.

https://facebook.github.io/watchman/
I need this function to normalize paths coming from watchman
subscription events. That change will come in a later commit.
- Remove copy-paste comments in this file (from watchman tutorial)
- Use `log` (new module extracted from daemon.js) to log stuff
- Normalize paths before putting them in the cache (this fixes an issue
  with `excludes` config being ignored for findFilesWithWatchman).
The daemon is starting to stabilize a little, and I thought I'd document
a bit of how it works.
During development, I haven't tested import-js without watchman being
installed. I asked @lencioni to give it a spin without first installing
watchman, and he reported seeing this error:

Watchman:  Watchman was not found in PATH.  See https://facebook.github.io/watchman/docs/install.html for installation instructions
events.js:154
      throw er; // Unhandled 'error' event
      ^

Error: Watchman was not found in PATH.  See https://facebook.github.io/watchman/docs/install.html for installation instructions
    at exports._errnoException (util.js:890:11)
    at Process.ChildProcess._handle.onexit (internal/child_process.js:182:32)
    at onErrorNT (internal/child_process.js:348:16)
    at _combinedTickCallback (node.js:377:13)
    at process._tickCallback (node.js:401:11)
    at Function.Module.runMain (module.js:449:11)
    at startup (node.js:141:18)
    at node.js:933:3

It turns out that we have to subscribe to the `error` event from the
watchman Client to catch this error.

With the changes in this commit, you are now able to run the daemon
without having watchman installed (in which case we fall back to a
different file-finding strategy).
The watchman file finder is starting to stabilize, and I really need to
add tests for it. I found a bug while testing it out locally. The
`excludes` config option is meant to filter out files, but that wasn't
working for the new finder because we didn't normalize the lookup path
the same way we normalized the file paths.
@lencioni was reporting seeing ENFILE errors when running importjsd
locally. I think they come from the fact that we issue a `process.cwd()`
command every time we normalize a path.

events.js:154
      throw er; // Unhandled 'error' event
      ^

Error: ENFILE: file table overflow, uv_cwd
    at Error (native)
    at normalizePath (/Users/joe_lencioni/import-js/build/normalizePath.js:18:79)
    at /Users/joe_lencioni/import-js/build/WatchmanFileCache.js:61:58
    at Array.forEach (native)
    at Client.<anonymous> (/Users/joe_lencioni/import-js/build/WatchmanFileCache.js:60:21)
    at emitOne (events.js:90:13)
    at Client.emit (events.js:182:7)
    at BunserBuf.<anonymous> (/Users/joe_lencioni/import-js/node_modules/fb-watchman/index.js:90:14)
    at emitOne (events.js:90:13)
    at BunserBuf.emit (events.js:182:7)

Even though cwd might change during the lifetime of the process (outside
of our control), I can't see why memoizing it would be bad. It might in
fact help prevent subtle bugs.
I think this is causing issues for @lencioni. I plan on making logging a
little more sophisticated. But before I do that, we can just temporarily
disable it.
I was running into some weird issues with deasync when adding the
watchman-powered file finder to the mix. It wasn't doing much good, so I
decided to ditch it in favor of making async assertions using the `done`
callback.
This first version simply mocks the `getFiles` function of
WatchmanFileCache. We're not testing much here yet, but at least we have
a bit of test coverage for the conditional logic inside
findMatchingFiles.
I want to get better control over logging. The previous hack (appending
to file) was only meant to be a temporary solution.

I considered Winston and Bunyan before deciding to go with the more
lightweight loglevel [1]. Right now, we use `debug` level throughout.
I'll make that configurable in a later commit. And I'll also pipe the
output to a file (loglevel uses console internally).

[1]: https://github.com/pimterry/loglevel
We need to be careful with what we're logging to stdout, because clients
will be depending on JSON content in a certain format ending up there.

We log to a file called `importjs.log` in the OS's default tmp
directory. Since this folder isn't super easy to discover, I'll follow
up by logging the path to the logfile on startup.
I'm starting to think that we need to support user-specific
configuration. Having the loglevel specified in importjs.json is
problematic, since you usually share this file with others. For now
though, I think it's fine to keep it here.
This will help people who want to find out what's going on behind the
scenes.
When importjs is run as a daemon, logging will be important so that
possible issues can be detected. We want people to be able to control
how much they see of this.
This bit of information should also be added to the README files of the
plugin (once they get updated to use the daemon).
By having all editor plugins pass in their process PIDs, we can make
sure that the importjsd process never turns into a zombie.

There's a chance that the original parent has died and a new process
has taken over the PID of the old parent. In that case, the importjsd
process will become a zombie even with the code included in this
change. That's a somewhat unlikely scenario, though, and I don't think
we need to worry much about it. According to [1], PIDs rotate (up to
32768) on Linux systems, so a lot of processes would have to be started
before the old PID is reused.

Also included in this commit is changing the first log of the command to
be one line. This is important for editors so that they know when they
can start expecting JSON output on the channel.

[1]: http://stackoverflow.com/questions/11323410/linux-pid-recycling
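One way the parent-PID check could look (a sketch only; `isAlive` and `watchParent` are hypothetical names, not import-js's actual code):

```javascript
// process.kill(pid, 0) performs an existence check without actually
// sending a signal; it throws ESRCH if no such process exists.
function isAlive(pid) {
  try {
    process.kill(pid, 0);
    return true;
  } catch (e) {
    return false; // no such process (the parent editor has exited)
  }
}

// Poll the parent PID that the editor plugin passed in, and exit when
// it's gone so the daemon never lingers as an orphan.
function watchParent(parentPid, intervalMs) {
  return setInterval(() => {
    if (!isAlive(parentPid)) {
      process.exit(0);
    }
  }, intervalMs);
}
```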
function log(message) {
  fs.appendFile('/tmp/importjsd.log', `${message}\n`, (err) => {
    if (err) {
      throw new Error(err);
    }
  });
}

Collaborator:

Isn't err an Error? Could this be if (err) throw err;?

Collaborator (Author):

Yeah probably. This goes away later anyway.

Now that we've made some performance improvements, I want to make sure
that we don't regress too much. A benchmark test can help us better
surface how import-js performs under some circumstances.

To begin with, I'm just testing the case of importing a single word.
  lookupPath: string,
  validFilesRegex: string
): Promise<Array<string>> {
  const tester = new RegExp(validFilesRegex, 'i');
Collaborator:

tester is not a very good variable name.

Collaborator (Author):

validFilesPattern?

Collaborator:

I think validFilesRegex should be validFilesPattern, and for this RegExp instance to be validFileRegex

I use `fix` a lot more than importing a single word. I want the
benchmark script to be more similar to what import-js does in the wild,
so I'm switching to `fix`. This is good because it also tests the
parallel file finding stuff that we do.
import WatchmanFileCache from './WatchmanFileCache';
import formattedToRegex from './formattedToRegex';
import normalizePath from './normalizePath';
Collaborator:

Was this added here by accident?

Collaborator (Author):

Yeah, I think it snuck in early. I use it later.

@lencioni (Collaborator):
👍

trotzig added 9 commits June 26, 2016 21:09
Instead of failing with something like "undefined is not a function", we
can print a friendly error message if an unknown command is being used.
This allows us to get a little better control over what files are added
to the cache. The only difference I noticed from before is that I no
longer get flashes of

ADDING file foo.js~
REMOVING file foo.js~

from the log. Which is nice.
There was some confusion around these variables, so I made it clearer
which ones are compiled RegExps and which are just string patterns.

Before making the rename, I went down the path of switching to passing
in a compiled RegExp to all places. But I couldn't get
the `egrep` call inside `findMatchingFilesWithFind` to work with the
`toString()` output of RegExp. So I decided to keep the string
throughout all finders (so that it's consistent).
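The naming convention can be sketched as follows (the pattern itself is illustrative, not import-js's real one):

```javascript
// "<name>Pattern" is the raw string, safe to hand to `egrep` on the
// command line; "<name>Regex" is the RegExp compiled from it for
// JS-side matching.
const validFilesPattern = '\\.jsx?$';
const validFilesRegex = new RegExp(validFilesPattern, 'i');

// RegExp#toString() wraps the source in slashes and flags
// ('/\.jsx?$/i'), which egrep doesn't understand; that's why the plain
// string is what gets passed between the finders.
```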
@lencioni pointed out in code review that we could be clearer about
when you need to know about the daemon. I added a note about it being
mostly useful for developers of editor plugins, and I changed the
wording for a few other subsections as well.
This makes the daemon.js file a little cleaner. As suggested by
@lencioni in code review.
Even though this is the default, having the prefixes listed will make
the code less magical and more straightforward.

https://github.com/NatLibFi/loglevel-message-prefix#configuration
Since all processes of importjsd will log to the same file, keeping logs
separated can be a little tricky. By adding the parentPid to the mix,
you can follow logs for a specific session. E.g.:

tail -f /var/folders/1l/_t6tm7195n00gn/T/importjs.log | grep PID:12345

As suggested by @lencioni:
https://github.com/Galooshi/import-js/pull/287/files/ff2474b6553f4bbf396f824021a4052fb6672742..9b6a7ad3bb1bf6eab69e153d6f88355f403a2bcd#r68489982
@trotzig (Collaborator, Author) commented Jun 30, 2016

I think I've addressed all of the comments that you had @lencioni. I'm going to go ahead and merge. My plan is to publish a new version (1.0.0) with the changes. And then I'll let that marinate for a bit before I go ahead and push new versions of plugins.

@trotzig trotzig merged commit c9ddf7a into master Jun 30, 2016
@lencioni lencioni deleted the faster-find branch July 5, 2016 18:12