Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: use checkstyle core cachefile #488

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Bananeweizen
Copy link
Collaborator

This is a proof of concept only and needs further discussion.
When running checkstyle via maven plugin, it's really easy to use the cacheFile mechanism for more speed. In the eclipse plugin, there is no caching enabled for checkstyle core (the plugin only caches some of its own configuration files and similar). This leads to very long runtime in case of clean builds (because all files will be scanned again then, independent of whether there are actual changes). I have some workspace where checkstyle consumes 20 minutes of CPU time on clean builds.
This change enables the cacheFile mechanism in checkstyle core. It uses a single cachefile for all checkers, which is stored in the ".metadata" folder belonging to a workspace. That way the cachefile is not visible in the workspace itself, and every workspace has its own persistent cachefile, without using any global temp directories or similar.
The cache is not configurable and always enabled. It can be invalidated with the already existing "Purge checkstyle caches" action (hit Ctrl-3, start typing "purge").
The effect of the cache can be seen best when using the context menu on a large project to invoke checkstyle on it. Without the cache that always takes the same time, with a visible job in the progress view. With the cache the first invocation takes that same long time, every additional invocation on an already checked project is super-fast (in fact, I don't see a job anymore in the progress view in most cases because the job is finished more quickly then the progress view updates).

Open issues:

  • The existing lifecycle of the cachefile in checkers doesn't really fit for the plugin, therefore it currently calls the persist method via reflection. This method should be made public API in the checkstyle project. Right now it's only called from within the Checker.destroy() method, and that is not called at all from the Eclipse plugin.
  • Are there any hidden assumptions about the cachefile on the checkstyle core side? Is it okay to have the same checkfile for many checkers? The plugin creates one checker per eclipse project to be checked, and all share the identical cachefile right now.

@rnveach
Copy link
Member

rnveach commented Apr 3, 2023

One issue you will run into is #465 .

Eclipse-CS' current caching method is incomplete and causes issues. Adding a new cache on top of it will make figuring out caching issues harder to track.

I am all for more caching, but you may want to consider fixing the current caching before adding more to it.

@rnveach
Copy link
Member

rnveach commented Apr 3, 2023

The existing lifecycle of the cachefile in checkers doesn't really fit for the plugin, therefore it currently calls the persist method via reflection.

IMO, I feel Eclipse-CS is not really using Checkstyle's lifecycle correctly. I feel this needs to be a bigger conversation beteween Eclipse-CS and CS.

Checkstyle was not meant to be living long in memory. It was meant for a short, one use like running it through the command line. If you need another run, as everything is currently built, you are suppose to re-create Checker and reload everything from scratch for the next run. This is basically the core issue of #465 . If you are keeping CS in use longer, then you are almost on your own.

I do believe CS should change, but like I said, I think this should a bigger conversation. I think CS should open some things up and Eclipse-CS should be using a custom Checker.

Is it okay to have the same checkfile for many checkers?

The hash of the configuration is stored into the cache file. This is why #465 needs the external resource list of the configuration. This way when the configuration or its resources are changed, the cache will be reset as you could be getting new/removed violations with different checks added. It is assumed 1 cache file represents one configuration.

If you attach multiple configurations to one cache, it will see the config hash as being different on each run and always reset the cache.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants