Caching the contents of jars on disk from run-to-run #83
…icant speed improvement.
@@ -151,8 +151,7 @@ object Plugin extends sbt.Plugin {
 import Cache._
 import FileInfo.{hash, exists}

-IO.delete(tempDir)
-tempDir.mkdir()
+if ( !tempDir.exists ) tempDir.mkdir()
So if someone keeps adding/removing library deps, it'll accumulate here, but it won't be included in the final assembly?
True. Should I implement a garbage-collection pass at the end, keeping track of referenced directories and discarding the others?
As long as it doesn't end up in the final assembly, I think it's ok.
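For reference, the garbage-collection idea floated above could look roughly like the sketch below. It is not code from this pull request: the `CacheGc` object, the `collect` helper, and the assumption that each cached jar occupies one child entry of `tempDir` are all hypothetical.

```scala
import java.io.File

// Hypothetical helper, not part of sbt-assembly: prunes cache entries that
// the current build did not reference.
object CacheGc {
  /** Recursively delete a file or directory. */
  def deleteRecursively(f: File): Unit = {
    if (f.isDirectory) {
      // listFiles returns null on I/O errors, so guard with Option
      Option(f.listFiles).getOrElse(Array.empty[File]).foreach(deleteRecursively)
    }
    f.delete()
  }

  /** Delete every child of `tempDir` whose name is not in `referenced`.
    * Returns the names of the entries that were removed, sorted. */
  def collect(tempDir: File, referenced: Set[String]): Seq[String] = {
    val children = Option(tempDir.listFiles).getOrElse(Array.empty[File]).toSeq
    val stale = children.filterNot(c => referenced.contains(c.getName))
    stale.foreach(deleteRecursively)
    stale.map(_.getName).sorted
  }
}
```

A build would record the entry names it touched during assembly and call `CacheGc.collect(tempDir, touched)` as the last step, so removed dependencies stop accumulating on disk.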
Adding a conditional key would be prudent in case someone wants to opt out, but it should be enabled by default. What do you think about lazy val cacheUnzip = SettingKey[Boolean]("assembly-cache-unzip")?
cacheUnzip sounds good to me. I'll add it in.
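As an illustration of the opt-out, a build could switch the cache off with a setting along these lines. This is a sketch only: the key name `cacheUnzip` is the one proposed in this thread, and both that name and the `in assembly` scoping are assumptions that may not match what is finally released.

```scala
// build.sbt fragment (sbt 0.12-era syntax)
import sbtassembly.Plugin._
import AssemblyKeys._

assemblySettings

// Assumed key name and scope, taken from the proposal above:
// disable reuse of unzipped jar contents between runs.
cacheUnzip in assembly := false
```

Leaving the setting out would keep the proposed default, which is to cache.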
Also could you start
By the way: how do I run/add tests? I'm not very familiar with scripted.
I usually read my own tutorial: http://eed3si9n.com/testing-sbt-plugins You probably have to change the version to 0.9.0-SNAPSHOT for the plugin, and manually change all the plugins.sbt under test.
Also: if cachedMakeJar were to sha1 each jar (where applicable) rather than each individual contained file, presumably that would speed up assemblyCacheOutput somewhat. Is it worth me plumbing that through too?
Totally yes.
…e mappings to keep track of their parent jars where applicable. Compute the sha1 hash of the jar rather than extracted class files when deciding whether to rebuild when cacheOutput is set. Significant speedups.
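The jar-level hashing described in this commit can be illustrated with the sketch below. It is a standalone approximation, not the plugin's code: `JarFingerprint`, `fingerprint`, and `needsRebuild` are hypothetical names, and the real change works through sbt's caching machinery rather than a plain `Map`.

```scala
import java.io.{File, FileInputStream}
import java.security.MessageDigest

// Hypothetical illustration: one SHA-1 per jar instead of one per class file.
object JarFingerprint {
  /** Hex-encoded SHA-1 of a file's bytes, read in chunks. */
  def sha1Hex(file: File): String = {
    val md = MessageDigest.getInstance("SHA-1")
    val in = new FileInputStream(file)
    try {
      val buf = new Array[Byte](8192)
      var n = in.read(buf)
      while (n != -1) {
        md.update(buf, 0, n)
        n = in.read(buf)
      }
    } finally in.close()
    md.digest().map(b => "%02x".format(b & 0xff)).mkString
  }

  /** One hash per input jar, keyed by absolute path. */
  def fingerprint(jars: Seq[File]): Map[String, String] =
    jars.map(j => j.getAbsolutePath -> sha1Hex(j)).toMap

  /** Rebuild only when the set of jars, or any jar's hash, differs from last time. */
  def needsRebuild(previous: Map[String, String], jars: Seq[File]): Boolean =
    fingerprint(jars) != previous
}
```

Hashing a handful of jars touches far fewer files than hashing every extracted class, which is the likely source of the speedup mentioned above.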
Would it be worth putting sbt-assembly under autobuild (for scripted tests) via Travis CI?
GitHub doesn't send me a notification when a commit gets added to a pull req, so I didn't see the commit 4h ago.
@@ -4,7 +4,7 @@ name := "sbt-assembly"

 organization := "com.eed3si9n"

-version := "0.8.8"
+version := "0.8.9"
During development SNAPSHOT version should be used.
Some timings for a largish build on a machine with reasonable speed local disks (having already run assembly once):
0.8.8 with cacheOutput: ~60s
0.9.0 with cacheUnzip, without cacheOutput: ~33s
What's 0.8.8 without cacheOutput? That's the baseline current users have.
Let me know if you're still working on this pull req or if it's ready to be merged.
Still working on it. Here are a load of runtimes as requested. Two sets, both on the same build, the first for a single project, the second for all the projects within the build that have an assembly target (11 of them). I'm including both sets because the numbers for running multiple assemblies at once are somewhat unexpected for the first assembly on 0.8.8.
Here we go, and bear in mind that I've run each of these several times and there is about a +/- 10% variation in runtimes from build to build. So I'm assuming that if the numbers are within 10% of each other, they're effectively the same.
Single project
0.8.8
0.9.0
Multiple projects, 11 with assembly target, sharing a lot of dependency jars
0.8.8
0.9.0
I've not investigated the 0.8.8 multi-project results in enormous depth, but I have checked them and checked them again. I don't really have any idea why it's faster to run the first time with cacheOutput enabled - clearly it's doing more work, but I presume the IO is somehow being scheduled more efficiently as this assembly is pretty IO-bound. Any thoughts? Unfortunately the project concerned is my non-open work one - so I'm not able to share it for wider investigation.
As far as finishing the PR - I'll write some notes as requested, and add a test for the caching behaviour. And then it'll be ready - subject to your approval of course.
Interesting stuff. No rush on finishing up the pull req. I just wanted to make sure I'm not holding anything up.
…nsure that a jar built before and after caching has the same content hash.
I've pushed up another commit including a test for the caching and some changes to notes and README.md. Hopefully that'll be the last apart from any problems you spot. I've not enabled cacheOutput by default - I thought I'd let you make your own judgement (possibly including your own performance checks) and enable it if you'd like. But I guess this PR addresses #68 too?
Merged. Thanks for your contribution!
Thanks sir. Out of interest, when would you be planning to publish this?
Probably tonight or tomorrow, unless there are more things to work on.
Cool. And thanks for all the help.
Here is a basic implementation of on-disk jar content caching from run-to-run. I've added the sha1 hash of the jar itself into the name of the unzip contents directory and the .jarName file. Also the .jarName file is created last (only after a successful unzip) and it is the existence of this file that is used to check whether to unzip or use a cache. So under most circumstances the assembly should be good even if the user aborts and restarts the build midway through.
Is this OK as-is, or would you like the caching to be conditional on a key? This change results in a very significant speedup on my setup.
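To make the scheme in the description concrete, here is a minimal standalone sketch of the same idea: the jar's sha1 names the unzip directory, and a marker file written last signals a complete unzip. It is not the code in this pull request. `UnzipCache` and `unzipCached` are hypothetical names, and the exact directory and `.jarName` file naming is an assumption.

```scala
import java.io.{File, FileInputStream}
import java.nio.file.{Files, StandardCopyOption}
import java.security.MessageDigest
import java.util.zip.ZipInputStream

// Hypothetical illustration of a hash-keyed unzip cache with a completion marker.
object UnzipCache {
  def sha1Hex(file: File): String = {
    val md = MessageDigest.getInstance("SHA-1")
    md.digest(Files.readAllBytes(file.toPath)).map(b => "%02x".format(b & 0xff)).mkString
  }

  /** Returns (directory holding the jar contents, whether the cache was reused). */
  def unzipCached(jar: File, tempDir: File): (File, Boolean) = {
    val hash = sha1Hex(jar)
    val dest = new File(tempDir, hash)                 // assumed naming
    val marker = new File(tempDir, hash + ".jarName")  // assumed naming
    if (marker.exists) (dest, true)
    else {
      // No marker: either never unzipped or interrupted midway, so extract again.
      dest.mkdirs()
      val zin = new ZipInputStream(new FileInputStream(jar))
      try {
        var entry = zin.getNextEntry
        while (entry != null) {
          val out = new File(dest, entry.getName)
          // Refuse entries that would escape the destination directory.
          require(out.getCanonicalPath.startsWith(dest.getCanonicalPath + File.separator))
          if (entry.isDirectory) out.mkdirs()
          else {
            Option(out.getParentFile).foreach(_.mkdirs())
            Files.copy(zin, out.toPath, StandardCopyOption.REPLACE_EXISTING)
          }
          entry = zin.getNextEntry
        }
      } finally zin.close()
      // Written last, so its existence implies the unzip finished.
      Files.write(marker.toPath, jar.getAbsolutePath.getBytes("UTF-8"))
      (dest, false)
    }
  }
}
```

Because the marker is only created after every entry has been extracted, a build aborted midway leaves no marker behind and the next run simply unzips that jar again.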