Commit

Merge pull request #83 from d40cht/master
Caching the contents of jars on disk from run-to-run
eed3si9n committed May 23, 2013
2 parents 72587ac + 2b6fb10 commit 647381a
Showing 19 changed files with 204 additions and 53 deletions.
12 changes: 9 additions & 3 deletions README.md
@@ -209,16 +209,22 @@ from using the `sourceOfFileForMerge` method on `sbtassembly.AssemblyUtils`,
which takes the temporary directory and one of the files passed into the
strategy as parameters.

Cached Output
-------------
Caching
-------

By default, for performance reasons, the result of unzipping dependency jars to disk is cached from run to run. To disable this feature, set:

```scala
assemblyCacheUnzip in assembly := false
```

If you wish to cache the fat JAR so its timestamp changes only when the input changes, set the following setting:

```scala
assemblyCacheOutput in assembly := true
```

Currently this feature requires checking the SHA-1 hash of all *.class including that of dependencies, and thus it could take [very long time](https://github.com/sbt/sbt-assembly/issues/68)!
This feature requires checking the SHA-1 hash of all *.class files, and the hash of all dependency *.jar files. If there are a large number of class files, this could take a long time, although with hashing of jar files, rather than their contents, the speed has recently been [improved](https://github.com/sbt/sbt-assembly/issues/68).
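
The two settings are independent, so they can be enabled together. The fragment below is a minimal sketch of a `build.sbt` that does so, assuming the sbt 0.12-era syntax and the `AssemblyKeys` import used by the test builds in this commit; whether enabling output caching pays off depends on how many class files have to be hashed.

```scala
import AssemblyKeys._

assemblySettings

// Keep unzipped dependency jars on disk between runs (this is the default).
assemblyCacheUnzip in assembly := true

// Additionally skip rebuilding the fat JAR when the input hashes are unchanged.
assemblyCacheOutput in assembly := true
```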

Publishing
----------
2 changes: 1 addition & 1 deletion build.sbt
@@ -4,7 +4,7 @@ name := "sbt-assembly"

organization := "com.eed3si9n"

version := "0.8.8"
version := "0.9.0-SNAPSHOT"

CrossBuilding.crossSbtVersions := Seq("0.11.3", "0.11.2" ,"0.12")

18 changes: 18 additions & 0 deletions notes/0.9.0.markdown
@@ -0,0 +1,18 @@
### Speed improvements: Additional caching option, new cacheOutput behaviour

* Assembly unzips each dependency jar before re-assembling the contents of each jar (and the build's class files) into a single jar. Previously this happened on every assembly run; now the result of the unzip is cached from run to run. This improves performance noticeably, especially on systems with slower IO. The new boolean setting `assemblyCacheUnzip in assembly` (default `true`) can be set to `false` to disable this behaviour.
* In addition, the behaviour of `assemblyCacheOutput` has been modified. Previously, the contents of each dependency jar would be extracted and then each contained file would be sha1 hashed to determine whether the assembly jar needed rebuilding. Now the jar itself is hashed (significantly quicker) to determine whether assembly needs to be re-run.
* In combination, these two options can reduce the run time of the second invocation of assembly (when nothing has changed) by up to 5 times on sample test projects.
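
The second point above comes down to hashing one file per dependency instead of every file inside it. A minimal sketch of hashing a jar's raw bytes is shown here, using only the JDK; the object and method names (`JarHashSketch`, `sha1Hex`, `sha1HexOfFile`) are hypothetical and are not part of the plugin, which goes through sbt's own `FileInfo.hash`.

```scala
import java.nio.file.{Files, Path}
import java.security.MessageDigest

object JarHashSketch {
  // Hex-encode a digest, using the same "%02x" formatting the commit uses
  // when it logs the SHA-1 of the inputs.
  def toHex(bytes: Array[Byte]): String =
    bytes.map(b => "%02x".format(b)).mkString

  // SHA-1 of an in-memory byte array.
  def sha1Hex(bytes: Array[Byte]): String =
    toHex(MessageDigest.getInstance("SHA-1").digest(bytes))

  // SHA-1 of a file's raw bytes. For a jar this hashes the archive itself,
  // without unpacking or inspecting its entries.
  def sha1HexOfFile(file: Path): String =
    sha1Hex(Files.readAllBytes(file))
}
```

Reading the whole file into memory keeps the sketch short; a streaming digest would be the more careful choice for very large jars.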

Example run times for a largish build with eleven projects that have an assembly target are shown below. For each configuration, `;clean;package` was run first (untimed), followed by three timed runs of `assembly`:

## 0.9.0
Run number | No caching | Cache unzip | Cache unzip + cache output
---- | --- | --- | ---
1st | 99s | 101s | 85s
2nd | 119s | 34s | 12s
3rd | 102s | 32s | 12s

[83]: https://github.com/sbt/sbt-assembly/pull/83
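
To read the table above as ratios, the second-run timings can be divided directly. The sketch below does only that arithmetic; the object and value names are hypothetical, and the numbers are copied from the table. Note that the "up to 5 times" figure in the notes refers to sample test projects, not to this larger build.

```scala
object SpeedupSketch {
  // Second-run timings in seconds, copied from the table above.
  val noCaching: Double = 119.0
  val cacheUnzip: Double = 34.0
  val cacheUnzipAndOutput: Double = 12.0

  // Ratio of baseline time to improved time.
  def speedup(baseline: Double, improved: Double): Double = baseline / improved

  val unzipOnly: Double = speedup(noCaching, cacheUnzip)           // 3.5
  val unzipAndOutput: Double = speedup(noCaching, cacheUnzipAndOutput) // roughly 9.9
}
```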


121 changes: 81 additions & 40 deletions src/main/scala/sbtassembly/Plugin.scala
@@ -10,6 +10,17 @@ import java.security.MessageDigest

object Plugin extends sbt.Plugin {
import AssemblyKeys._

// Keep track of the source package of mappings that come from a jar, so we can
// sha1 the jar instead of the unpacked packages when determining whether to rebuild
case class MappingSet( sourcePackage : Option[File], mappings : Seq[(File, String)] )
{
def dependencyFiles = sourcePackage match
{
case Some(f) => Seq(f)
case None => mappings.map(_._1)
}
}

object AssemblyKeys {
lazy val assembly = TaskKey[File]("assembly", "Builds a single-file deployable jar.")
@@ -23,10 +34,12 @@ object Plugin extends sbt.Plugin {
lazy val outputPath = TaskKey[File]("assembly-output-path")
lazy val excludedFiles = SettingKey[Seq[File] => Seq[File]]("assembly-excluded-files")
lazy val excludedJars = TaskKey[Classpath]("assembly-excluded-jars")
lazy val assembledMappings = TaskKey[File => Seq[(File, String)]]("assembly-assembled-mappings")
lazy val assembledMappings = TaskKey[File => Seq[MappingSet]]("assembly-assembled-mappings")
lazy val mergeStrategy = SettingKey[String => MergeStrategy]("assembly-merge-strategy", "mapping from archive member path to merge strategy")
lazy val assemblyDirectory = SettingKey[File]("assembly-directory")
lazy val assemblyCacheOutput = SettingKey[Boolean]("assembly-cache-output")

lazy val assemblyCacheOutput = SettingKey[Boolean]("assembly-cache-output")
lazy val assemblyCacheUnzip = SettingKey[Boolean]("assembly-cache-unzip", "Cache the results of unzipping dependency jars from run to run")
}

/**
@@ -139,44 +152,48 @@ object Plugin extends sbt.Plugin {
}
}

private def assemblyTask(out: File, po: Seq[PackageOption], mappings: File => Seq[(File, String)],
strats: String => MergeStrategy, tempDir: File, cacheOutput: Boolean, cacheDir: File, log: Logger): File =
Assembly(out, po, mappings, strats, tempDir, cacheOutput, cacheDir, log)
private def assemblyTask(out: File, po: Seq[PackageOption], mappings: File => Seq[MappingSet],
strats: String => MergeStrategy, tempDir: File, cacheOutput: Boolean, cacheDir: File, cacheUnzip: Boolean, log: Logger): File =
Assembly(out, po, mappings, strats, tempDir, cacheOutput, cacheDir, cacheUnzip, log)

object Assembly {
def apply(out: File, po: Seq[PackageOption], mappings: File => Seq[(File, String)],
strats: String => MergeStrategy, tempDir: File, cacheOutput: Boolean, cacheDir: File, log: Logger): File = {
def apply(out: File, po: Seq[PackageOption], mappings: File => Seq[MappingSet],
strats: String => MergeStrategy, tempDir: File, cacheOutput: Boolean, cacheDir: File, cacheUnzip: Boolean, log: Logger): File = {
import Tracked.{inputChanged, outputChanged}
import Types.:+:
import Cache._
import FileInfo.{hash, exists}

IO.delete(tempDir)
tempDir.mkdir()
val ms = applyStrategies(mappings(tempDir), strats, tempDir, log)
if ( !cacheUnzip ) IO.delete( tempDir )
if ( !tempDir.exists ) tempDir.mkdir()

val mappingSets = mappings(tempDir)
val ms : Seq[(File, String)] = applyStrategies(mappingSets, strats, tempDir, log)
def makeJar {
val config = new Package.Configuration(ms, out, po)
Package(config, cacheDir, log)
}
val cachedMakeJar = inputChanged(cacheDir / "assembly-inputs") { (inChanged, inputs: Seq[Byte]) =>
outputChanged(cacheDir / "assembly-outputs") { (outChanged, jar: PlainFileInfo) =>
if (inChanged) {
log.info("SHA-1: " + inputs)
log.info("SHA-1: " + inputs.map( b => "%02x".format(b) ).mkString)
} // if
if (inChanged || outChanged) makeJar
else log.info("Assembly up to date: " + jar.file)
}
}
lazy val inputs = sha1.digest((ms map {_._1} map {hash.apply}).toString.getBytes("UTF-8")).toSeq

lazy val inputs = sha1.digest((mappingSets flatMap { _.dependencyFiles } map {hash.apply}).toString.getBytes("UTF-8")).toSeq
if (cacheOutput) {
log.info("Checking every *.class file's SHA-1. This could take very long time.")
log.info("Checking every *.class/*.jar file's SHA-1.")
cachedMakeJar(inputs)(() => exists(out))
}
else makeJar
out
}
def applyStrategies(srcs: Seq[(File, String)], strats: String => MergeStrategy,
def applyStrategies(srcSets: Seq[MappingSet], strats: String => MergeStrategy,
tempDir: File, log: Logger): Seq[(File, String)] = {
val srcs = srcSets.flatMap( _.mappings )
val counts = scala.collection.mutable.Map[MergeStrategy, Int]().withDefaultValue(0)
def applyStrategy(strategy: MergeStrategy, name: String, files: Seq[(File, String)]): Seq[(File, String)] = {
if (files.size >= strategy.notifyThreshold) {
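
The hunk above changes which files feed the rebuild check: a mapping set that came from a jar contributes only the jar, while a set that came from a directory contributes every mapped file. The following is a standalone sketch of that selection, independent of sbt. `MappingSet` mirrors the case class added in this commit; `filesToHash` is a hypothetical helper named here for illustration only.

```scala
import java.io.File

object MappingSetSketch {
  // Same shape as the MappingSet case class introduced in this commit.
  final case class MappingSet(sourcePackage: Option[File], mappings: Seq[(File, String)]) {
    // A jar-backed set is represented by the jar alone; a directory-backed
    // set is represented by every file it maps.
    def dependencyFiles: Seq[File] = sourcePackage match {
      case Some(jar) => Seq(jar)
      case None      => mappings.map(_._1)
    }
  }

  // Hypothetical helper: the files whose hashes would feed the rebuild check.
  def filesToHash(sets: Seq[MappingSet]): Seq[File] =
    sets.flatMap(_.dependencyFiles)
}
```

With this shape, a dependency containing thousands of class files costs a single hash, which is where the `assemblyCacheOutput` speed-up described in the release notes comes from.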
@@ -223,7 +240,7 @@ object Plugin extends sbt.Plugin {
// even though fullClasspath includes deps, dependencyClasspath is needed to figure out
// which jars exactly belong to the deps for packageDependency option.
private def assemblyAssembledMappings(tempDir: File, classpath: Classpath, dependencies: Classpath,
ao: AssemblyOption, ej: Classpath, log: Logger) = {
ao: AssemblyOption, ej: Classpath, cacheUnzip: Boolean, log: Logger) = {
import sbt.classpath.ClasspathUtilities

val (libs, dirs) = classpath.map(_.data).sorted.partition(ClasspathUtilities.isArchive)
@@ -263,20 +280,42 @@ object Plugin extends sbt.Plugin {

val jarDirs = for(jar <- libsFiltered.par) yield {
val jarName = jar.asFile.getName
log.info("Including %s".format(jarName))
val hash = sha1name(jar)
IO.write(tempDir / (hash + ".jarName"), jar.getCanonicalPath, IO.utf8, false)

val hash = sha1name(jar) + sha1content(jar)
val jarNamePath = tempDir / (hash + ".jarName")
val dest = tempDir / hash
dest.mkdir()
IO.unzip(jar, dest)
IO.delete(ao.exclude(Seq(dest)))
dest

// If the jar name path does not exist, or is not for this jar, unzip the jar
if ( !cacheUnzip || !jarNamePath.exists || IO.read(jarNamePath) != jar.getCanonicalPath )
{
log.info("Including: %s".format(jarName))
IO.delete(dest)
dest.mkdir()
IO.unzip(jar, dest)
IO.delete(ao.exclude(Seq(dest)))

// Write the jarNamePath at the end to minimise the chance of having a
// corrupt cache if the user aborts the build midway through
IO.write(jarNamePath, jar.getCanonicalPath, IO.utf8, false)
}
else log.info("Including from cache: %s".format(jarName))


(dest, jar)
}

val base = dirsFiltered ++ jarDirs
val descendants = ((base ** "*") --- ao.exclude(base) --- base).get filter { _.exists }

val base : Seq[File] = dirsFiltered ++ jarDirs.map( _._1 )

descendants x relativeTo(base)
def getMappings( rootDir : File ) =
{
val descendants = ((rootDir ** "*") --- ao.exclude(base) --- base).get filter { _.exists }

descendants x relativeTo(base)
}

dirsFiltered.map( d => MappingSet( None, getMappings(d) ) ) ++ jarDirs.map { case (d, j) => MappingSet( Some(j), getMappings(d) ) }

}

private val LicenseFile = """(license|licence|notice|copying)([.]\w+)?$""".r
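
The unzip cache in the hunk above relies on a marker file (`<hash>.jarName`) that is written only after a jar has been fully unpacked. A minimal JDK-only sketch of that decision follows; `UnzipCacheSketch`, `needsUnzip`, and `markUnzipped` are hypothetical names, and the real code uses sbt's `IO` helpers and derives the directory name from a hash of the jar's name and content.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{Files, Path}

object UnzipCacheSketch {
  // True when the jar must be unpacked again: caching is off, the marker
  // file is missing, or the marker records a different jar path.
  def needsUnzip(cacheUnzip: Boolean, marker: Path, jarPath: String): Boolean =
    !cacheUnzip ||
      !Files.exists(marker) ||
      new String(Files.readAllBytes(marker), UTF_8) != jarPath

  // Record a completed unpack. Writing the marker last means an interrupted
  // unpack leaves no marker behind and is redone on the next run.
  def markUnzipped(marker: Path, jarPath: String): Unit = {
    Files.write(marker, jarPath.getBytes(UTF_8))
    ()
  }
}
```

A caller would check `needsUnzip`, unpack if it returns true, and call `markUnzipped` only once unpacking has finished.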
@@ -327,37 +366,39 @@ object Plugin extends sbt.Plugin {
lazy val baseAssemblySettings: Seq[sbt.Project.Setting[_]] = Seq(
assembly <<= (test in assembly, outputPath in assembly, packageOptions in assembly,
assembledMappings in assembly, mergeStrategy in assembly,
assemblyDirectory in assembly, assemblyCacheOutput in assembly, cacheDirectory, streams) map {
(test, out, po, am, ms, tempDir, co, cacheDir, s) =>
assemblyTask(out, po, am, ms, tempDir, co, cacheDir, s.log) },
assemblyDirectory in assembly, assemblyCacheOutput in assembly, cacheDirectory, assemblyCacheUnzip in assembly, streams) map {
(test, out, po, am, ms, tempDir, co, cacheDir, acu, s) =>
assemblyTask(out, po, am, ms, tempDir, co, cacheDir, acu, s.log) },

assemblyCacheOutput in assembly := false,

assemblyCacheUnzip in assembly := true,

assembledMappings in assembly <<= (assemblyOption in assembly, fullClasspath in assembly, dependencyClasspath in assembly,
excludedJars in assembly, streams) map {
(ao, cp, deps, ej, s) => (tempDir: File) => assemblyAssembledMappings(tempDir, cp, deps, ao, ej, s.log) },
excludedJars in assembly, assemblyCacheUnzip in assembly, streams) map {
(ao, cp, deps, ej, acu, s) => (tempDir: File) => assemblyAssembledMappings(tempDir, cp, deps, ao, ej, acu, s.log) },

mergeStrategy in assembly := defaultMergeStrategy,

packageScala <<= (outputPath in packageScala, packageOptions,
assembledMappings in packageScala, mergeStrategy in assembly,
assemblyDirectory in assembly, assemblyCacheOutput in assembly, cacheDirectory, streams) map {
(out, po, am, ms, tempDir, co, cacheDir, s) => assemblyTask(out, po, am, ms, tempDir, co, cacheDir, s.log) },
assemblyDirectory in assembly, assemblyCacheOutput in assembly, cacheDirectory, assemblyCacheUnzip in assembly, streams) map {
(out, po, am, ms, tempDir, co, cacheDir, acu, s) => assemblyTask(out, po, am, ms, tempDir, co, cacheDir, acu, s.log) },

assembledMappings in packageScala <<= (assemblyOption in packageScala, fullClasspath in assembly, dependencyClasspath in assembly,
excludedJars in assembly, streams) map {
(ao, cp, deps, ej, s) => (tempDir: File) =>
assemblyAssembledMappings(tempDir, cp, deps, ao, ej, s.log) },
excludedJars in assembly, assemblyCacheUnzip in assembly, streams) map {
(ao, cp, deps, ej, acu, s) => (tempDir: File) =>
assemblyAssembledMappings(tempDir, cp, deps, ao, ej, acu, s.log) },

packageDependency <<= (outputPath in packageDependency, packageOptions in assembly,
assembledMappings in packageDependency, mergeStrategy in assembly,
assemblyDirectory in assembly, assemblyCacheOutput in assembly, cacheDirectory, streams) map {
(out, po, am, ms, tempDir, co, cacheDir, s) => assemblyTask(out, po, am, ms, tempDir, co, cacheDir, s.log) },
assemblyDirectory in assembly, assemblyCacheOutput in assembly, cacheDirectory, assemblyCacheUnzip in assembly, streams) map {
(out, po, am, ms, tempDir, co, cacheDir, acu, s) => assemblyTask(out, po, am, ms, tempDir, co, cacheDir, acu, s.log) },

assembledMappings in packageDependency <<= (assemblyOption in packageDependency, fullClasspath in assembly, dependencyClasspath in assembly,
excludedJars in assembly, streams) map {
(ao, cp, deps, ej, s) => (tempDir: File) =>
assemblyAssembledMappings(tempDir, cp, deps, ao, ej, s.log) },
excludedJars in assembly, assemblyCacheUnzip in assembly, streams) map {
(ao, cp, deps, ej, acu, s) => (tempDir: File) =>
assemblyAssembledMappings(tempDir, cp, deps, ao, ej, acu, s.log) },

test <<= test or (test in Test),
test in assembly <<= (test in Test),
56 changes: 56 additions & 0 deletions src/sbt-test/sbt-assembly/caching/build.sbt
@@ -0,0 +1,56 @@
import AssemblyKeys._

version := "0.1"

assemblySettings

scalaVersion := "2.9.1"

libraryDependencies += "org.scalatest" % "scalatest_2.9.0" % "1.6.1" % "test"

libraryDependencies += "com.weiglewilczek.slf4s" %% "slf4s" % "1.0.7"

libraryDependencies += "ch.qos.logback" % "logback-classic" % "0.9.29" % "runtime"

assemblyCacheUnzip in assembly := true

assemblyCacheOutput in assembly := false

unmanagedJars in Compile <++= baseDirectory map { base =>
(base / "lib" / "compile" ** "*.jar").classpath
}

unmanagedJars in Runtime <++= baseDirectory map { base =>
(base / "lib" / "runtime" ** "*.jar").classpath
}

unmanagedJars in Test <++= baseDirectory map { base =>
(base / "lib" / "test" ** "*.jar").classpath
}

excludedJars in assembly <<= (fullClasspath in assembly) map { cp =>
cp filter {_.data.getName == "compile-0.1.0.jar"}
}

jarName in assembly := "foo.jar"

TaskKey[Unit]("check") <<= (crossTarget) map { (crossTarget) =>
val process = sbt.Process("java", Seq("-jar", (crossTarget / "foo.jar").toString))
val out = (process!!)
if (out.trim != "hello") error("unexpected output: " + out)
()
}

TaskKey[Unit]("checkhash") <<= (crossTarget, streams) map { (crossTarget, s) =>
import java.security.MessageDigest
val jarHash = crossTarget / "jarHash.txt"
val hash = MessageDigest.getInstance("SHA-1").digest(IO.readBytes(crossTarget / "foo.jar")).map( b => "%02x".format(b) ).mkString
if ( jarHash.exists )
{
val prevHash = IO.read(jarHash)
s.log.info( "Checking hash: " + hash + ", " + prevHash )
assert( hash == prevHash )
}
IO.write( jarHash, hash )
()
}
1 change: 1 addition & 0 deletions src/sbt-test/sbt-assembly/caching/project/plugins.sbt
@@ -0,0 +1 @@
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.9.0-+")
3 changes: 3 additions & 0 deletions src/sbt-test/sbt-assembly/caching/src/main/scala/hello.scala
@@ -0,0 +1,3 @@
object Main {
def main(args: Array[String]) { println("hello") }
}
20 changes: 20 additions & 0 deletions src/sbt-test/sbt-assembly/caching/test
@@ -0,0 +1,20 @@
# check if the file gets created
> clean
> assembly
$ exists target/scala-2.9.1/foo.jar

# run to cache the hash, then check it's consistent
> check
> checkhash
$ exists target/scala-2.9.1/jarHash.txt
> checkhash

> assembly
$ newer target/scala-2.9.1/foo.jar target/scala-2.9.1/jarHash.txt
> check

# run again to check that the hash is the same
# when the unzipped jars are read from cache
# on disk
> checkhash
$ newer target/scala-2.9.1/jarHash.txt target/scala-2.9.1/foo.jar
2 changes: 2 additions & 0 deletions src/sbt-test/sbt-assembly/config/build.sbt
@@ -2,6 +2,8 @@ import AssemblyKeys._

version := "0.1"

scalaVersion := "2.9.1"

inConfig(Test)(baseAssemblySettings)

jarName in (Test, assembly) := "foo.jar"
2 changes: 1 addition & 1 deletion src/sbt-test/sbt-assembly/config/project/plugins.sbt
@@ -1 +1 @@
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.8.8-+")
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.9.0-+")
2 changes: 1 addition & 1 deletion src/sbt-test/sbt-assembly/config/test
@@ -1,6 +1,6 @@
# check if the file gets created
> test:assembly
$ exists target/foo.jar
$ exists target/scala-2.9.1/foo.jar

# check if it says hello
> check
2 changes: 1 addition & 1 deletion src/sbt-test/sbt-assembly/deps/project/plugins.sbt
@@ -1 +1 @@
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.8.8-+")
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.9.0-+")
2 changes: 1 addition & 1 deletion src/sbt-test/sbt-assembly/empty/project/plugins.sbt
@@ -1 +1 @@
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.8.8-+")
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.9.0-+")
2 changes: 1 addition & 1 deletion src/sbt-test/sbt-assembly/mergefail/project/plugins.sbt
@@ -1 +1 @@
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.8.8-+")
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.9.0-+")
2 changes: 1 addition & 1 deletion src/sbt-test/sbt-assembly/mergefail2/project/plugins.sbt
@@ -1 +1 @@
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.8.8-+")
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.9.0-+")
2 changes: 1 addition & 1 deletion src/sbt-test/sbt-assembly/merging/project/plugins.sbt
@@ -1 +1 @@
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.8.8-+")
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.9.0-+")
4 changes: 4 additions & 0 deletions src/sbt-test/sbt-assembly/simple/build.sbt
@@ -4,6 +4,10 @@ version := "0.1"

assemblySettings

assemblyCacheOutput in assembly := true

assemblyCacheUnzip in assembly := true

jarName in assembly := "foo.jar"

TaskKey[Unit]("check") <<= (crossTarget) map { (crossTarget) =>
2 changes: 1 addition & 1 deletion src/sbt-test/sbt-assembly/simple/project/plugins.sbt
@@ -1 +1 @@
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.8.8-+")
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.9.0-+")