Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved layer grouping #1326

Merged
merged 2 commits into from
Apr 27, 2020
Merged

Improved layer grouping #1326

merged 2 commits into from
Apr 27, 2020

Conversation

jroper
Copy link
Member

@jroper jroper commented Apr 6, 2020

The main purpose of this change is to make layer grouping more robust. Instead of using a heuristic based on whether a library is in the same organization as the build (which is not always what you want, some builds may have sub projects with different organizations, and some builds may depend on external dependencies from the same organization), it checks whether the library in question is present in projectDependencyArtifacts (ie, whether its a dependency built by this build), if it is, then it goes in a higher layer.

Doing this change however required changing dockerLayerGrouping to be a TaskKey instead of a SettingKey, which is a non binary compatible change. So, dockerLayerGrouping has been deprecated, and a new TaskKey, dockerGroupLayers has been introduced.

Since a new task has been introduced, I also took advantage of that to improve the types, instead of just passing the destination path, which requires you to know how destination paths are built to check if a particular file should be in there, it passes the whole mapping, ie, the source file and the destination path, which means that you can match against the source file (which is what projectDependencyArtifacts gives you) as well. Also, I changed the type from a function that returns an Option to a PartialFunction, this makes composing it, to introduce new mappings, far simpler, as PartialFunction[_, _] composes trivially compared to Function[_, Option[_]].

The main purpose of this change is to make layer grouping more robust.
Instead of using a heuristic based on whether a library is in the same
`organization` as the build (which is not always what you want, some
builds may have sub projects with different organizations, and some
builds may depend on external dependencies from the same organization),
it checks whether the library in question is present in
`projectDependencyArtifacts` (ie, whether its a dependency built by
this build), if it is, then it goes in a higher layer.

Doing this change however required changing `dockerLayerGrouping` to be
a `TaskKey` instead of a `SettingKey`, which is a non binary compatible
change. So, `dockerLayerGrouping` has been deprecated, and a new
`TaskKey`, `dockerGroupLayers` has been introduced.

Since a new task has been introduced, I also took advantage of that to
improve the types, instead of just passing the destination path, which
requires you to know how destination paths are built to check if a
particular file should be in there, it passes the whole mapping, ie, the
source file and the destination path, which means that you can match
against the source file (which is what `projectDependencyArtifacts`
gives you) as well. Also, I changed the type from a function that
returns an `Option` to a `PartialFunction`, this makes composing it, to
introduce new mappings, far simpler, as `PartialFunction[_, _]` composes
trivially compared to `Function[_, Option[_]]`.
@jroper jroper force-pushed the improved-layer-grouping branch from ac253ae to d47d4fb Compare April 6, 2020 03:05
Copy link
Contributor

@muuki88 muuki88 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jroper for improving this feature 🤗

Only a small note on documentation and then this is ready to merge 😄

@@ -1 +1 @@
dockerLayerGrouping in Docker := (_ => None)
dockerGroupLayers in Docker := PartialFunction.empty
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@muuki88
Copy link
Contributor

muuki88 commented Apr 6, 2020

@ppiotrow any remarks from your side?

@muuki88 muuki88 added the docker label Apr 6, 2020
val oldPartialFunction = Function.unlift((tuple: (File, String)) => oldFunction(tuple._2))

val libDir = dockerBaseDirectory + "/lib/"
val binDir = dockerBaseDirectory + "/bin/"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you also add "/conf/" to the same layer? I noticed that /conf/application.ini gets created when javaOptions are used.

      javaOptions in Universal := Seq(
      "-Djava.net.preferIPv4Stack=true",
      "-Dlogback.configurationFile=logback.packaged.xml"
      ),

val binDir = dockerBaseDirectory + "/bin/"

oldPartialFunction.orElse {
case (file, _) if artifacts(file) => 2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for me artifacts.contains would be more explicit.

val dockerLayerGrouping = settingKey[String => Option[Int]](
"Group files by path into in layers to increase docker cache hits. " +
"Lower index means the file would be a part of an earlier layer."
)
val dockerGroupLayers = taskKey[PartialFunction[(File, String), Int]](
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not very convinced to expose File as this function argument.

  1. Not sure how plugin users will do matching on file, but most likely they end up comparing location/path
  2. It will simplify chmod related code that will give up find.getOrElse and /dev/null and become simply layerToPath.lift(...)
  3. To make it work, just simple change is necessary: sth like .map(_.data.path) in body of dockerGroupLayers

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how plugin users will do matching on file, but most likely they end up comparing location/path

I think there are use cases for either. With the path, the user has to know exactly the structure of the Docker image, including what that base directory is. The file I think makes more sense because it's the file that's changing, not the path. I know that this source file changes frequently, so every mapping that includes this source file, I want to put it in a higher layer. That's more direct and robust then having to work out what the destination path for that file is, and then matching based on that.

It will simplify chmod related code that will give up find.getOrElse and /dev/null and become simply layerToPath.lift(...)

Yes, but that code isn't that complex, and I think the advantages outweigh the minuses here.

To make it work, just simple change is necessary: sth like .map(_.data.path) in body of dockerGroupLayers

No, that's the point, that doesn't work, and that's why the File is being passed. If I want to match the slf4j api jar, the path for that is /home/jroper/.ivy2/cache/org.slf4j/slf4j-api/jars/slf4j-api-1.7.30.jar, meanwhile, the path that gets passed into the function is opt/docker/lib/org.slf4j.slf4j-api-1.7.30.jar. They are completely different paths, even the file name doesn't match. So, in order to match based on the path, I need to know the completely undocumented method that sbt native packager uses to construct that file name, I also need to know the completely undocumented method that sbt native packager uses to construct the the path to it. What if sbt native packager ever changes that logic? Then my logic for matching will break. In contrast, this logic will never break, no matter what sbt native packager does:

dockerGroupLayers := {
  val slf4jFile = for {
    artifact <- externalDependencyClasspath.value
    moduleId <- artifact.metadata.get(AttributeKey[ModuleID]("moduleID"))
    if moduleId.organization == "org.slf4j" && moduleId.name == "slf4j-api"
  } yield artifact.data
  dockerGroupLayers.value.orElse {
    case (`slf4jFile`, _) => Some(2)
  }
}

Perhaps a bit wordy, but if we find it's popular, we could always add a helper function to sbt and then they could use:

dockerGroupLayers := {
  val slf4jFile = getClasspathFileFor(externalDependencyClasspath.value, "org.slf4j", "slf4j-api")
  dockerGroupLayers.value.orElse {
    case (`slf4jFile`, _) => Some(2)
  }
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dockerGroupLayers := {
  val slf4jFile = for {
    artifact <- externalDependencyClasspath.value
    moduleId <- artifact.metadata.get(AttributeKey[ModuleID]("moduleID"))
    if moduleId.organization == "org.slf4j" && moduleId.name == "slf4j-api"
  } yield artifact.data
  dockerGroupLayers.value.orElse {
    case (`slf4jFile`, _) => Some(2)
  }
}

maybe we can add this to the docker.rst docs as well?

@muuki88
Copy link
Contributor

muuki88 commented Apr 27, 2020

@ppiotrow I'm going to merge this and we address the small hints in another pull request.

Do you have time for this 😄 ? If not, that's totally fine. I'm going to add the two notes.

@muuki88 muuki88 merged commit eef9bd7 into sbt:master Apr 27, 2020
@ppiotrow
Copy link
Contributor

Sure, I have plans to contribute more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants