Document internals of invalidation and hook profiler #585

jvican · 2018-08-26T12:37:55Z

While I was trying to implement some improvements here, I realized that
the current code can hardly be read or reused. I also struggled to
understand the nitty-gritty of the algorithm, as the control flow was not
clear and the order of functions was confusing.

This PR aims to fix this by reorganising the code to allow for more
reuse (functions in IncrementalCommon's companion), restructuring the
outline of IncrementalCommon to have clearer names and structure, and
documenting the most important parts of the code with implementation and
contract details. Aside from that, it introduces a protobuf data format to
track all the invalidation logic. This format is internal to Zinc
and should only be used by users that really know what they are doing,
as the format can change in the future (protobuf makes it easy to evolve
this profiling schema). This is related to the idea I pitched in #550.

With this idea, this data will not only be useful for debugging but for
providing an automatic way of reporting bugs in Zinc. The infrastructure
is far from finished but it's already in a usable state for libraries
that depend on Zinc directly and have direct access to Incremental.
For performance reasons, a string table is kept per zinc profiler. More
details in the scaladoc.

Note that by default no profiler is used. Only people that change the
profiler argument for Incremental.compile will be able to get the run
profiles. How these profiles are persisted is out of scope of this PR.

And inline the actual used function in the companion of `IncrementalCommon`, with a better and simpler implementation.

jvican · 2018-08-26T12:44:38Z

I'll fix the MiMa errors later on, but this PR is ready for review.

zprof is the name I've chosen for this small profiler (or tracker if you will) of the invalidation logic. The profiled data is formalized in an internal format that is not supposed to be used by normal users, but rather by us (Zinc) and related tools (Bloop). The current profiled data exposes details of how the incremental compiler works internally and how it invalidates classes. This is the realization of an idea I registered here: sbt#550. With this idea, this data will be useful not only for debugging but also for providing an automatic way of reporting bugs in Zinc. The infrastructure is far from finished, but it's already in a usable state for libraries that depend on Zinc directly and have direct access to `Incremental`. By default, no profiler is used. Only people that change the profiler argument for `Incremental.compile` will be able to get the run profiles.

These methods are not exposed to the public API of Zinc and can therefore be changed with certainty.

jvican · 2018-08-26T18:14:16Z

MiMa errors are fixed, CI passing.

muuki88

Thanks for all the comments, making things more understandable for a zinc newcomer 👍 😍
But I want more 😂

The general implementation seems reasonable from the understanding I have so far (which can be described as almost nothing 😛 ). The only question I have is why the toProfile method is not part of the public API. It seems to me that this is the method to get the profile data in a serializable format

muuki88 · 2018-08-27T07:08:18Z

internal/zinc-core/src/main/protobuf/zprof.proto

@@ -0,0 +1,68 @@
+syntax = "proto3";


It would be nice to add more documentation on the messages and the zprof.proto proto file 😃
Especially answering questions like

why is this message necessary?

who/what might actually use this message to solve which problem

a remark on this pull request, which marks the initial implementation

muuki88 · 2018-08-27T07:23:06Z

internal/zinc-core/src/main/scala/sbt/internal/inc/InvalidationProfiler.scala

+}
+
+object InvalidationProfiler {
+  final val empty: InvalidationProfiler = new InvalidationProfiler {


From the implementation I guess this is being used when no profiling should be done?
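
For readers following along, the question above describes what is usually called a null-object profiler. Here is a minimal sketch of that pattern, assuming simplified stand-in traits (`InvalidationProfilerSketch`, `RunProfilerSketch` and their signatures are hypothetical and not Zinc's actual API):

```scala
// Hypothetical, simplified stand-ins for the types in this PR; not Zinc's API.
trait RunProfilerSketch {
  def registerEvent(kind: String, detail: String): Unit
  def events: List[(String, String)]
}

trait InvalidationProfilerSketch {
  def profileRun(): RunProfilerSketch
}

object InvalidationProfilerSketch {
  // No-op instance: every call is accepted and discarded, so callers never
  // need to check whether profiling is enabled.
  val empty: InvalidationProfilerSketch = new InvalidationProfilerSketch {
    private val noop: RunProfilerSketch = new RunProfilerSketch {
      def registerEvent(kind: String, detail: String): Unit = ()
      def events: List[(String, String)] = Nil
    }
    def profileRun(): RunProfilerSketch = noop
  }
}
```

With a default like this, the compile path can call the profiler unconditionally and only pay for profiling when a real implementation is passed in.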

muuki88 · 2018-08-27T07:27:09Z

internal/zinc-core/src/main/scala/sbt/internal/inc/InvalidationProfiler.scala

+
+  def profileRun: RunProfiler = new ZincProfilerImplementation
+
+  private final var runs: List[zprof.ZincRun] = Nil


Can you add some information why we need to persist every run here?

Also this list grows with every compile run if I understand this correctly?
AFAIK it's only used for serialization to disk. Should we empty the runs after
serialization? Or is the memory footprint small enough that we should never care?

The ZincProfiler is managed outside of Zinc, by another tool. Whoever uses this profiler is responsible for managing the lifetime of the profilers used. That being said, the memory footprint of the runs is very small given that we use a string table.

muuki88 · 2018-08-27T07:27:24Z

internal/zinc-core/src/main/scala/sbt/internal/inc/InvalidationProfiler.scala

+   * It is recommended to only perform this operation when we are
+   * going to persist the profiled protobuf data to disk. Do not
+   * call this function after every compiler iteration as the aggregation
+   * of the symbol tables may be expensive, it's recommended to


Why is this recommended? 😄

Because the string table for very big projects could be reasonably big, so you want to make the most out of it for as many runs as you can.

muuki88 · 2018-08-27T07:29:26Z

internal/zinc-core/src/main/scala/sbt/internal/inc/InvalidationProfiler.scala

+  private final var lastKnownIndex: Long = -1L
+  private final val stringTable1: ArrayBuffer[String] = new ArrayBuffer[String](1000)
+  private final val stringTable2: ArrayBuffer[String] = new ArrayBuffer[String](10)
+  private final val stringTableIndices: mutable.HashMap[String, Long] =


Can you add some information about what these indices are for? String / Long are very generic types, and people like me with very little compiler experience would find one sentence explaining the variable quite helpful 😄

What would you write? I didn't write anything because I thought the variable names were self-descriptive, i.e. stringTableIndices means a map from string to indices of the string table.

what I don't understand (yet) is what these strings are. Class names, variable names, file paths or all of this 😁 And long is the index of what.

If you say it's self explaining once one is deeper into how things work, I'm fine with not putting any docs on it 😉

No, I think you make a good point. I'm probably so deep into the implementation that it seems obvious to me, but it isn't. Thanks for pointing that out, I'll add a quick comment.
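
To make the string-table idea in this thread concrete, here is a minimal sketch, assuming a single table with `Int` indices. The class and method names are hypothetical, and the example strings are only illustrations of what might be stored; the PR's own fields use `Long` indices.

```scala
import scala.collection.mutable

// Hypothetical sketch of a string table; names are illustrative, not Zinc's.
final class StringTableSketch {
  // Position in this buffer is the index that profile messages would store.
  private val table = mutable.ArrayBuffer.empty[String]
  // Reverse lookup so a repeated string is stored only once.
  private val indices = mutable.HashMap.empty[String, Int]

  /** Returns the index of `s`, appending it to the table on first sight. */
  def intern(s: String): Int =
    indices.getOrElseUpdate(s, {
      table += s
      table.length - 1
    })

  def lookup(index: Int): String = table(index)
  def size: Int = table.length
}
```

Messages can then refer to a repeated string by its small integer index instead of carrying the full string each time, which is what keeps the recorded runs compact.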

muuki88 · 2018-08-27T07:31:57Z

internal/zinc-core/src/main/scala/sbt/internal/inc/InvalidationProfiler.scala

+  }
+}
+
+abstract class RunProfiler {


Same here 😉 I find some scaladocs on a fresh API usually very helpful to get
the intention and purpose of the API. The question why is this interface necessary
should be answered in the scaladocs 😃

muuki88 · 2018-08-27T07:32:54Z

internal/zinc-core/src/main/scala/sbt/internal/inc/InvalidationProfiler.scala

+  ): Unit
+
+  def registerEvent(
+      kind: String,


What kinds are allowed here? Is there any other implementation I can look at to see what
is applicable here?

It's generic on purpose so that we can easily evolve it in the future, even though we only have one implementation. The kinds are defined in https://github.com/sbt/zinc/pull/585/files#diff-de86f02542b47ffddbb8687a94316daeR259

muuki88 · 2018-08-27T07:34:36Z

internal/zinc-core/src/main/scala/sbt/internal/inc/InvalidationProfiler.scala

+      )
+    }
+
+    private final var currentEvents: List[zprof.InvalidationEvent] = Nil


Same as with runs. This grows unbounded, right?

What do you mean by "this grows unbounded"? In every run, we create a new RunProfiler, so that new profiler won't see events from the past.

Ah, thanks 👍 Misunderstood that.
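
The per-run scoping described in this exchange can be sketched as follows, assuming simplified stand-in types (`PerRunEvent`, `PerRunProfiler` and `ProfilerFactorySketch` are hypothetical names, with `PerRunEvent` standing in for `zprof.InvalidationEvent`):

```scala
// Hypothetical sketch; PerRunEvent stands in for zprof.InvalidationEvent.
final case class PerRunEvent(kind: String, reason: String)

final class PerRunProfiler {
  // Newest event first: prepending to an immutable List is O(1).
  private var currentEvents: List[PerRunEvent] = Nil

  def registerEvent(kind: String, reason: String): Unit =
    currentEvents = PerRunEvent(kind, reason) :: currentEvents

  /** Events in the order they were registered. */
  def events: List[PerRunEvent] = currentEvents.reverse
}

final class ProfilerFactorySketch {
  // A fresh PerRunProfiler per compile run, so events never leak across runs.
  def profileRun(): PerRunProfiler = new PerRunProfiler
}
```

Because the event list lives in the per-run object, it becomes unreachable once that run's profiler is dropped.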

muuki88 · 2018-08-27T07:34:42Z

internal/zinc-core/src/main/scala/sbt/internal/inc/InvalidationProfiler.scala

+      currentEvents = event :: currentEvents
+    }
+
+    private final var cycles: List[zprof.CycleInvalidation] = Nil


Same as with runs. This grows unbounded, right?

muuki88 · 2018-08-27T07:37:47Z

internal/zinc-core/src/main/scala/sbt/internal/inc/InvalidationProfiler.scala

+   *
+   * @return An immutable zprof profile that can be persisted via protobuf.
+   */
+  def toProfile: zprof.Profile = zprof.Profile(


This is the most important method in the whole implementation, right?
It creates a serializable format for the profile data gathered so far. Why isn't this part
of the public API?

Because this exposes internal details of the algorithm, and I only want people that really know what they're doing using it 😉

jvican · 2018-08-27T08:41:01Z

@muuki88 Thanks for the review, I answered most of your comments. Regarding the additions of more docs (for example in the profiling schema), I'm afraid I won't be able to do so until the beginning of October. My schedule for the next 4 weeks looks pretty busy.

muuki88 · 2018-08-27T08:48:43Z

@jvican thanks for the quick feedback 🤗

I'm afraid I won't be able to do so until the beginning of October. My schedule for the next 4 weeks looks pretty busy.

no worries 😄

I realized how stupid the idea of keeping two string tables instead of one was, since it's unlikely we'll ever have more than 2147483647 unique string instances and the JVM doesn't allow instantiating such a big array with normal heap sizes. So we remove this part of the design that we inherited from pprof.

jvican · 2018-08-27T09:17:41Z

@muuki88 OK, motivated by your comments I quickly went into the GitHub interface and submitted two changes: more docs and an improvement to the internals of the profiler. Feel free to have a look at them and tell me if they are useful.

eed3si9n · 2018-08-27T14:12:47Z

Aside from that, it introduces a protobuf data to track all the invalidation logic.

Given that Google has not moved protobuf towards JDK 9/10/11 yet, I am concerned about bringing Protobuf in as a critical piece. As far as I know, the current usage is somewhat limited to Analysis serialization only, right? What do you think about using Contraband instead here?

eed3si9n · 2018-08-27T14:20:01Z

internal/zinc-core/src/main/scala/sbt/internal/inc/IncrementalCommon.scala

-      all ++ invalidated // need the union because all doesn't contain removed sources
-    } else invalidated
-  }
+  def recompileClasses(


Documentation, reorganization, and some changes are intermixed in this commit, so it's difficult for me to review, but have you made any intentional change in the observable behavior, either in the output or in performance characteristics? For example, I noticed that the method named recompileClasses is no longer recompiling classes, but it's recompiling sources.

There’s no change in semantics, I double checked this several times manually and the tests confirm it.

Regarding the change you mention @eed3si9n: it is now more visible that we recompile sources instead of classes because recompileClasses used to do the mapping between invalidated classes and the sources to recompile, and I pulled that logic out of it to keep things clear.
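
As an illustration of the mapping step mentioned above, here is a minimal sketch, assuming the class-to-source relation can be modelled as a plain `Map` (the object name, function name and relation shape are hypothetical, not the code in this PR):

```scala
// Hypothetical sketch: `classToSources` stands in for whatever relation maps
// a class name to the source files that define it.
object InvalidationSketch {
  /** Sources that must be recompiled for a set of invalidated classes. */
  def sourcesFor(
      invalidatedClasses: Set[String],
      classToSources: Map[String, Set[String]]
  ): Set[String] =
    // Classes without a known source contribute nothing.
    invalidatedClasses.flatMap(c => classToSources.getOrElse(c, Set.empty[String]))
}
```

Keeping this step as its own function means the recompilation step only ever receives sources, which matches the separation described in the comment above.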

jvican · 2018-08-27T15:46:58Z

Given that Google has not moved protobuf towards JDK 9/10/11 yet, I am concerned about bringing Protobuf in as a critical piece. As far as I know, the current usage is somewhat limited to Analysis serialization only, right? What do you think about using Contraband instead here?

It can be ported in the future if need be, but the elephant in the room is the analysis file implementation. Conversely, this use of protobuf can be easily disabled without affecting the correctness of the language (and in fact by default it won’t even initialize the protobuf classes, so the warning won’t happen), so I wouldn’t worry about it. The protobuf issue will be fixed down the road (I have high confidence in that), and if it isn’t, we can find solutions by then.

muuki88 · 2018-08-27T17:48:05Z

Awesome @jvican ! That clarifies a lot of things 😍

eed3si9n · 2018-08-29T05:26:00Z

It can be ported in the future if need be, but the elephant in the room is the analysis file implementation.

Analysis file implementation can be switched back to using text file like sbt 0.13.

Conversely, this use of protobuf can be easily disabled without affecting the correctness of the language

If it's easy to not use protobuf, let's switch to using Contraband like the rest of the datatypes.

jvican · 2018-08-29T07:32:51Z

If it's easy to not use protobuf, let's switch to using Contraband like the rest of the datatypes.

I think you misread me 😛 I didn't say it's easy to switch to Contraband (it would actually take me quite a lot of time that I cannot afford now), I said it's easy not to use the protobuf-based profiler in the future. By default, the profiler is empty, and in the worst-case scenario, if protobuf doesn't remove the warning in JDK 11, you can (1) not use any profiler at all or (2) implement your own (and that is if you use it in the first place).

jvican added 2 commits August 26, 2018 14:22

Remove stale ClassToSourceMapper functions

a813eb9

And inline the actual used function in the companion of `IncrementalCommon`, with a better and simpler implementation.

jvican added enhancement Scala Center performance infrastructure labels Aug 26, 2018

jvican force-pushed the topic/wip-incremental-common branch from 2952644 to 2e63f5d Compare August 26, 2018 12:38

jvican requested a review from eed3si9n August 26, 2018 12:38

jvican added 2 commits August 26, 2018 20:04

Exclude binary compatibility errors in zincCore

13c7d4d

These methods are not exposed to the public API of Zinc and can therefore be changed with certainty.

jvican force-pushed the topic/wip-incremental-common branch from 2e63f5d to 13c7d4d Compare August 26, 2018 18:06

muuki88 reviewed Aug 27, 2018

jvican added 2 commits August 27, 2018 11:03

Add some docs to clarify design

90b56cd

eed3si9n reviewed Aug 27, 2018

eed3si9n changed the base branch from 1.x to develop August 30, 2018 15:31

eed3si9n approved these changes Aug 31, 2018

eed3si9n merged commit b125d82 into sbt:develop Aug 31, 2018

jvican mentioned this pull request Sep 18, 2018

Should we re-consider zinc's incremental compilation? bazelbuild/rules_scala#328

Closed

eed3si9n added this to the 1.3.0 milestone Apr 28, 2019

