
Static Object Optimizations #197

Merged: 18 commits merged into master on Jan 9, 2024
Conversation

@ahirreddy (Collaborator) commented on Jan 4, 2024:

• This PR uses the Parser & StaticOptimizer thread-local string interner for keys in static objects (a sketch of the interning idea follows this list).
• Similarly, we deduplicate the String -> Boolean map used to determine whether a field is static.
• For static objects we also use immutable.VectorMap (with a JavaWrapper) for the field set.
• Lastly, for the value cache, we size it according to the number of keys in the object. This reduces unnecessary up-sizing for large objects and, more importantly, removes the large number of sparse maps we previously had for small objects (the default was 16 elements).
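The interner itself isn't shown in this thread; here is a minimal sketch of the thread-local deduplication idea (the names are hypothetical, not sjsonnet's actual API):

```scala
import scala.collection.mutable

object KeyInterner {
  // One cache per thread, so no synchronization is needed on the hot path.
  private val cache = new ThreadLocal[mutable.HashMap[String, String]] {
    override def initialValue(): mutable.HashMap[String, String] =
      new mutable.HashMap[String, String]
  }

  // Returns a canonical instance of `s`; identical keys across many static
  // objects then share a single String on the heap.
  def intern(s: String): String = cache.get().getOrElseUpdate(s, s)
}
```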

Before: 855MB for the parsed file

After: 425MB

@lihaoyi removed their request for review on January 5, 2024.
@szeiger (Collaborator) left a comment:

What's the performance impact of these changes? Are we trading memory for speed?

Comment on lines 77 to 80:

```scala
// This is a dummy benchmark to see how much memory is used by the interpreter.
// You're meant to execute it, and once it prints "sleeping" you can attach YourKit and take a heap
// dump. Because we store the cache, the parsed objects will have strong references - and thus will
// be in the heap dump.
```
@szeiger:

Could we turn this into a simple command-line program that runs a GC before and after parsing and then prints the memory usage diff to the console, so you can easily retest this for future changes without having to attach a profiler?

@ahirreddy (Author):
Done and updated readme.
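A harness along the lines suggested might look roughly like this (a sketch only, not the code that landed; `parseInput` is a placeholder for the real parse step):

```scala
object MemoryDiff {
  // Ask for a GC and report used heap; System.gc() is only a hint, so the
  // numbers are approximate.
  private def usedHeap(): Long = {
    System.gc()
    Thread.sleep(200)
    val rt = Runtime.getRuntime
    rt.totalMemory() - rt.freeMemory()
  }

  def main(args: Array[String]): Unit = {
    val before = usedHeap()
    // Stand-in for the real work: parse the input and keep a strong
    // reference so the parsed tree survives the second GC.
    val parsed: AnyRef = parseInput()
    val after = usedHeap()
    println(s"Retained by parse: ${(after - before) / (1024L * 1024L)} MB")
    require(parsed ne null)
  }

  private def parseInput(): AnyRef =
    Array.fill(1000000)("x" * 16) // placeholder allocation
}
```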

Comment on lines 21 to 25:

```scala
// HashMap to deduplicate strings.
private[this] val strings = new mutable.HashMap[String, String]

private[this] val fieldSet = new mutable.HashMap[Val.StaticObjectFieldSet, java.util.LinkedHashMap[String, java.lang.Boolean]]
```
@szeiger:

Why is this cache separate from the Parser's? Should it be a single cache at the Interpreter level?

@ahirreddy (Author):

Changed, there's now a single cache at the interpreter level.

Comment on diff `@@ -297,15 +298,45 @@` in `object Val`:

```scala
def staticObject(pos: Position, fields: Array[Expr.Member.Field]): Obj = {
  final case class StaticObjectFieldSet(keys: Array[String]) {
```
@szeiger:

This shouldn't be a case class. Array has no useful equality or toString, and you're overriding equals and hashCode anyway.

@ahirreddy (Author):

Fixed
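A sketch of the shape of the fix, using `java.util.Arrays` for element-wise equality (the merged code may differ):

```scala
import java.util.Arrays

// A plain class with Arrays-based equals/hashCode, since an Array's
// default equality is reference identity.
final class StaticObjectFieldSet(private val keys: Array[String]) {
  override def hashCode: Int =
    Arrays.hashCode(keys.asInstanceOf[Array[AnyRef]])
  override def equals(other: Any): Boolean = other match {
    case that: StaticObjectFieldSet =>
      Arrays.equals(keys.asInstanceOf[Array[AnyRef]],
                    that.keys.asInstanceOf[Array[AnyRef]])
    case _ => false
  }
}
```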

@ahirreddy (Author) commented:
Sorry I missed your most important question. The performance impact here was undetectable in the benchmark. I think the fact that the object interning is thread-local and unsynchronized makes it pretty fast.

Comment on diff `@@ -3,6 +3,8 @@` in `package sjsonnet`:

```scala
import java.io.StringWriter
import java.util.concurrent.TimeUnit

import scala.collection.mutable.HashMap
```
@szeiger:
I'd prefer to keep mutable types qualified (i.e. only import scala.collection.mutable) everywhere for consistency.

@lihaoyi-databricks lihaoyi-databricks merged commit c462833 into master Jan 9, 2024
1 check passed
stephenamar-db pushed a commit that referenced this pull request Jan 10, 2025
…e object creation in common cases (#258)

This PR bundles together several small optimizations, most aimed at
reducing garbage object creation. Collectively, these changes result in
a large performance improvement for some of our largest jsonnet inputs.

## Optimizations

These are somewhat coherently split into multiple smaller intermediate commits, which can help when navigating this change. I'll describe each optimization in more detail below.

### Optimizing `Obj` key lookup methods for objects without `super`

The `hasKeys`, `containsKey`, `containsVisibleKey`, `allKeyNames`, and
`visibleKeyNames` methods can be optimized in the common case of objects
that don't have `super` objects.

We already perform an optimization for `static` objects, pre-populating
`allKeys`, but for non-static objects we had to run `gatherAllKeys` to
populate a `LinkedHashMap` of keys and a boolean indicating visibility.

If a non-static object has no `super` then we can just use `value0` to
compute the keys: this avoids an additional `LinkedHashMap` allocation
and also lets us pre-size the resulting string arrays, avoiding wasteful
array copies from resizes or trims.

In `visibleKeyNames`, I chose to pre-size the output builder based on the _total_ key count: this is based on a common-case assumption that most objects don't have hidden keys.

This optimization makes a huge difference in `std.foldl(std.mergePatch,
listOfPatches, {})`, where the intermediate merge targets' visible keys are
repeatedly recomputed. In these cases, the intermediate objects contain
_only_ visible keys, allowing this optimization to maximally avoid
unnecessary array allocations. A sketch of the fast path follows.
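Here is a hedged sketch of that super-free fast path; the types are stand-ins (sjsonnet's real `value0` and member types differ slightly), and the visibility flag is an assumption:

```scala
// Stand-in for sjsonnet's member type; only the visibility flag matters here.
final case class Member(visible: Boolean)

// When there is no `super`, the object's own `value0` map already holds
// every key, so we can size the output array exactly. The array is only
// trimmed if hidden keys turned out to be present.
def visibleKeyNames(value0: java.util.LinkedHashMap[String, Member]): Array[String] = {
  val out = new Array[String](value0.size())
  var n = 0
  val it = value0.entrySet().iterator()
  while (it.hasNext) {
    val e = it.next()
    if (e.getValue.visible) { // assumed per-member visibility flag
      out(n) = e.getKey
      n += 1
    }
  }
  if (n == out.length) out else java.util.Arrays.copyOf(out, n)
}
```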

### Pre-size various hash maps 

This builds on an idea from #197: there are multiple places where we construct hashmaps that are either over- or under-sized. An over-sized map wastes space (I saw >90% of backing array slots wasted in some heap dumps), and an under-sized map wastes time and space in re-sizing upwards during construction.

Here, I've generalized that PR's pre-sizing to apply in more contexts.

One notable special case is the `valueCache`: if an object inherits
fields then it's not free to determine the map size. As a result, I've
only special-sized this for `super`-free objects. This map is a little
bit different from `value0` or `allFields` because its final size is a
function of whether or not object field values are actually computed.
Given this, I've chosen to start pretty conservatively by avoiding
changing the size in cases where it's not an obvious win; I may revisit
this further in a future followup.
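The sizing rule itself is mechanical; a sketch of the kind of helper this implies (the name is hypothetical):

```scala
// Pick an initial capacity so that `expected` entries fit under the default
// 0.75 load factor without any intermediate rehash; this mirrors
// java.util.HashMap's growth rule.
def preSizedMap[K, V](expected: Int): java.util.HashMap[K, V] =
  new java.util.HashMap[K, V](math.max(4, (expected / 0.75).toInt + 1))
```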

### Change `valueCache` from a Scala map to a Java map

This was originally necessary because the Scala 2.12 version of
`mutable.HashMap` does not support capacity / load factor configuration,
which got in the way with the pre-sizing work described above.

But a nice secondary benefit is that Java maps let me avoid closure /
anonfun allocation in `map.getOrElse(k, default)` calls: even if we
don't invoke `default`, we still end up doing some allocations for the
lambda / closure / thunk. I had noticed this overhead previously in
`Obj.value` and this optimization should fix it.
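To illustrate the difference (a sketch; `compute` is a stand-in, not sjsonnet's API):

```scala
import java.util.{HashMap => JHashMap}

// With mutable.HashMap, `cache.getOrElse(k, compute(k))` passes the default
// by name, allocating a thunk even on a cache hit. A Java map returns null
// on a miss, so the hit path allocates nothing:
def cachedValue(cache: JHashMap[String, AnyRef], key: String): AnyRef = {
  val hit = cache.get(key)
  if (hit != null) hit
  else {
    val v = compute(key) // hypothetical miss path
    cache.put(key, v)
    v
  }
}

def compute(key: String): AnyRef = key.reverse // stand-in
```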

### Remove most Scala sugar in `std.mergePatch`, plus other optimizations

The `recMerge` and `recSingle` methods used by `std.mergePatch`
contained big Scala `for` comprehensions and used `Option` for handling
nulls. This improves readability but comes at a surprising performance
cost. I would have naively assumed that most of those overheads would
have been optimized out but empirically this was not the case in my
benchmarks.

Here, I've rewritten this with Java-style imperative `while` loops and
explicit null checks.
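An illustrative before/after of the pattern (not the actual `mergePatch` body):

```scala
// Sugared: the tuple extraction and filter allocate closures per element.
def sugared(pairs: Seq[(String, AnyRef)]): Seq[String] =
  for ((k, v) <- pairs if v != null) yield k

// Imperative rewrite: explicit indices and null checks, no per-element
// allocation beyond the output array itself.
def imperative(pairs: Array[(String, AnyRef)]): Array[String] = {
  val out = new Array[String](pairs.length)
  var i = 0
  var n = 0
  while (i < pairs.length) {
    val p = pairs(i)
    if (p._2 != null) { out(n) = p._1; n += 1 }
    i += 1
  }
  if (n == out.length) out else java.util.Arrays.copyOf(out, n)
}
```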

### Optimize `std.mergePatch`'s distinct key computation

After fixing other bottlenecks, I noticed that the 

```scala
val keys: Array[String] = (l.visibleKeyNames ++ r.visibleKeyNames).distinct
```

step in `std.mergePatch` was very expensive. Under the hood, this
constructs a combined array, allocates an ArrayBuilder, and uses an
intermediate HashSet for detecting already-seen keys.

Here, I've added an optimized fast implementation for the cases where
`r.visibleKeyNames.length < 8`. I think it's much more common for the
LHS of a merge to be large and the RHS to be small, in which case we're
conceptually better off by building a hash set on the RHS and removing
RHS elements as they're seen during the LHS traversal. But if the RHS is
small enough then the cost of hashing and probing will be higher than a
simple linear scan of a small RHS array.

Here, `8` is a somewhat arbitrarily chosen threshold based on some local
microbenchmarking.
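A sketch of the small-RHS linear-scan path under the assumptions above (the merged implementation may differ; it assumes each input array is internally duplicate-free, as `visibleKeyNames` is):

```scala
import scala.collection.mutable

def distinctKeys(lhs: Array[String], rhs: Array[String]): Array[String] = {
  val out = new mutable.ArrayBuilder.ofRef[String]
  out.sizeHint(lhs.length + rhs.length)
  out ++= lhs
  var i = 0
  while (i < rhs.length) {
    val k = rhs(i)
    // Linear containment scan over the LHS: cheaper than building and
    // probing a hash set when the RHS is small.
    var j = 0
    var seen = false
    while (j < lhs.length && !seen) {
      if (lhs(j) == k) seen = true
      j += 1
    }
    if (!seen) out += k
    i += 1
  }
  out.result()
}
```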

### Special overload of `Val.Obj.mk` to skip an array copy

Pretty self-explanatory: we often have an `Array[(String, Val.Member)]`
and we can avoid a copy by defining a `Val.Obj.mk` overload which
accepts the array directly.
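A hedged sketch of what the overload pair might look like, with stand-in types rather than sjsonnet's real ones:

```scala
// Stand-ins for sjsonnet's types, just to make the shape concrete.
final class Position
final class Member
final class Obj(val attrs: Array[(String, Member)])

object ObjFactory {
  // Varargs convenience: copies the Seq into an array, then delegates.
  def mk(pos: Position, attrs: (String, Member)*): Obj = mk(pos, attrs.toArray)
  // Array overload: uses the caller's array directly, skipping the copy.
  def mk(pos: Position, attrs: Array[(String, Member)]): Obj = new Obj(attrs)
}
```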

### Make `PrettyNamed` implicits into constants

This is pretty straightforward, just changing a `def` to a `val`, but it makes a huge difference in reducing ephemeral garbage in some parts of the evaluator.
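A minimal illustration of why this matters (assuming a simple `PrettyNamed` constructor; the real definition may differ):

```scala
class PrettyNamed[T](val name: String)

object Before {
  // A new PrettyNamed is allocated at every implicit resolution site:
  implicit def stringName: PrettyNamed[String] = new PrettyNamed("string")
}
object After {
  // Allocated once, then reused as a constant:
  implicit val stringName: PrettyNamed[String] = new PrettyNamed("string")
}
```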

## Other changes

I also added `Error.fail` calls in a couple of `case _ =>` matches which
should never be hit. We weren't actually hitting these, but it felt like
a potentially dangerous pitfall to silently ignore those cases.