Share ChildBinding objects between siblings #4238

Merged
jackkoenig merged 1 commit into main from jackkoenig/share-childbinding on Jul 2, 2024

jackkoenig
Contributor

It's hard to measure the benefit from this, but I used the following little snippet of Chisel to make a Vec of Bundles:

import chisel3._
// _root_ disambiguates from package chisel3.util.circt if user imports chisel3.util._
import _root_.circt.stage.ChiselStage

class MyBundle extends Bundle {
  val a, b, c, d, e, f, g = UInt(8.W)
}

class Foo(n: Int) extends Module {
  val in = IO(Input(Vec(n, new MyBundle)))
  val out = IO(Output(Vec(n, new MyBundle)))

  out :#= in
}

object Main extends App {
  val n = args(0).toInt
  ChiselStage
    .emitCHIRRTLFile(
      gen = new Foo(n)
    )
    .toSeq
}

Then I built a fat assembly jar with Mill, and checked the heap size with find_heap_bound:

./firrtl/benchmark/scripts/find_heap_bound.py -vvv --start-size 4G --min-step 100M --context 2 -- -cp assembly.jar Main 200000

With this, I measured a ~2.5% memory reduction for this very Bundle-heavy code.

I had thought about a more aggressive change, where ChildBinding becomes a case object and we instead get the parent from ref (which does duplicate this information). However, using the above benchmark, I measured the exact same memory improvement. Since I cannot measure any benefit from the much more aggressive and hacky alternative approach, I think we should do the simpler thing which gets more-or-less the same benefit.
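The simpler approach can be illustrated with a minimal sketch. It uses simplified stand-in classes (`Binding`, `Element`, `Aggregate`, all hypothetical names), not Chisel's actual internals: rather than allocating one binding object per child, the parent allocates a single binding and hands the same instance to every sibling.

```scala
object SharingSketch {
  // Simplified stand-ins for illustration only; these are NOT Chisel's real classes.
  // Binding plays the role of ChildBinding: it records which aggregate is the parent.
  final class Binding(val parent: Aggregate)

  final class Element(val name: String) {
    var binding: Option[Binding] = None
  }

  final class Aggregate(names: Seq[String]) {
    val elements: Seq[Element] = names.map(new Element(_))

    // Before: a fresh binding per child, so n allocations for n children.
    def bindPerChild(): Unit =
      elements.foreach(e => e.binding = Some(new Binding(this)))

    // After: one allocation, shared by every sibling.
    def bindShared(): Unit = {
      val shared = Some(new Binding(this))
      elements.foreach(e => e.binding = shared)
    }
  }

  // Binding is a plain class, so equality is reference identity and
  // `distinct` counts the number of separate binding objects.
  def distinctBindings(agg: Aggregate): Int =
    agg.elements.flatMap(_.binding).distinct.size
}
```

For an aggregate with the seven fields of `MyBundle`, `bindPerChild()` leaves seven distinct binding objects while `bindShared()` leaves one.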

Contributor Checklist

  • Did you add Scaladoc to every public function/method?
  • Did you add at least one test demonstrating the PR?
  • Did you delete any extraneous printlns/debugging code?
  • Did you specify the type of improvement?
  • Did you add appropriate documentation in docs/src?
  • Did you request a desired merge strategy?
  • Did you add text to be included in the Release Notes for this change?

Type of Improvement

  • Performance improvement

Desired Merge Strategy

  • Squash

Release Notes

This reduces memory use by 16 × (n - 1) bytes for an Aggregate with n elements.
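As a worked instance of that formula, here is a small sketch. The helper name `bytesSaved` is hypothetical, and the 16-byte figure is taken from the release note rather than measured here.

```scala
object SavingsSketch {
  // 16 bytes per binding object is the figure quoted in the release note.
  val BytesPerBinding: Long = 16L

  // n children previously needed n bindings; with sharing they need 1,
  // so n - 1 binding objects are saved per aggregate.
  def bytesSaved(n: Long): Long =
    if (n <= 0) 0L else BytesPerBinding * (n - 1)
}
```

For the seven-field `MyBundle` in the benchmark, `SavingsSketch.bytesSaved(7)` gives 96 bytes per Bundle instance; an aggregate with a single element saves nothing.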

Reviewer Checklist (only modified by reviewer)

  • Did you add the appropriate labels? (Select the most appropriate one based on the "Type of Improvement")
  • Did you mark the proper milestone (Bug fix: 3.6.x, 5.x, or 6.x depending on impact, API modification or big change: 7.0)?
  • Did you review?
  • Did you check whether all relevant Contributor checkboxes have been checked?
  • Did you do one of the following when ready to merge:
    • Squash: You/the contributor Enable auto-merge (squash), clean up the commit message, and label with Please Merge.
    • Merge: Ensure that contributor has cleaned up their commit history, then merge with Create a merge commit.

@jackkoenig jackkoenig added the Performance Improves performance, will be included in release notes label Jul 2, 2024
@jackkoenig jackkoenig added this to the 6.x milestone Jul 2, 2024
@jackkoenig
Contributor Author

Using the Eclipse Memory Analyzer (MAT, https://eclipse.dev/mat/), I validated that the ChildBinding objects were not previously shared and that they are shared with this change. The measured memory use per UInt child of the Bundles in my example above went from 72 bytes shallow, 144 bytes retained to 72 bytes shallow, 128 bytes retained, which matches my expectation. I'm also impressed that MAT is able to figure out that the individual elements should not be dinged for retaining the object that they all share.

@jackkoenig jackkoenig merged commit d499a88 into main Jul 2, 2024
18 checks passed
@jackkoenig jackkoenig deleted the jackkoenig/share-childbinding branch July 2, 2024 17:06
@mergify mergify bot added the Backported This PR has been backported label Jul 2, 2024
mergify bot pushed a commit that referenced this pull request Jul 2, 2024
chiselbot pushed a commit that referenced this pull request Jul 2, 2024
(cherry picked from commit d499a88)

Co-authored-by: Jack Koenig <[email protected]>
@jackkoenig
Contributor Author

jackkoenig commented Jul 2, 2024

I figured out how to benchmark a little bit better by avoiding serialization:

import chisel3._
// _root_ disambiguates from package chisel3.util.circt if user imports chisel3.util._
import _root_.circt.stage.ChiselStage

class MyBundle extends Bundle {
  val a, b, c, d, e, f, g = UInt(8.W)
}

class Foo(n: Int) extends Module {
  val in = IO(Input(Vec(n, new MyBundle)))
  val out = IO(Output(Vec(n, new MyBundle)))

  out :#= in
}

object Main extends App {
  val n = args(0).toInt
  val phase = new chisel3.stage.phases.Elaborate
  val annos = Seq(
    chisel3.stage.ChiselGeneratorAnnotation(() => new Foo(n))
  )
  println(phase.transform(annos).size)
}

Using the same basic approach, but with a smaller min-step and more context:

./firrtl/benchmark/scripts/find_heap_bound.py -vvv --start-size 4G --min-step 10M --context 5 -- -cp assembly.jar Main 200000

Without this change:

| Xmx | Max RSS (MiB) | Wall Clock (s) | User Time (s) |
| --- | --- | --- | --- |
| 1G | 1344 | 7.09 | 56.45 |
| 920M | 1269 | 6.92 | 88.26 |
| 910M | 1220 | 7.61 | 90.95 |
| 900M | 1198 | 9.98 | 116.99 |
| 890M | - | - | - |
| 870M | - | - | - |

With this change:

| Xmx | Max RSS (MiB) | Wall Clock (s) | User Time (s) |
| --- | --- | --- | --- |
| 1G | 1324 | 6.75 | 74.59 |
| 880M | 1202 | 6.86 | 58.98 |
| 870M | 1221 | 7.1 | 82.23 |
| 860M | 1164 | 9.19 | 98.01 |
| 850M | - | - | - |
| 840M | - | - | - |

Using the best (lowest) Max RSS for each, 1198 MiB without the change and 1164 MiB with it, shows a memory reduction of 2.8%. Small, but measurable!
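The arithmetic behind that figure can be checked with a short sketch. It assumes that "best" means the lowest Max RSS among the runs that completed in each table; the object and method names are hypothetical.

```scala
object ReductionSketch {
  // Lowest Max RSS (MiB) among the completed runs in each table above.
  val withoutChange: Double = 1198.0
  val withChange: Double = 1164.0

  // Relative reduction, expressed as a percentage of the "before" value.
  def percentReduction(before: Double, after: Double): Double =
    (before - after) / before * 100.0

  val reduction: Double = percentReduction(withoutChange, withChange)
}
```

`ReductionSketch.reduction` evaluates to roughly 2.84, which rounds to the 2.8% quoted.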
