Generate datatypes with sat-datatype, fix dependencies between projects #1
Merged
@@ -0,0 +1 @@
target/
Use the global gitignore.
eed3si9n added a commit that referenced this pull request on Aug 5, 2015:
Generate datatypes with sat-datatype, fix dependencies between projects
romanowski referenced this pull request in romanowski/zinc on Jul 28, 2016:
…zinc-fixes to ms-release * commit 'ee48218ae5ba6e8fe25aff7ee07345fc2776a00c': Changes required to compile codetree using zinc.
jvican referenced this pull request in scalacenter/zinc on Feb 25, 2017:
This commit speeds up the analysis of type dependencies as much as possible.

In both `ExtractUsedNames` and `Dependency`, we have a cache function associated with a source symbol. This source symbol is the "key" of the cache in the sense that it determines how a dependency should be tracked. `Dependency`, for instance, adds a dependency from `X` to `Y`, where `X` is the origin symbol and `Y` is the destination symbol. However, only `X` determines how a dependency should be added (and to which data structure). The same happens for `ExtractAPI`, whose case is simpler because there is no destination symbol: only the origin symbol is necessary for the cache -- we have a set of names for a given symbol.

Our previous type analysis had a type cache, but this cache only lasted for one type traversal. The algorithm was very pessimistic -- we cleared the `visited` cache with `reinitializeVisited` after every traversal so that members would be correctly recognized if the origin symbol changed.

However, the origin symbol usually stays the same, especially when traversing the bodies of methods and variables, which contain a high proportion of types. Taking this into account, we conclude that we can keep type caches around as long as the `currentOwner` doesn't change, because dependencies are only registered for top-level classes in both cases (`ExtractAPI` and `Dependency`).

The introduced solution allows every phase to implement its own `TypeTraverser` and override the function that takes care of adding a dependency. This is necessary because the functions that add dependencies depend on the context (origin symbols and more), which ultimately varies between `ExtractAPI` and `Dependency`.

This new approach not only reduces the footprint of the current type traverser, which visits types and type symbols; it also shows that an efficient algorithm in these parts of the phases makes sense, since the number of times this API is called is **huge**.
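The caching idea described above can be illustrated with a minimal, self-contained sketch. It is not Zinc's actual code: `Sym`, `Tpe`, `CachingTypeTraverser`, `setOrigin`, `addDependency` and the two example phases are hypothetical stand-ins for the compiler's symbols, types and traversers. The sketch only shows the two points made in the commit message: the `visited` cache survives across traversals until the origin symbol changes, and each phase overrides the function that records a dependency.

```scala
import scala.collection.mutable

// Hypothetical stand-ins for compiler symbols and types (not Zinc's API).
final case class Sym(name: String)
final case class Tpe(sym: Sym, args: List[Tpe] = Nil)

// Visits a type and reports each type symbol at most once per origin symbol.
abstract class CachingTypeTraverser {
  private val visited = mutable.HashSet.empty[Tpe]
  private var origin: Option[Sym] = None

  // Each phase overrides this to record the dependency in its own structure.
  protected def addDependency(origin: Sym, target: Sym): Unit

  // The cache is dropped only when the origin symbol actually changes,
  // instead of after every single traversal.
  def setOrigin(newOrigin: Sym): Unit =
    if (!origin.contains(newOrigin)) {
      visited.clear()
      origin = Some(newOrigin)
    }

  def traverse(tpe: Tpe): Unit =
    origin.foreach { from =>
      // `add` returns false if the type was already visited for this origin.
      if (visited.add(tpe)) {
        addDependency(from, tpe.sym)
        tpe.args.foreach(traverse)
      }
    }
}

// Example phase 1 (assumed): records origin -> target edges.
final class DependencyTraverser extends CachingTypeTraverser {
  val edges = mutable.LinkedHashSet.empty[(Sym, Sym)]
  var calls = 0
  protected def addDependency(origin: Sym, target: Sym): Unit = {
    calls += 1
    edges += ((origin, target))
  }
}

// Example phase 2 (assumed): keeps a set of names per origin symbol.
final class UsedNamesTraverser extends CachingTypeTraverser {
  val names = mutable.Map.empty[Sym, mutable.Set[String]]
  protected def addDependency(origin: Sym, target: Sym): Unit =
    names.getOrElseUpdate(origin, mutable.Set.empty[String]) += target.name
}
```

In this sketch, traversing the same type several times under one origin calls `addDependency` only on the first visit; calling `setOrigin` with a different symbol clears the cache so that the new origin gets its own dependencies.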
Compared with df30872, the new changes reduce the running time of Zinc by ~3 seconds in both hot and cold benchmarks. Together with the performance improvements added in df30872, this amounts to a decrease of 6 seconds in running time in both hot and cold compilers, an improvement of 15%.

The following benchmark was obtained in the same way as in the commit mentioned before, and measures the compilation of the Scala standard library.

```
[info] Benchmark (_tempDir) Mode Cnt Score Error Units
[info] HotScalacBenchmark.compile /tmp/sbt_8d7158c4 sample 18 21042.357 ± 976.268 ms/op
[info] HotScalacBenchmark.compile:compile·p0.00 /tmp/sbt_8d7158c4 sample 19293.798 ms/op
[info] HotScalacBenchmark.compile:compile·p0.50 /tmp/sbt_8d7158c4 sample 21156.069 ms/op
[info] HotScalacBenchmark.compile:compile·p0.90 /tmp/sbt_8d7158c4 sample 22364.029 ms/op
[info] HotScalacBenchmark.compile:compile·p0.95 /tmp/sbt_8d7158c4 sample 23119.004 ms/op
[info] HotScalacBenchmark.compile:compile·p0.99 /tmp/sbt_8d7158c4 sample 23119.004 ms/op
[info] HotScalacBenchmark.compile:compile·p0.999 /tmp/sbt_8d7158c4 sample 23119.004 ms/op
[info] HotScalacBenchmark.compile:compile·p0.9999 /tmp/sbt_8d7158c4 sample 23119.004 ms/op
[info] HotScalacBenchmark.compile:compile·p1.00 /tmp/sbt_8d7158c4 sample 23119.004 ms/op
[info] HotScalacBenchmark.compile:·gc.alloc.rate /tmp/sbt_8d7158c4 sample 18 285.162 ± 12.164 MB/sec
[info] HotScalacBenchmark.compile:·gc.alloc.rate.norm /tmp/sbt_8d7158c4 sample 18 6433635588.000 ± 27789448.650 B/op
[info] HotScalacBenchmark.compile:·gc.churn.PS_Eden_Space /tmp/sbt_8d7158c4 sample 18 285.256 ± 28.447 MB/sec
[info] HotScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm /tmp/sbt_8d7158c4 sample 18 6435547818.667 ± 569086120.516 B/op
[info] HotScalacBenchmark.compile:·gc.churn.PS_Old_Gen /tmp/sbt_8d7158c4 sample 18 13.477 ± 12.363 MB/sec
[info] HotScalacBenchmark.compile:·gc.churn.PS_Old_Gen.norm /tmp/sbt_8d7158c4 sample 18 311931407.111 ± 284928263.384 B/op
[info] HotScalacBenchmark.compile:·gc.churn.PS_Survivor_Space /tmp/sbt_8d7158c4 sample 18 6.558 ± 3.504 MB/sec
[info] HotScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm /tmp/sbt_8d7158c4 sample 18 150100543.111 ± 83475557.157 B/op
[info] HotScalacBenchmark.compile:·gc.count /tmp/sbt_8d7158c4 sample 18 103.000 counts
[info] HotScalacBenchmark.compile:·gc.time /tmp/sbt_8d7158c4 sample 18 22284.000 ms
[info] WarmScalacBenchmark.compile /tmp/sbt_8d7158c4 sample 3 55208.225 ± 19525.177 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.00 /tmp/sbt_8d7158c4 sample 54492.398 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.50 /tmp/sbt_8d7158c4 sample 54693.724 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.90 /tmp/sbt_8d7158c4 sample 56438.555 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.95 /tmp/sbt_8d7158c4 sample 56438.555 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.99 /tmp/sbt_8d7158c4 sample 56438.555 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.999 /tmp/sbt_8d7158c4 sample 56438.555 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.9999 /tmp/sbt_8d7158c4 sample 56438.555 ms/op
[info] WarmScalacBenchmark.compile:compile·p1.00 /tmp/sbt_8d7158c4 sample 56438.555 ms/op
[info] WarmScalacBenchmark.compile:·gc.alloc.rate /tmp/sbt_8d7158c4 sample 3 118.129 ± 39.875 MB/sec
[info] WarmScalacBenchmark.compile:·gc.alloc.rate.norm /tmp/sbt_8d7158c4 sample 3 6952140256.000 ± 929140223.772 B/op
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Eden_Space /tmp/sbt_8d7158c4 sample 3 112.542 ± 57.930 MB/sec
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm /tmp/sbt_8d7158c4 sample 3 6623418026.667 ± 2792358205.420 B/op
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Survivor_Space /tmp/sbt_8d7158c4 sample 3 2.545 ± 2.670 MB/sec
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm /tmp/sbt_8d7158c4 sample 3 149821293.333 ± 169342378.518 B/op
[info] WarmScalacBenchmark.compile:·gc.count /tmp/sbt_8d7158c4 sample 3 75.000 counts
[info] WarmScalacBenchmark.compile:·gc.time /tmp/sbt_8d7158c4 sample 3 9372.000 ms
[info] ColdScalacBenchmark.compile /tmp/sbt_8d7158c4 ss 10 45882.107 ± 1434.664 ms/op
[info] ColdScalacBenchmark.compile:·gc.alloc.rate /tmp/sbt_8d7158c4 ss 10 145.998 ± 4.339 MB/sec
[info] ColdScalacBenchmark.compile:·gc.alloc.rate.norm /tmp/sbt_8d7158c4 ss 10 7155800608.000 ± 76895180.455 B/op
[info] ColdScalacBenchmark.compile:·gc.churn.PS_Eden_Space /tmp/sbt_8d7158c4 ss 10 137.830 ± 7.096 MB/sec
[info] ColdScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm /tmp/sbt_8d7158c4 ss 10 6757801688.800 ± 407535345.265 B/op
[info] ColdScalacBenchmark.compile:·gc.churn.PS_Survivor_Space /tmp/sbt_8d7158c4 ss 10 2.822 ± 0.677 MB/sec
[info] ColdScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm /tmp/sbt_8d7158c4 ss 10 138273318.400 ± 32492030.573 B/op
[info] ColdScalacBenchmark.compile:·gc.count /tmp/sbt_8d7158c4 ss 10 244.000 counts
[info] ColdScalacBenchmark.compile:·gc.time /tmp/sbt_8d7158c4 ss 10 30907.000 ms
[success] Total time: 1618 s, completed Feb 25, 2017 10:49:38 PM
[success] Total time: 0 s, completed Feb 25, 2017 10:49:38 PM
```

Machine info:

```
jvican in /data/rw/code/scala/zinc [22:24:47]
> $ uname -a [±as-seen-from ●▴▾]
Linux tribox 4.9.11-1-ARCH #1 SMP PREEMPT Sun Feb 19 13:45:52 UTC 2017 x86_64 GNU/Linux

jvican in /data/rw/code/scala/zinc [23:15:57]
> $ cpupower frequency-info [±as-seen-from ●▴▾]
analyzing CPU 0:
  driver: intel_pstate
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency: Cannot determine or is not supported.
  hardware limits: 400 MHz - 3.40 GHz
  available cpufreq governors: performance powersave
  current policy: frequency should be within 3.20 GHz and 3.20 GHz.
                  The governor "performance" may decide which speed to use
                  within this range.
  current CPU frequency: Unable to call hardware
  current CPU frequency: 3.32 GHz (asserted by call to kernel)
  boost state support:
    Supported: yes
    Active: yes

jvican in /data/rw/code/scala/zinc [23:16:14]
> $ cat /proc/meminfo [±as-seen-from ●▴▾]
MemTotal: 20430508 kB
MemFree: 9890712 kB
MemAvailable: 13490908 kB
Buffers: 3684 kB
Cached: 4052520 kB
SwapCached: 0 kB
Active: 7831612 kB
Inactive: 2337220 kB
Active(anon): 6214680 kB
Inactive(anon): 151436 kB
Active(file): 1616932 kB
Inactive(file): 2185784 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 12582908 kB
SwapFree: 12582908 kB
Dirty: 124 kB
Writeback: 0 kB
AnonPages: 6099876 kB
Mapped: 183096 kB
Shmem: 253488 kB
Slab: 227436 kB
SReclaimable: 152144 kB
SUnreclaim: 75292 kB
KernelStack: 5152 kB
PageTables: 19636 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 22798160 kB
Committed_AS: 7685996 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
HardwareCorrupted: 0 kB
AnonHugePages: 5511168 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 136620 kB
DirectMap2M: 4970496 kB
DirectMap1G: 15728640 kB

jvican in /data/rw/code/scala/zinc [23:16:41]
> $ cat /proc/cpuinfo [±as-seen-from ●▴▾]
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 78
model name : Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
stepping : 3
microcode : 0x88
cpu MHz : 3297.827
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
bugs :
bogomips : 5618.00
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 78
model name : Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
stepping : 3
microcode : 0x88
cpu MHz : 3296.459
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 2
apicid : 2
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
bugs :
bogomips : 5620.22
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:

processor : 2
vendor_id : GenuineIntel
cpu family : 6
model : 78
model name : Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
stepping : 3
microcode : 0x88
cpu MHz : 3399.853
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 2
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
bugs :
bogomips : 5621.16
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:

processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 78
model name : Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
stepping : 3
microcode : 0x88
cpu MHz : 3210.327
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 2
apicid : 3
initial apicid : 3
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
bugs :
bogomips : 5620.33
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
```
jvican referenced this pull request in scalacenter/zinc on Feb 26, 2017:
This commit speeds up the analysis of type dependencies as much as possible.

In both `ExtractUsedNames` and `Dependency`, we have a cache function associated with a source symbol. This source symbol is the "key" of the cache in the sense that it determines how a dependency should be tracked. `Dependency`, for instance, adds a dependency from `X` to `Y`, where `X` is the origin symbol and `Y` is the destination symbol. However, only `X` determines how a dependency should be added (and to which data structure). The same happens for `ExtractAPI`, whose case is simpler because there is no destination symbol: only the origin symbol is necessary for the cache -- we have a set of names for a given symbol.

Our previous type analysis had a type cache, but this cache only lasted for one type traversal. The algorithm was very pessimistic -- we cleared the `visited` cache with `reinitializeVisited` after every traversal so that members would be correctly recognized if the origin symbol changed.

However, the origin symbol usually stays the same, especially when traversing the bodies of methods and variables, which contain a high proportion of types. Taking this into account, we conclude that we can keep type caches around as long as the `currentOwner` doesn't change, because dependencies are only registered for top-level classes in both cases (`ExtractAPI` and `Dependency`).

The introduced solution allows every phase to implement its own `TypeTraverser` and override the function that takes care of adding a dependency. This is necessary because the functions that add dependencies depend on the context (origin symbols and more), which ultimately varies between `ExtractAPI` and `Dependency`.

The following benchmark was obtained in the same way as in the commit mentioned before, and measures the compilation of the Scala standard library.
BEFORE
```
[info] Benchmark (_tempDir) Mode Cnt Score Error Units
[info] HotScalacBenchmark.compile /tmp/sbt_b9131bfb sample 18 21228.771 ± 521.207 ms/op
[info] HotScalacBenchmark.compile:compile·p0.00 /tmp/sbt_b9131bfb sample 20199.768 ms/op
[info] HotScalacBenchmark.compile:compile·p0.50 /tmp/sbt_b9131bfb sample 21256.733 ms/op
[info] HotScalacBenchmark.compile:compile·p0.90 /tmp/sbt_b9131bfb sample 21931.177 ms/op
[info] HotScalacBenchmark.compile:compile·p0.95 /tmp/sbt_b9131bfb sample 22112.371 ms/op
[info] HotScalacBenchmark.compile:compile·p0.99 /tmp/sbt_b9131bfb sample 22112.371 ms/op
[info] HotScalacBenchmark.compile:compile·p0.999 /tmp/sbt_b9131bfb sample 22112.371 ms/op
[info] HotScalacBenchmark.compile:compile·p0.9999 /tmp/sbt_b9131bfb sample 22112.371 ms/op
[info] HotScalacBenchmark.compile:compile·p1.00 /tmp/sbt_b9131bfb sample 22112.371 ms/op
[info] HotScalacBenchmark.compile:·gc.alloc.rate /tmp/sbt_b9131bfb sample 18 284.115 ± 6.036 MB/sec
[info] HotScalacBenchmark.compile:·gc.alloc.rate.norm /tmp/sbt_b9131bfb sample 18 6474818679.556 ± 42551265.360 B/op
[info] HotScalacBenchmark.compile:·gc.churn.PS_Eden_Space /tmp/sbt_b9131bfb sample 18 283.385 ± 23.147 MB/sec
[info] HotScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm /tmp/sbt_b9131bfb sample 18 6455703779.556 ± 483463770.519 B/op
[info] HotScalacBenchmark.compile:·gc.churn.PS_Old_Gen /tmp/sbt_b9131bfb sample 18 12.857 ± 12.406 MB/sec
[info] HotScalacBenchmark.compile:·gc.churn.PS_Old_Gen.norm /tmp/sbt_b9131bfb sample 18 297978002.222 ± 287556197.389 B/op
[info] HotScalacBenchmark.compile:·gc.churn.PS_Survivor_Space /tmp/sbt_b9131bfb sample 18 6.901 ± 2.092 MB/sec
[info] HotScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm /tmp/sbt_b9131bfb sample 18 158212212.444 ± 50375116.805 B/op
[info] HotScalacBenchmark.compile:·gc.count /tmp/sbt_b9131bfb sample 18 105.000 counts
[info] HotScalacBenchmark.compile:·gc.time /tmp/sbt_b9131bfb sample 18 21814.000 ms
[info] WarmScalacBenchmark.compile /tmp/sbt_b9131bfb sample 3 55924.053 ± 16257.754 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.00 /tmp/sbt_b9131bfb sample 54895.051 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.50 /tmp/sbt_b9131bfb sample 56438.555 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.90 /tmp/sbt_b9131bfb sample 56438.555 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.95 /tmp/sbt_b9131bfb sample 56438.555 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.99 /tmp/sbt_b9131bfb sample 56438.555 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.999 /tmp/sbt_b9131bfb sample 56438.555 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.9999 /tmp/sbt_b9131bfb sample 56438.555 ms/op
[info] WarmScalacBenchmark.compile:compile·p1.00 /tmp/sbt_b9131bfb sample 56438.555 ms/op
[info] WarmScalacBenchmark.compile:·gc.alloc.rate /tmp/sbt_b9131bfb sample 3 117.417 ± 27.439 MB/sec
[info] WarmScalacBenchmark.compile:·gc.alloc.rate.norm /tmp/sbt_b9131bfb sample 3 6999695530.667 ± 608845574.720 B/op
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Eden_Space /tmp/sbt_b9131bfb sample 3 111.263 ± 90.263 MB/sec
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm /tmp/sbt_b9131bfb sample 3 6633605792.000 ± 5698534573.516 B/op
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Old_Gen /tmp/sbt_b9131bfb sample 3 0.001 ± 0.040 MB/sec
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Old_Gen.norm /tmp/sbt_b9131bfb sample 3 74741.333 ± 2361755.471 B/op
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Survivor_Space /tmp/sbt_b9131bfb sample 3 2.478 ± 7.592 MB/sec
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm /tmp/sbt_b9131bfb sample 3 147881869.333 ± 475964254.946 B/op
[info] WarmScalacBenchmark.compile:·gc.count /tmp/sbt_b9131bfb sample 3 73.000 counts
[info] WarmScalacBenchmark.compile:·gc.time /tmp/sbt_b9131bfb sample 3 9581.000 ms
[info] ColdScalacBenchmark.compile /tmp/sbt_b9131bfb ss 10 45562.453 ± 836.977 ms/op
[info] ColdScalacBenchmark.compile:·gc.alloc.rate /tmp/sbt_b9131bfb ss 10 147.126 ± 2.229 MB/sec
[info] ColdScalacBenchmark.compile:·gc.alloc.rate.norm /tmp/sbt_b9131bfb ss 10 7163351651.200 ± 57993163.779 B/op
[info] ColdScalacBenchmark.compile:·gc.churn.PS_Eden_Space /tmp/sbt_b9131bfb ss 10 137.407 ± 6.810 MB/sec
[info] ColdScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm /tmp/sbt_b9131bfb ss 10 6692512710.400 ± 429243418.572 B/op
[info] ColdScalacBenchmark.compile:·gc.churn.PS_Survivor_Space /tmp/sbt_b9131bfb ss 10 2.647 ± 0.168 MB/sec
[info] ColdScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm /tmp/sbt_b9131bfb ss 10 128840603.200 ± 7324571.862 B/op
[info] ColdScalacBenchmark.compile:·gc.count /tmp/sbt_b9131bfb ss 10 245.000 counts
[info] ColdScalacBenchmark.compile:·gc.time /tmp/sbt_b9131bfb ss 10 29462.000 ms
[success] Total time: 1595 s, completed Feb 26, 2017 1:42:55 AM
[success] Total time: 0 s, completed Feb 26, 2017 1:42:55 AM
```
AFTER
```
[info] Benchmark (_tempDir) Mode Cnt Score Error Units
[info] HotScalacBenchmark.compile /tmp/sbt_c8a4806b sample 18 20757.144 ± 519.221 ms/op
[info] HotScalacBenchmark.compile:compile·p0.00 /tmp/sbt_c8a4806b sample 19931.333 ms/op
[info] HotScalacBenchmark.compile:compile·p0.50 /tmp/sbt_c8a4806b sample 20786.971 ms/op
[info] HotScalacBenchmark.compile:compile·p0.90 /tmp/sbt_c8a4806b sample 21615.765 ms/op
[info] HotScalacBenchmark.compile:compile·p0.95 /tmp/sbt_c8a4806b sample 21676.163 ms/op
[info] HotScalacBenchmark.compile:compile·p0.99 /tmp/sbt_c8a4806b sample 21676.163 ms/op
[info] HotScalacBenchmark.compile:compile·p0.999 /tmp/sbt_c8a4806b sample 21676.163 ms/op
[info] HotScalacBenchmark.compile:compile·p0.9999 /tmp/sbt_c8a4806b sample 21676.163 ms/op
[info] HotScalacBenchmark.compile:compile·p1.00 /tmp/sbt_c8a4806b sample 21676.163 ms/op
[info] HotScalacBenchmark.compile:·gc.alloc.rate /tmp/sbt_c8a4806b sample 18 290.476 ± 7.069 MB/sec
[info] HotScalacBenchmark.compile:·gc.alloc.rate.norm /tmp/sbt_c8a4806b sample 18 6476081869.778 ± 18700713.424 B/op
[info] HotScalacBenchmark.compile:·gc.churn.PS_Eden_Space /tmp/sbt_c8a4806b sample 18 290.409 ± 20.336 MB/sec
[info] HotScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm /tmp/sbt_c8a4806b sample 18 6478102528.000 ± 468310673.653 B/op
[info] HotScalacBenchmark.compile:·gc.churn.PS_Old_Gen /tmp/sbt_c8a4806b sample 18 13.261 ± 12.790 MB/sec
[info] HotScalacBenchmark.compile:·gc.churn.PS_Old_Gen.norm /tmp/sbt_c8a4806b sample 18 301324965.333 ± 290518111.715 B/op
[info] HotScalacBenchmark.compile:·gc.churn.PS_Survivor_Space /tmp/sbt_c8a4806b sample 18 6.735 ± 2.338 MB/sec
[info] HotScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm /tmp/sbt_c8a4806b sample 18 150953349.778 ± 54074639.209 B/op
[info] HotScalacBenchmark.compile:·gc.count /tmp/sbt_c8a4806b sample 18 101.000 counts
[info] HotScalacBenchmark.compile:·gc.time /tmp/sbt_c8a4806b sample 18 21267.000 ms
[info] WarmScalacBenchmark.compile /tmp/sbt_c8a4806b sample 3 54380.549 ± 24064.367 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.00 /tmp/sbt_c8a4806b sample 53552.873 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.50 /tmp/sbt_c8a4806b sample 53687.091 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.90 /tmp/sbt_c8a4806b sample 55901.684 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.95 /tmp/sbt_c8a4806b sample 55901.684 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.99 /tmp/sbt_c8a4806b sample 55901.684 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.999 /tmp/sbt_c8a4806b sample 55901.684 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.9999 /tmp/sbt_c8a4806b sample 55901.684 ms/op
[info] WarmScalacBenchmark.compile:compile·p1.00 /tmp/sbt_c8a4806b sample 55901.684 ms/op
[info] WarmScalacBenchmark.compile:·gc.alloc.rate /tmp/sbt_c8a4806b sample 3 120.159 ± 52.914 MB/sec
[info] WarmScalacBenchmark.compile:·gc.alloc.rate.norm /tmp/sbt_c8a4806b sample 3 6963979373.333 ± 137408036.138 B/op
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Eden_Space /tmp/sbt_c8a4806b sample 3 113.755 ± 135.915 MB/sec
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm /tmp/sbt_c8a4806b sample 3 6588595392.000 ± 5170161565.753 B/op
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Old_Gen /tmp/sbt_c8a4806b sample 3 0.002 ± 0.048 MB/sec
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Old_Gen.norm /tmp/sbt_c8a4806b sample 3 90400.000 ± 2856554.534 B/op
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Survivor_Space /tmp/sbt_c8a4806b sample 3 2.623 ± 7.378 MB/sec
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm /tmp/sbt_c8a4806b sample 3 151896768.000 ± 399915676.894 B/op
[info] WarmScalacBenchmark.compile:·gc.count /tmp/sbt_c8a4806b sample 3 73.000 counts
[info] WarmScalacBenchmark.compile:·gc.time /tmp/sbt_c8a4806b sample 3 10070.000 ms
[info] ColdScalacBenchmark.compile /tmp/sbt_c8a4806b ss 10 45613.670 ± 1724.291 ms/op
[info] ColdScalacBenchmark.compile:·gc.alloc.rate /tmp/sbt_c8a4806b ss 10 147.106 ± 4.973 MB/sec
[info] ColdScalacBenchmark.compile:·gc.alloc.rate.norm /tmp/sbt_c8a4806b ss 10 7165665000.000 ± 68500786.134 B/op
[info] ColdScalacBenchmark.compile:·gc.churn.PS_Eden_Space /tmp/sbt_c8a4806b ss 10 138.633 ± 12.612 MB/sec
[info] ColdScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm /tmp/sbt_c8a4806b ss 10 6749057403.200 ± 438983252.418 B/op
[info] ColdScalacBenchmark.compile:·gc.churn.PS_Survivor_Space /tmp/sbt_c8a4806b ss 10 2.716 ± 0.298 MB/sec
[info] ColdScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm /tmp/sbt_c8a4806b ss 10 132216236.800 ± 11751803.094 B/op
[info] ColdScalacBenchmark.compile:·gc.count /tmp/sbt_c8a4806b ss 10 247.000 counts
[info] ColdScalacBenchmark.compile:·gc.time /tmp/sbt_c8a4806b ss 10 29965.000 ms
[success] Total time: 1593 s, completed Feb 26, 2017 11:54:01 AM
[success] Total time: 0 s, completed Feb 26, 2017 11:54:01 AM
```
Machine info:
```
jvican in /data/rw/code/scala/zinc [22:24:47]
> $ uname -a
[±as-seen-from ●▴▾]
Linux tribox 4.9.11-1-ARCH #1 SMP PREEMPT Sun Feb 19 13:45:52 UTC 2017 x86_64 GNU/Linux

jvican in /data/rw/code/scala/zinc [23:15:57]
> $ cpupower frequency-info [±as-seen-from ●▴▾]
analyzing CPU 0:
  driver: intel_pstate
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency: Cannot determine or is not supported.
  hardware limits: 400 MHz - 3.40 GHz
  available cpufreq governors: performance powersave
  current policy: frequency should be within 3.20 GHz and 3.20 GHz.
                  The governor "performance" may decide which speed to use
                  within this range.
  current CPU frequency: Unable to call hardware
  current CPU frequency: 3.32 GHz (asserted by call to kernel)
  boost state support:
    Supported: yes
    Active: yes

jvican in /data/rw/code/scala/zinc [23:16:14]
> $ cat /proc/meminfo [±as-seen-from ●▴▾]
MemTotal: 20430508 kB
MemFree: 9890712 kB
MemAvailable: 13490908 kB
Buffers: 3684 kB
Cached: 4052520 kB
SwapCached: 0 kB
Active: 7831612 kB
Inactive: 2337220 kB
Active(anon): 6214680 kB
Inactive(anon): 151436 kB
Active(file): 1616932 kB
Inactive(file): 2185784 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 12582908 kB
SwapFree: 12582908 kB
Dirty: 124 kB
Writeback: 0 kB
AnonPages: 6099876 kB
Mapped: 183096 kB
Shmem: 253488 kB
Slab: 227436 kB
SReclaimable: 152144 kB
SUnreclaim: 75292 kB
KernelStack: 5152 kB
PageTables: 19636 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 22798160 kB
Committed_AS: 7685996 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
HardwareCorrupted: 0 kB
AnonHugePages: 5511168 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 136620 kB
DirectMap2M: 4970496 kB
DirectMap1G: 15728640 kB

jvican in /data/rw/code/scala/zinc [23:16:41]
> $ cat /proc/cpuinfo [±as-seen-from ●▴▾]
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 78
model name : Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
stepping : 3
microcode : 0x88
cpu MHz : 3297.827
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
bugs :
bogomips : 5618.00
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 78
model name : Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
stepping : 3
microcode : 0x88
cpu MHz : 3296.459
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 2
apicid : 2
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
bugs :
bogomips : 5620.22
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:

processor : 2
vendor_id : GenuineIntel
cpu family : 6
model : 78
model name : Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
stepping : 3
microcode : 0x88
cpu MHz : 3399.853
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 2
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
bugs :
bogomips : 5621.16
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:

processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 78
model name : Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
stepping : 3
microcode : 0x88
cpu MHz : 3210.327
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 2
apicid : 3
initial apicid : 3
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
bugs :
bogomips : 5620.33
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
```

Compared with df30872, the new changes reduce the mean running time of Zinc by about half a second in the hot benchmark (21228.771 → 20757.144 ms/op) and by about 1.5 seconds in the warm benchmark (55924.053 → 54380.549 ms/op, from only 3 samples). The cold benchmark differs by about 50 ms (45562.453 → 45613.670 ms/op), which seems to be a product of the variation given the reported error in ms/op. This is a success considering that we now traverse more types and symbols than before: these changes let us do more work and still decrease the running time of Zinc in the hot and warm benchmarks.

These changes are likely to have a bigger effect on huge industrial codebases in which the ratio of types is very high and which contain many rich types, such as poly types, method types, refinements and existential types with lots of constraints.
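As a quick check of the comparison, the mean scores and error margins from the BEFORE and AFTER tables can be subtracted directly. The snippet below is only an illustrative sketch: `Score`, `BenchmarkDelta`, `delta` and `overlaps` are names made up for this example, and the overlap test is a rough sanity check on the reported `±` margins, not a statistical test.

```scala
// Mean scores (ms/op) and error margins copied from the BEFORE/AFTER tables.
final case class Score(mean: Double, error: Double)

object BenchmarkDelta {
  val before = Map(
    "Hot"  -> Score(21228.771, 521.207),
    "Warm" -> Score(55924.053, 16257.754),
    "Cold" -> Score(45562.453, 836.977)
  )
  val after = Map(
    "Hot"  -> Score(20757.144, 519.221),
    "Warm" -> Score(54380.549, 24064.367),
    "Cold" -> Score(45613.670, 1724.291)
  )

  // Negative delta means the AFTER run was faster.
  def delta(name: String): Double = after(name).mean - before(name).mean

  // True when the two reported intervals (mean ± error) overlap.
  def overlaps(name: String): Boolean = {
    val (b, a) = (before(name), after(name))
    math.abs(a.mean - b.mean) <= a.error + b.error
  }

  def main(args: Array[String]): Unit =
    for (name <- List("Hot", "Warm", "Cold"))
      println(f"$name%-4s delta = ${delta(name)}%.3f ms/op, intervals overlap = ${overlaps(name)}")
}
```

With these figures the hot and warm deltas are negative and the cold delta is slightly positive, and the reported intervals overlap in all three cases, so the differences are best read as indicative rather than conclusive.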
jvican
referenced
this pull request
in scalacenter/zinc
Feb 26, 2017
This commit takes care of speeding up analysis of type dependencies as much as possible. In both `ExtractUsedNames` and `Dependency`, we have a cache function associated with a source symbol. This source symbol is the "key" of the cache in the sense that from it we detect how a dependency should be tracked. `Dependency`, for instance, adds a dependency from `X` to `Y`, where X is the origin symbol and `Y` is the destination symbol. However, only `X` determines how to a dependency should be added (and on which data structure). The same happens for `ExtractAPI`, but whose case is simpler because there is no destination symbol: only the origin symbol is the necessary to cache -- we have a set of names for a given symbol. Our previous type analysis had a type cache, but this type cache only lasted one type traversal. The algorihtm was very pessimistic -- we cleared the `visited` cache with `reinitializeVisited` after every traversal so that members would be correctly recognized if the origin symbol changed. However, the origin symbol usually stays the same, especially when traversing bodies of methods and variables, which contain a high proportion of types. Taking this into account, we arrive to the conclusion that we can keep type caches around as long as the `currentOwner` doesn't change, because dependencies are only registered for top-level classes in both cases (`ExtractAPI` and `Dependency`). The introduced solution allows every phase to implement their own `TypeTraverser` and override the function that takes care of adding a dependency. This is necessary because the functions to add dependencies depend on the context (origin symbols and more stuff), which ultimately varies in `ExtractAPI` and `Dependency`. The following benchmark has been obtained by the same formula as the commit mentioned before, and benchmarks the compilation of the Scala standard library. 
BEFORE ``` [info] Benchmark (_tempDir) Mode Cnt Score Error Units [info] HotScalacBenchmark.compile /tmp/sbt_b9131bfb sample 18 21228.771 ± 521.207 ms/op [info] HotScalacBenchmark.compile:compile·p0.00 /tmp/sbt_b9131bfb sample 20199.768 ms/op [info] HotScalacBenchmark.compile:compile·p0.50 /tmp/sbt_b9131bfb sample 21256.733 ms/op [info] HotScalacBenchmark.compile:compile·p0.90 /tmp/sbt_b9131bfb sample 21931.177 ms/op [info] HotScalacBenchmark.compile:compile·p0.95 /tmp/sbt_b9131bfb sample 22112.371 ms/op [info] HotScalacBenchmark.compile:compile·p0.99 /tmp/sbt_b9131bfb sample 22112.371 ms/op [info] HotScalacBenchmark.compile:compile·p0.999 /tmp/sbt_b9131bfb sample 22112.371 ms/op [info] HotScalacBenchmark.compile:compile·p0.9999 /tmp/sbt_b9131bfb sample 22112.371 ms/op [info] HotScalacBenchmark.compile:compile·p1.00 /tmp/sbt_b9131bfb sample 22112.371 ms/op [info] HotScalacBenchmark.compile:·gc.alloc.rate /tmp/sbt_b9131bfb sample 18 284.115 ± 6.036 MB/sec [info] HotScalacBenchmark.compile:·gc.alloc.rate.norm /tmp/sbt_b9131bfb sample 18 6474818679.556 ± 42551265.360 B/op [info] HotScalacBenchmark.compile:·gc.churn.PS_Eden_Space /tmp/sbt_b9131bfb sample 18 283.385 ± 23.147 MB/sec [info] HotScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm /tmp/sbt_b9131bfb sample 18 6455703779.556 ± 483463770.519 B/op [info] HotScalacBenchmark.compile:·gc.churn.PS_Old_Gen /tmp/sbt_b9131bfb sample 18 12.857 ± 12.406 MB/sec [info] HotScalacBenchmark.compile:·gc.churn.PS_Old_Gen.norm /tmp/sbt_b9131bfb sample 18 297978002.222 ± 287556197.389 B/op [info] HotScalacBenchmark.compile:·gc.churn.PS_Survivor_Space /tmp/sbt_b9131bfb sample 18 6.901 ± 2.092 MB/sec [info] HotScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm /tmp/sbt_b9131bfb sample 18 158212212.444 ± 50375116.805 B/op [info] HotScalacBenchmark.compile:·gc.count /tmp/sbt_b9131bfb sample 18 105.000 counts [info] HotScalacBenchmark.compile:·gc.time /tmp/sbt_b9131bfb sample 18 21814.000 ms [info] WarmScalacBenchmark.compile 
/tmp/sbt_b9131bfb sample 3 55924.053 ± 16257.754 ms/op [info] WarmScalacBenchmark.compile:compile·p0.00 /tmp/sbt_b9131bfb sample 54895.051 ms/op [info] WarmScalacBenchmark.compile:compile·p0.50 /tmp/sbt_b9131bfb sample 56438.555 ms/op [info] WarmScalacBenchmark.compile:compile·p0.90 /tmp/sbt_b9131bfb sample 56438.555 ms/op [info] WarmScalacBenchmark.compile:compile·p0.95 /tmp/sbt_b9131bfb sample 56438.555 ms/op [info] WarmScalacBenchmark.compile:compile·p0.99 /tmp/sbt_b9131bfb sample 56438.555 ms/op [info] WarmScalacBenchmark.compile:compile·p0.999 /tmp/sbt_b9131bfb sample 56438.555 ms/op [info] WarmScalacBenchmark.compile:compile·p0.9999 /tmp/sbt_b9131bfb sample 56438.555 ms/op [info] WarmScalacBenchmark.compile:compile·p1.00 /tmp/sbt_b9131bfb sample 56438.555 ms/op [info] WarmScalacBenchmark.compile:·gc.alloc.rate /tmp/sbt_b9131bfb sample 3 117.417 ± 27.439 MB/sec [info] WarmScalacBenchmark.compile:·gc.alloc.rate.norm /tmp/sbt_b9131bfb sample 3 6999695530.667 ± 608845574.720 B/op [info] WarmScalacBenchmark.compile:·gc.churn.PS_Eden_Space /tmp/sbt_b9131bfb sample 3 111.263 ± 90.263 MB/sec [info] WarmScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm /tmp/sbt_b9131bfb sample 3 6633605792.000 ± 5698534573.516 B/op [info] WarmScalacBenchmark.compile:·gc.churn.PS_Old_Gen /tmp/sbt_b9131bfb sample 3 0.001 ± 0.040 MB/sec [info] WarmScalacBenchmark.compile:·gc.churn.PS_Old_Gen.norm /tmp/sbt_b9131bfb sample 3 74741.333 ± 2361755.471 B/op [info] WarmScalacBenchmark.compile:·gc.churn.PS_Survivor_Space /tmp/sbt_b9131bfb sample 3 2.478 ± 7.592 MB/sec [info] WarmScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm /tmp/sbt_b9131bfb sample 3 147881869.333 ± 475964254.946 B/op [info] WarmScalacBenchmark.compile:·gc.count /tmp/sbt_b9131bfb sample 3 73.000 counts [info] WarmScalacBenchmark.compile:·gc.time /tmp/sbt_b9131bfb sample 3 9581.000 ms [info] ColdScalacBenchmark.compile /tmp/sbt_b9131bfb ss 10 45562.453 ± 836.977 ms/op [info] 
ColdScalacBenchmark.compile:·gc.alloc.rate /tmp/sbt_b9131bfb ss 10 147.126 ± 2.229 MB/sec [info] ColdScalacBenchmark.compile:·gc.alloc.rate.norm /tmp/sbt_b9131bfb ss 10 7163351651.200 ± 57993163.779 B/op [info] ColdScalacBenchmark.compile:·gc.churn.PS_Eden_Space /tmp/sbt_b9131bfb ss 10 137.407 ± 6.810 MB/sec [info] ColdScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm /tmp/sbt_b9131bfb ss 10 6692512710.400 ± 429243418.572 B/op [info] ColdScalacBenchmark.compile:·gc.churn.PS_Survivor_Space /tmp/sbt_b9131bfb ss 10 2.647 ± 0.168 MB/sec [info] ColdScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm /tmp/sbt_b9131bfb ss 10 128840603.200 ± 7324571.862 B/op [info] ColdScalacBenchmark.compile:·gc.count /tmp/sbt_b9131bfb ss 10 245.000 counts [info] ColdScalacBenchmark.compile:·gc.time /tmp/sbt_b9131bfb ss 10 29462.000 ms [success] Total time: 1595 s, completed Feb 26, 2017 1:42:55 AM [success] Total time: 0 s, completed Feb 26, 2017 1:42:55 AM ``` AFTER ``` [info] Benchmark (_tempDir) Mode Cnt Score Error Units [info] HotScalacBenchmark.compile /tmp/sbt_c8a4806b sample 18 20757.144 ± 519.221 ms/op [info] HotScalacBenchmark.compile:compile·p0.00 /tmp/sbt_c8a4806b sample 19931.333 ms/op [info] HotScalacBenchmark.compile:compile·p0.50 /tmp/sbt_c8a4806b sample 20786.971 ms/op [info] HotScalacBenchmark.compile:compile·p0.90 /tmp/sbt_c8a4806b sample 21615.765 ms/op [info] HotScalacBenchmark.compile:compile·p0.95 /tmp/sbt_c8a4806b sample 21676.163 ms/op [info] HotScalacBenchmark.compile:compile·p0.99 /tmp/sbt_c8a4806b sample 21676.163 ms/op [info] HotScalacBenchmark.compile:compile·p0.999 /tmp/sbt_c8a4806b sample 21676.163 ms/op [info] HotScalacBenchmark.compile:compile·p0.9999 /tmp/sbt_c8a4806b sample 21676.163 ms/op [info] HotScalacBenchmark.compile:compile·p1.00 /tmp/sbt_c8a4806b sample 21676.163 ms/op [info] HotScalacBenchmark.compile:·gc.alloc.rate /tmp/sbt_c8a4806b sample 18 290.476 ± 7.069 MB/sec [info] HotScalacBenchmark.compile:·gc.alloc.rate.norm 
/tmp/sbt_c8a4806b sample 18 6476081869.778 ± 18700713.424 B/op [info] HotScalacBenchmark.compile:·gc.churn.PS_Eden_Space /tmp/sbt_c8a4806b sample 18 290.409 ± 20.336 MB/sec [info] HotScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm /tmp/sbt_c8a4806b sample 18 6478102528.000 ± 468310673.653 B/op [info] HotScalacBenchmark.compile:·gc.churn.PS_Old_Gen /tmp/sbt_c8a4806b sample 18 13.261 ± 12.790 MB/sec [info] HotScalacBenchmark.compile:·gc.churn.PS_Old_Gen.norm /tmp/sbt_c8a4806b sample 18 301324965.333 ± 290518111.715 B/op [info] HotScalacBenchmark.compile:·gc.churn.PS_Survivor_Space /tmp/sbt_c8a4806b sample 18 6.735 ± 2.338 MB/sec [info] HotScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm /tmp/sbt_c8a4806b sample 18 150953349.778 ± 54074639.209 B/op [info] HotScalacBenchmark.compile:·gc.count /tmp/sbt_c8a4806b sample 18 101.000 counts [info] HotScalacBenchmark.compile:·gc.time /tmp/sbt_c8a4806b sample 18 21267.000 ms [info] WarmScalacBenchmark.compile /tmp/sbt_c8a4806b sample 3 54380.549 ± 24064.367 ms/op [info] WarmScalacBenchmark.compile:compile·p0.00 /tmp/sbt_c8a4806b sample 53552.873 ms/op [info] WarmScalacBenchmark.compile:compile·p0.50 /tmp/sbt_c8a4806b sample 53687.091 ms/op [info] WarmScalacBenchmark.compile:compile·p0.90 /tmp/sbt_c8a4806b sample 55901.684 ms/op [info] WarmScalacBenchmark.compile:compile·p0.95 /tmp/sbt_c8a4806b sample 55901.684 ms/op [info] WarmScalacBenchmark.compile:compile·p0.99 /tmp/sbt_c8a4806b sample 55901.684 ms/op [info] WarmScalacBenchmark.compile:compile·p0.999 /tmp/sbt_c8a4806b sample 55901.684 ms/op [info] WarmScalacBenchmark.compile:compile·p0.9999 /tmp/sbt_c8a4806b sample 55901.684 ms/op [info] WarmScalacBenchmark.compile:compile·p1.00 /tmp/sbt_c8a4806b sample 55901.684 ms/op [info] WarmScalacBenchmark.compile:·gc.alloc.rate /tmp/sbt_c8a4806b sample 3 120.159 ± 52.914 MB/sec [info] WarmScalacBenchmark.compile:·gc.alloc.rate.norm /tmp/sbt_c8a4806b sample 3 6963979373.333 ± 137408036.138 B/op [info] 
WarmScalacBenchmark.compile:·gc.churn.PS_Eden_Space /tmp/sbt_c8a4806b sample 3 113.755 ± 135.915 MB/sec [info] WarmScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm /tmp/sbt_c8a4806b sample 3 6588595392.000 ± 5170161565.753 B/op [info] WarmScalacBenchmark.compile:·gc.churn.PS_Old_Gen /tmp/sbt_c8a4806b sample 3 0.002 ± 0.048 MB/sec [info] WarmScalacBenchmark.compile:·gc.churn.PS_Old_Gen.norm /tmp/sbt_c8a4806b sample 3 90400.000 ± 2856554.534 B/op [info] WarmScalacBenchmark.compile:·gc.churn.PS_Survivor_Space /tmp/sbt_c8a4806b sample 3 2.623 ± 7.378 MB/sec [info] WarmScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm /tmp/sbt_c8a4806b sample 3 151896768.000 ± 399915676.894 B/op [info] WarmScalacBenchmark.compile:·gc.count /tmp/sbt_c8a4806b sample 3 73.000 counts [info] WarmScalacBenchmark.compile:·gc.time /tmp/sbt_c8a4806b sample 3 10070.000 ms [info] ColdScalacBenchmark.compile /tmp/sbt_c8a4806b ss 10 45613.670 ± 1724.291 ms/op [info] ColdScalacBenchmark.compile:·gc.alloc.rate /tmp/sbt_c8a4806b ss 10 147.106 ± 4.973 MB/sec [info] ColdScalacBenchmark.compile:·gc.alloc.rate.norm /tmp/sbt_c8a4806b ss 10 7165665000.000 ± 68500786.134 B/op [info] ColdScalacBenchmark.compile:·gc.churn.PS_Eden_Space /tmp/sbt_c8a4806b ss 10 138.633 ± 12.612 MB/sec [info] ColdScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm /tmp/sbt_c8a4806b ss 10 6749057403.200 ± 438983252.418 B/op [info] ColdScalacBenchmark.compile:·gc.churn.PS_Survivor_Space /tmp/sbt_c8a4806b ss 10 2.716 ± 0.298 MB/sec [info] ColdScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm /tmp/sbt_c8a4806b ss 10 132216236.800 ± 11751803.094 B/op [info] ColdScalacBenchmark.compile:·gc.count /tmp/sbt_c8a4806b ss 10 247.000 counts [info] ColdScalacBenchmark.compile:·gc.time /tmp/sbt_c8a4806b ss 10 29965.000 ms [success] Total time: 1593 s, completed Feb 26, 2017 11:54:01 AM [success] Total time: 0 s, completed Feb 26, 2017 11:54:01 AM ``` Machine info: ``` jvican in /data/rw/code/scala/zinc [22:24:47] > $ uname -a 
[±as-seen-from ●▴▾] Linux tribox 4.9.11-1-ARCH #1 SMP PREEMPT Sun Feb 19 13:45:52 UTC 2017 x86_64 GNU/Linux jvican in /data/rw/code/scala/zinc [23:15:57] > $ cpupower frequency-info [±as-seen-from ●▴▾] analyzing CPU 0: driver: intel_pstate CPUs which run at the same hardware frequency: 0 CPUs which need to have their frequency coordinated by software: 0 maximum transition latency: Cannot determine or is not supported. hardware limits: 400 MHz - 3.40 GHz available cpufreq governors: performance powersave current policy: frequency should be within 3.20 GHz and 3.20 GHz. The governor "performance" may decide which speed to use within this range. current CPU frequency: Unable to call hardware current CPU frequency: 3.32 GHz (asserted by call to kernel) boost state support: Supported: yes Active: yes jvican in /data/rw/code/scala/zinc [23:16:14] > $ cat /proc/meminfo [±as-seen-from ●▴▾] MemTotal: 20430508 kB MemFree: 9890712 kB MemAvailable: 13490908 kB Buffers: 3684 kB Cached: 4052520 kB SwapCached: 0 kB Active: 7831612 kB Inactive: 2337220 kB Active(anon): 6214680 kB Inactive(anon): 151436 kB Active(file): 1616932 kB Inactive(file): 2185784 kB Unevictable: 0 kB Mlocked: 0 kB SwapTotal: 12582908 kB SwapFree: 12582908 kB Dirty: 124 kB Writeback: 0 kB AnonPages: 6099876 kB Mapped: 183096 kB Shmem: 253488 kB Slab: 227436 kB SReclaimable: 152144 kB SUnreclaim: 75292 kB KernelStack: 5152 kB PageTables: 19636 kB NFS_Unstable: 0 kB Bounce: 0 kB WritebackTmp: 0 kB CommitLimit: 22798160 kB Committed_AS: 7685996 kB VmallocTotal: 34359738367 kB VmallocUsed: 0 kB VmallocChunk: 0 kB HardwareCorrupted: 0 kB AnonHugePages: 5511168 kB ShmemHugePages: 0 kB ShmemPmdMapped: 0 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 2048 kB DirectMap4k: 136620 kB DirectMap2M: 4970496 kB DirectMap1G: 15728640 kB jvican in /data/rw/code/scala/zinc [23:16:41] > $ cat /proc/cpuinfo [±as-seen-from ●▴▾] processor : 0 vendor_id : GenuineIntel cpu family : 6 model 
: 78 model name : Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz stepping : 3 microcode : 0x88 cpu MHz : 3297.827 cache size : 4096 KB physical id : 0 siblings : 4 core id : 0 cpu cores : 2 apicid : 0 initial apicid : 0 fpu : yes fpu_exception : yes cpuid level : 22 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp bugs : bogomips : 5618.00 clflush size : 64 cache_alignment : 64 address sizes : 39 bits physical, 48 bits virtual power management: processor : 1 vendor_id : GenuineIntel cpu family : 6 model : 78 model name : Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz stepping : 3 microcode : 0x88 cpu MHz : 3296.459 cache size : 4096 KB physical id : 0 siblings : 4 core id : 1 cpu cores : 2 apicid : 2 initial apicid : 2 fpu : yes fpu_exception : yes cpuid level : 22 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt 
xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp bugs : bogomips : 5620.22 clflush size : 64 cache_alignment : 64 address sizes : 39 bits physical, 48 bits virtual power management: processor : 2 vendor_id : GenuineIntel cpu family : 6 model : 78 model name : Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz stepping : 3 microcode : 0x88 cpu MHz : 3399.853 cache size : 4096 KB physical id : 0 siblings : 4 core id : 0 cpu cores : 2 apicid : 1 initial apicid : 1 fpu : yes fpu_exception : yes cpuid level : 22 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp bugs : bogomips : 5621.16 clflush size : 64 cache_alignment : 64 address sizes : 39 bits physical, 48 bits virtual power management: processor : 3 vendor_id : GenuineIntel cpu family : 6 model : 78 model name : Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz stepping : 3 microcode : 0x88 cpu MHz : 3210.327 cache size : 4096 KB physical id : 0 siblings : 4 core id : 1 cpu cores : 2 apicid : 3 initial apicid : 3 fpu : yes fpu_exception : yes cpuid level : 22 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est 
tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp bugs : bogomips : 5620.33 clflush size : 64 cache_alignment : 64 address sizes : 39 bits physical, 48 bits virtual power management:
```

In comparison with df30872, the new changes improve the running time of Zinc by half a second in the hot and warm benchmarks, and by 100 ms in the cold benchmarks, a difference that seems to be a product of run-to-run variation given the ms/op figures. This is a success considering that we now traverse more types and symbols than before: these changes allow us to do more work and still decrease the running time of Zinc. They are likely to have a bigger effect on huge industrial codebases in which the ratio of types is very high and which use a lot of rich types, such as poly types, method types, refinements and existential types with lots of constraints.
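The caching strategy described in this commit message can be modelled roughly as follows. This is a language-agnostic sketch in Python, not Zinc's Scala code: the `Type` class, `CachingTypeTraverser`, `set_owner` and the `add_dependency` callback are hypothetical names invented for illustration. The only idea taken from the message is that the `visited` cache is kept across traversals and cleared only when the origin symbol (the owner) changes, and that each phase supplies its own way of recording a dependency.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Type:
    """Toy stand-in for a compiler type: a symbol name plus type arguments."""
    symbol: str
    args: tuple["Type", ...] = ()


class CachingTypeTraverser:
    """Walks types and reports each one at most once per origin symbol.

    Hypothetical model: `add_dependency` plays the role of the hook that
    each phase would override to record a dependency in its own way.
    """

    def __init__(self, add_dependency: Callable[[str, str], None]) -> None:
        self._add_dependency = add_dependency
        self._owner: Optional[str] = None
        self._visited: set[Type] = set()
        self.visits = 0  # number of types actually walked, for illustration

    def set_owner(self, owner: str) -> None:
        # Clear the cache only when the origin symbol really changes.
        if owner != self._owner:
            self._owner = owner
            self._visited = set()

    def traverse(self, tpe: Type) -> None:
        if self._owner is None:
            raise ValueError("set_owner must be called before traverse")
        if tpe in self._visited:
            return  # already handled for the current owner
        self._visited.add(tpe)
        self.visits += 1
        self._add_dependency(self._owner, tpe.symbol)
        for arg in tpe.args:
            self.traverse(arg)


deps: set[tuple[str, str]] = set()
traverser = CachingTypeTraverser(lambda origin, target: deps.add((origin, target)))

list_of_int = Type("List", (Type("Int"),))
traverser.set_owner("A")
traverser.traverse(list_of_int)  # walks List and Int
traverser.traverse(list_of_int)  # cache hit, nothing is walked
traverser.set_owner("A")         # same owner, cache is kept
traverser.traverse(Type("Int"))  # cache hit
traverser.set_owner("B")         # owner changed, cache is cleared
traverser.traverse(list_of_int)  # walked again on behalf of B
```

Under this model the repeated traversals for owner `A` cost nothing after the first one, which is the effect the commit is after when many types are visited inside the same method or variable body.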
jvican
referenced
this pull request
in scalacenter/zinc
Feb 26, 2017
This commit speeds up the analysis of type dependencies as much as possible.

In both `ExtractUsedNames` and `Dependency`, we have a cache function associated with a source symbol. This source symbol is the "key" of the cache in the sense that it determines how a dependency should be tracked. `Dependency`, for instance, adds a dependency from `X` to `Y`, where `X` is the origin symbol and `Y` is the destination symbol. However, only `X` determines how a dependency should be added (and on which data structure). The same happens for `ExtractAPI`, whose case is simpler because there is no destination symbol: only the origin symbol is needed for the cache, since we have a set of names for a given symbol.

Our previous type analysis had a type cache, but it only lasted one type traversal. The algorithm was very pessimistic: we cleared the `visited` cache with `reinitializeVisited` after every traversal so that members would be correctly recognized if the origin symbol changed. However, the origin symbol usually stays the same, especially when traversing the bodies of methods and variables, which contain a high proportion of types.

Taking this into account, we arrive at the conclusion that we can keep type caches around as long as the `currentOwner` doesn't change, because dependencies are only registered for top-level classes in both cases (`ExtractAPI` and `Dependency`).

The introduced solution allows every phase to implement its own `TypeTraverser` and override the function that takes care of adding a dependency. This is necessary because the functions that add dependencies depend on the context (origin symbols and other state), which ultimately varies between `ExtractAPI` and `Dependency`.

The following benchmark was obtained in the same way as for the commit mentioned before, and measures the compilation of the Scala standard library.
BEFORE ``` [info] Benchmark (_tempDir) Mode Cnt Score Error Units [info] HotScalacBenchmark.compile /tmp/sbt_b9131bfb sample 18 21228.771 ± 521.207 ms/op [info] HotScalacBenchmark.compile:compile·p0.00 /tmp/sbt_b9131bfb sample 20199.768 ms/op [info] HotScalacBenchmark.compile:compile·p0.50 /tmp/sbt_b9131bfb sample 21256.733 ms/op [info] HotScalacBenchmark.compile:compile·p0.90 /tmp/sbt_b9131bfb sample 21931.177 ms/op [info] HotScalacBenchmark.compile:compile·p0.95 /tmp/sbt_b9131bfb sample 22112.371 ms/op [info] HotScalacBenchmark.compile:compile·p0.99 /tmp/sbt_b9131bfb sample 22112.371 ms/op [info] HotScalacBenchmark.compile:compile·p0.999 /tmp/sbt_b9131bfb sample 22112.371 ms/op [info] HotScalacBenchmark.compile:compile·p0.9999 /tmp/sbt_b9131bfb sample 22112.371 ms/op [info] HotScalacBenchmark.compile:compile·p1.00 /tmp/sbt_b9131bfb sample 22112.371 ms/op [info] HotScalacBenchmark.compile:·gc.alloc.rate /tmp/sbt_b9131bfb sample 18 284.115 ± 6.036 MB/sec [info] HotScalacBenchmark.compile:·gc.alloc.rate.norm /tmp/sbt_b9131bfb sample 18 6474818679.556 ± 42551265.360 B/op [info] HotScalacBenchmark.compile:·gc.churn.PS_Eden_Space /tmp/sbt_b9131bfb sample 18 283.385 ± 23.147 MB/sec [info] HotScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm /tmp/sbt_b9131bfb sample 18 6455703779.556 ± 483463770.519 B/op [info] HotScalacBenchmark.compile:·gc.churn.PS_Old_Gen /tmp/sbt_b9131bfb sample 18 12.857 ± 12.406 MB/sec [info] HotScalacBenchmark.compile:·gc.churn.PS_Old_Gen.norm /tmp/sbt_b9131bfb sample 18 297978002.222 ± 287556197.389 B/op [info] HotScalacBenchmark.compile:·gc.churn.PS_Survivor_Space /tmp/sbt_b9131bfb sample 18 6.901 ± 2.092 MB/sec [info] HotScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm /tmp/sbt_b9131bfb sample 18 158212212.444 ± 50375116.805 B/op [info] HotScalacBenchmark.compile:·gc.count /tmp/sbt_b9131bfb sample 18 105.000 counts [info] HotScalacBenchmark.compile:·gc.time /tmp/sbt_b9131bfb sample 18 21814.000 ms [info] WarmScalacBenchmark.compile 
/tmp/sbt_b9131bfb sample 3 55924.053 ± 16257.754 ms/op [info] WarmScalacBenchmark.compile:compile·p0.00 /tmp/sbt_b9131bfb sample 54895.051 ms/op [info] WarmScalacBenchmark.compile:compile·p0.50 /tmp/sbt_b9131bfb sample 56438.555 ms/op [info] WarmScalacBenchmark.compile:compile·p0.90 /tmp/sbt_b9131bfb sample 56438.555 ms/op [info] WarmScalacBenchmark.compile:compile·p0.95 /tmp/sbt_b9131bfb sample 56438.555 ms/op [info] WarmScalacBenchmark.compile:compile·p0.99 /tmp/sbt_b9131bfb sample 56438.555 ms/op [info] WarmScalacBenchmark.compile:compile·p0.999 /tmp/sbt_b9131bfb sample 56438.555 ms/op [info] WarmScalacBenchmark.compile:compile·p0.9999 /tmp/sbt_b9131bfb sample 56438.555 ms/op [info] WarmScalacBenchmark.compile:compile·p1.00 /tmp/sbt_b9131bfb sample 56438.555 ms/op [info] WarmScalacBenchmark.compile:·gc.alloc.rate /tmp/sbt_b9131bfb sample 3 117.417 ± 27.439 MB/sec [info] WarmScalacBenchmark.compile:·gc.alloc.rate.norm /tmp/sbt_b9131bfb sample 3 6999695530.667 ± 608845574.720 B/op [info] WarmScalacBenchmark.compile:·gc.churn.PS_Eden_Space /tmp/sbt_b9131bfb sample 3 111.263 ± 90.263 MB/sec [info] WarmScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm /tmp/sbt_b9131bfb sample 3 6633605792.000 ± 5698534573.516 B/op [info] WarmScalacBenchmark.compile:·gc.churn.PS_Old_Gen /tmp/sbt_b9131bfb sample 3 0.001 ± 0.040 MB/sec [info] WarmScalacBenchmark.compile:·gc.churn.PS_Old_Gen.norm /tmp/sbt_b9131bfb sample 3 74741.333 ± 2361755.471 B/op [info] WarmScalacBenchmark.compile:·gc.churn.PS_Survivor_Space /tmp/sbt_b9131bfb sample 3 2.478 ± 7.592 MB/sec [info] WarmScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm /tmp/sbt_b9131bfb sample 3 147881869.333 ± 475964254.946 B/op [info] WarmScalacBenchmark.compile:·gc.count /tmp/sbt_b9131bfb sample 3 73.000 counts [info] WarmScalacBenchmark.compile:·gc.time /tmp/sbt_b9131bfb sample 3 9581.000 ms [info] ColdScalacBenchmark.compile /tmp/sbt_b9131bfb ss 10 45562.453 ± 836.977 ms/op [info] 
ColdScalacBenchmark.compile:·gc.alloc.rate /tmp/sbt_b9131bfb ss 10 147.126 ± 2.229 MB/sec [info] ColdScalacBenchmark.compile:·gc.alloc.rate.norm /tmp/sbt_b9131bfb ss 10 7163351651.200 ± 57993163.779 B/op [info] ColdScalacBenchmark.compile:·gc.churn.PS_Eden_Space /tmp/sbt_b9131bfb ss 10 137.407 ± 6.810 MB/sec [info] ColdScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm /tmp/sbt_b9131bfb ss 10 6692512710.400 ± 429243418.572 B/op [info] ColdScalacBenchmark.compile:·gc.churn.PS_Survivor_Space /tmp/sbt_b9131bfb ss 10 2.647 ± 0.168 MB/sec [info] ColdScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm /tmp/sbt_b9131bfb ss 10 128840603.200 ± 7324571.862 B/op [info] ColdScalacBenchmark.compile:·gc.count /tmp/sbt_b9131bfb ss 10 245.000 counts [info] ColdScalacBenchmark.compile:·gc.time /tmp/sbt_b9131bfb ss 10 29462.000 ms [success] Total time: 1595 s, completed Feb 26, 2017 1:42:55 AM [success] Total time: 0 s, completed Feb 26, 2017 1:42:55 AM ``` AFTER ``` [info] Benchmark (_tempDir) Mode Cnt Score Error Units [info] HotScalacBenchmark.compile /tmp/sbt_c8a4806b sample 18 20757.144 ± 519.221 ms/op [info] HotScalacBenchmark.compile:compile·p0.00 /tmp/sbt_c8a4806b sample 19931.333 ms/op [info] HotScalacBenchmark.compile:compile·p0.50 /tmp/sbt_c8a4806b sample 20786.971 ms/op [info] HotScalacBenchmark.compile:compile·p0.90 /tmp/sbt_c8a4806b sample 21615.765 ms/op [info] HotScalacBenchmark.compile:compile·p0.95 /tmp/sbt_c8a4806b sample 21676.163 ms/op [info] HotScalacBenchmark.compile:compile·p0.99 /tmp/sbt_c8a4806b sample 21676.163 ms/op [info] HotScalacBenchmark.compile:compile·p0.999 /tmp/sbt_c8a4806b sample 21676.163 ms/op [info] HotScalacBenchmark.compile:compile·p0.9999 /tmp/sbt_c8a4806b sample 21676.163 ms/op [info] HotScalacBenchmark.compile:compile·p1.00 /tmp/sbt_c8a4806b sample 21676.163 ms/op [info] HotScalacBenchmark.compile:·gc.alloc.rate /tmp/sbt_c8a4806b sample 18 290.476 ± 7.069 MB/sec [info] HotScalacBenchmark.compile:·gc.alloc.rate.norm 
/tmp/sbt_c8a4806b sample 18 6476081869.778 ± 18700713.424 B/op [info] HotScalacBenchmark.compile:·gc.churn.PS_Eden_Space /tmp/sbt_c8a4806b sample 18 290.409 ± 20.336 MB/sec [info] HotScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm /tmp/sbt_c8a4806b sample 18 6478102528.000 ± 468310673.653 B/op [info] HotScalacBenchmark.compile:·gc.churn.PS_Old_Gen /tmp/sbt_c8a4806b sample 18 13.261 ± 12.790 MB/sec [info] HotScalacBenchmark.compile:·gc.churn.PS_Old_Gen.norm /tmp/sbt_c8a4806b sample 18 301324965.333 ± 290518111.715 B/op [info] HotScalacBenchmark.compile:·gc.churn.PS_Survivor_Space /tmp/sbt_c8a4806b sample 18 6.735 ± 2.338 MB/sec [info] HotScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm /tmp/sbt_c8a4806b sample 18 150953349.778 ± 54074639.209 B/op [info] HotScalacBenchmark.compile:·gc.count /tmp/sbt_c8a4806b sample 18 101.000 counts [info] HotScalacBenchmark.compile:·gc.time /tmp/sbt_c8a4806b sample 18 21267.000 ms [info] WarmScalacBenchmark.compile /tmp/sbt_c8a4806b sample 3 54380.549 ± 24064.367 ms/op [info] WarmScalacBenchmark.compile:compile·p0.00 /tmp/sbt_c8a4806b sample 53552.873 ms/op [info] WarmScalacBenchmark.compile:compile·p0.50 /tmp/sbt_c8a4806b sample 53687.091 ms/op [info] WarmScalacBenchmark.compile:compile·p0.90 /tmp/sbt_c8a4806b sample 55901.684 ms/op [info] WarmScalacBenchmark.compile:compile·p0.95 /tmp/sbt_c8a4806b sample 55901.684 ms/op [info] WarmScalacBenchmark.compile:compile·p0.99 /tmp/sbt_c8a4806b sample 55901.684 ms/op [info] WarmScalacBenchmark.compile:compile·p0.999 /tmp/sbt_c8a4806b sample 55901.684 ms/op [info] WarmScalacBenchmark.compile:compile·p0.9999 /tmp/sbt_c8a4806b sample 55901.684 ms/op [info] WarmScalacBenchmark.compile:compile·p1.00 /tmp/sbt_c8a4806b sample 55901.684 ms/op [info] WarmScalacBenchmark.compile:·gc.alloc.rate /tmp/sbt_c8a4806b sample 3 120.159 ± 52.914 MB/sec [info] WarmScalacBenchmark.compile:·gc.alloc.rate.norm /tmp/sbt_c8a4806b sample 3 6963979373.333 ± 137408036.138 B/op [info] 
WarmScalacBenchmark.compile:·gc.churn.PS_Eden_Space /tmp/sbt_c8a4806b sample 3 113.755 ± 135.915 MB/sec [info] WarmScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm /tmp/sbt_c8a4806b sample 3 6588595392.000 ± 5170161565.753 B/op [info] WarmScalacBenchmark.compile:·gc.churn.PS_Old_Gen /tmp/sbt_c8a4806b sample 3 0.002 ± 0.048 MB/sec [info] WarmScalacBenchmark.compile:·gc.churn.PS_Old_Gen.norm /tmp/sbt_c8a4806b sample 3 90400.000 ± 2856554.534 B/op [info] WarmScalacBenchmark.compile:·gc.churn.PS_Survivor_Space /tmp/sbt_c8a4806b sample 3 2.623 ± 7.378 MB/sec [info] WarmScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm /tmp/sbt_c8a4806b sample 3 151896768.000 ± 399915676.894 B/op [info] WarmScalacBenchmark.compile:·gc.count /tmp/sbt_c8a4806b sample 3 73.000 counts [info] WarmScalacBenchmark.compile:·gc.time /tmp/sbt_c8a4806b sample 3 10070.000 ms [info] ColdScalacBenchmark.compile /tmp/sbt_c8a4806b ss 10 45613.670 ± 1724.291 ms/op [info] ColdScalacBenchmark.compile:·gc.alloc.rate /tmp/sbt_c8a4806b ss 10 147.106 ± 4.973 MB/sec [info] ColdScalacBenchmark.compile:·gc.alloc.rate.norm /tmp/sbt_c8a4806b ss 10 7165665000.000 ± 68500786.134 B/op [info] ColdScalacBenchmark.compile:·gc.churn.PS_Eden_Space /tmp/sbt_c8a4806b ss 10 138.633 ± 12.612 MB/sec [info] ColdScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm /tmp/sbt_c8a4806b ss 10 6749057403.200 ± 438983252.418 B/op [info] ColdScalacBenchmark.compile:·gc.churn.PS_Survivor_Space /tmp/sbt_c8a4806b ss 10 2.716 ± 0.298 MB/sec [info] ColdScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm /tmp/sbt_c8a4806b ss 10 132216236.800 ± 11751803.094 B/op [info] ColdScalacBenchmark.compile:·gc.count /tmp/sbt_c8a4806b ss 10 247.000 counts [info] ColdScalacBenchmark.compile:·gc.time /tmp/sbt_c8a4806b ss 10 29965.000 ms [success] Total time: 1593 s, completed Feb 26, 2017 11:54:01 AM [success] Total time: 0 s, completed Feb 26, 2017 11:54:01 AM ``` Machine info: ``` jvican in /data/rw/code/scala/zinc [22:24:47] > $ uname -a 
[±as-seen-from ●▴▾] Linux tribox 4.9.11-1-ARCH #1 SMP PREEMPT Sun Feb 19 13:45:52 UTC 2017 x86_64 GNU/Linux jvican in /data/rw/code/scala/zinc [23:15:57] > $ cpupower frequency-info [±as-seen-from ●▴▾] analyzing CPU 0: driver: intel_pstate CPUs which run at the same hardware frequency: 0 CPUs which need to have their frequency coordinated by software: 0 maximum transition latency: Cannot determine or is not supported. hardware limits: 400 MHz - 3.40 GHz available cpufreq governors: performance powersave current policy: frequency should be within 3.20 GHz and 3.20 GHz. The governor "performance" may decide which speed to use within this range. current CPU frequency: Unable to call hardware current CPU frequency: 3.32 GHz (asserted by call to kernel) boost state support: Supported: yes Active: yes jvican in /data/rw/code/scala/zinc [23:16:14] > $ cat /proc/meminfo [±as-seen-from ●▴▾] MemTotal: 20430508 kB MemFree: 9890712 kB MemAvailable: 13490908 kB Buffers: 3684 kB Cached: 4052520 kB SwapCached: 0 kB Active: 7831612 kB Inactive: 2337220 kB Active(anon): 6214680 kB Inactive(anon): 151436 kB Active(file): 1616932 kB Inactive(file): 2185784 kB Unevictable: 0 kB Mlocked: 0 kB SwapTotal: 12582908 kB SwapFree: 12582908 kB Dirty: 124 kB Writeback: 0 kB AnonPages: 6099876 kB Mapped: 183096 kB Shmem: 253488 kB Slab: 227436 kB SReclaimable: 152144 kB SUnreclaim: 75292 kB KernelStack: 5152 kB PageTables: 19636 kB NFS_Unstable: 0 kB Bounce: 0 kB WritebackTmp: 0 kB CommitLimit: 22798160 kB Committed_AS: 7685996 kB VmallocTotal: 34359738367 kB VmallocUsed: 0 kB VmallocChunk: 0 kB HardwareCorrupted: 0 kB AnonHugePages: 5511168 kB ShmemHugePages: 0 kB ShmemPmdMapped: 0 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 2048 kB DirectMap4k: 136620 kB DirectMap2M: 4970496 kB DirectMap1G: 15728640 kB jvican in /data/rw/code/scala/zinc [23:16:41] > $ cat /proc/cpuinfo [±as-seen-from ●▴▾] processor : 0 vendor_id : GenuineIntel cpu family : 6 model 
: 78 model name : Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz stepping : 3 microcode : 0x88 cpu MHz : 3297.827 cache size : 4096 KB physical id : 0 siblings : 4 core id : 0 cpu cores : 2 apicid : 0 initial apicid : 0 fpu : yes fpu_exception : yes cpuid level : 22 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp bugs : bogomips : 5618.00 clflush size : 64 cache_alignment : 64 address sizes : 39 bits physical, 48 bits virtual power management: processor : 1 vendor_id : GenuineIntel cpu family : 6 model : 78 model name : Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz stepping : 3 microcode : 0x88 cpu MHz : 3296.459 cache size : 4096 KB physical id : 0 siblings : 4 core id : 1 cpu cores : 2 apicid : 2 initial apicid : 2 fpu : yes fpu_exception : yes cpuid level : 22 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt 
xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp bugs : bogomips : 5620.22 clflush size : 64 cache_alignment : 64 address sizes : 39 bits physical, 48 bits virtual power management: processor : 2 vendor_id : GenuineIntel cpu family : 6 model : 78 model name : Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz stepping : 3 microcode : 0x88 cpu MHz : 3399.853 cache size : 4096 KB physical id : 0 siblings : 4 core id : 0 cpu cores : 2 apicid : 1 initial apicid : 1 fpu : yes fpu_exception : yes cpuid level : 22 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp bugs : bogomips : 5621.16 clflush size : 64 cache_alignment : 64 address sizes : 39 bits physical, 48 bits virtual power management: processor : 3 vendor_id : GenuineIntel cpu family : 6 model : 78 model name : Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz stepping : 3 microcode : 0x88 cpu MHz : 3210.327 cache size : 4096 KB physical id : 0 siblings : 4 core id : 1 cpu cores : 2 apicid : 3 initial apicid : 3 fpu : yes fpu_exception : yes cpuid level : 22 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est 
tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp bugs : bogomips : 5620.33 clflush size : 64 cache_alignment : 64 address sizes : 39 bits physical, 48 bits virtual power management:
```

In comparison with df30872, the new changes improve the running time of Zinc by half a second in the hot and warm benchmarks, and by 100 ms in the cold benchmarks, a difference that seems to be a product of run-to-run variation given the ms/op figures. This is a success considering that we now traverse more types and symbols than before: these changes allow us to do more work and still decrease the running time of Zinc. They are likely to have a bigger effect on huge industrial codebases in which the ratio of types is very high and which use a lot of rich types, such as poly types, method types, refinements and existential types with lots of constraints.
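As a quick check on the variation mentioned above, the headline `compile` scores from the BEFORE and AFTER tables can be compared directly. The snippet below is only a sketch: the numbers are copied from the two tables in this message, it assumes the BEFORE run is the baseline being compared against, and treating `Score ± Error` as a simple interval is a simplification rather than a proper statistical test. The names `BEFORE`, `AFTER` and `compare` are made up for the example.

```python
# (score, error) pairs in ms/op, copied from the BEFORE and AFTER tables.
BEFORE = {
    "hot": (21228.771, 521.207),
    "warm": (55924.053, 16257.754),
    "cold": (45562.453, 836.977),
}
AFTER = {
    "hot": (20757.144, 519.221),
    "warm": (54380.549, 24064.367),
    "cold": (45613.670, 1724.291),
}


def compare(before, after):
    """Return (delta_ms, intervals_overlap) for one benchmark mode."""
    (b_score, b_err), (a_score, a_err) = before, after
    delta = a_score - b_score  # negative means AFTER is faster
    # Two intervals overlap when each one starts before the other ends.
    overlap = (b_score - b_err) <= (a_score + a_err) and (
        a_score - a_err
    ) <= (b_score + b_err)
    return delta, overlap


results = {mode: compare(BEFORE[mode], AFTER[mode]) for mode in BEFORE}
for mode, (delta, overlap) in results.items():
    print(f"{mode}: {delta:+.1f} ms/op, intervals overlap: {overlap}")
```

On these figures the hot score drops by roughly 470 ms and the warm score by roughly 1.5 s, the cold score moves by about 50 ms, and all three pairs of intervals overlap, so the differences sit inside the reported error margins.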
jvican
referenced
this pull request
in scalacenter/zinc
Feb 26, 2017
This commit speeds up the analysis of type dependencies as much as possible.

In both `ExtractUsedNames` and `Dependency`, we have a cache function associated with a source symbol. This source symbol is the "key" of the cache in the sense that it determines how a dependency should be tracked. `Dependency`, for instance, adds a dependency from `X` to `Y`, where `X` is the origin symbol and `Y` is the destination symbol. However, only `X` determines how a dependency should be added (and on which data structure). The same happens for `ExtractAPI`, whose case is simpler because there is no destination symbol: only the origin symbol is needed for the cache, since we have a set of names for a given symbol.

Our previous type analysis had a type cache, but it only lasted one type traversal. The algorithm was very pessimistic: we cleared the `visited` cache with `reinitializeVisited` after every traversal so that members would be correctly recognized if the origin symbol changed. However, the origin symbol usually stays the same, especially when traversing the bodies of methods and variables, which contain a high proportion of types.

Taking this into account, we arrive at the conclusion that we can keep type caches around as long as the `currentOwner` doesn't change, because dependencies are only registered for top-level classes in both cases (`ExtractAPI` and `Dependency`).

The introduced solution allows every phase to implement its own `TypeTraverser` and override the function that takes care of adding a dependency. This is necessary because the functions that add dependencies depend on the context (origin symbols and other state), which ultimately varies between `ExtractAPI` and `Dependency`.

The following benchmark was obtained in the same way as for the commit mentioned before, and measures the compilation of the Scala standard library.
BEFORE ``` [info] Benchmark (_tempDir) Mode Cnt Score Error Units [info] HotScalacBenchmark.compile /tmp/sbt_b9131bfb sample 18 21228.771 ± 521.207 ms/op [info] HotScalacBenchmark.compile:compile·p0.00 /tmp/sbt_b9131bfb sample 20199.768 ms/op [info] HotScalacBenchmark.compile:compile·p0.50 /tmp/sbt_b9131bfb sample 21256.733 ms/op [info] HotScalacBenchmark.compile:compile·p0.90 /tmp/sbt_b9131bfb sample 21931.177 ms/op [info] HotScalacBenchmark.compile:compile·p0.95 /tmp/sbt_b9131bfb sample 22112.371 ms/op [info] HotScalacBenchmark.compile:compile·p0.99 /tmp/sbt_b9131bfb sample 22112.371 ms/op [info] HotScalacBenchmark.compile:compile·p0.999 /tmp/sbt_b9131bfb sample 22112.371 ms/op [info] HotScalacBenchmark.compile:compile·p0.9999 /tmp/sbt_b9131bfb sample 22112.371 ms/op [info] HotScalacBenchmark.compile:compile·p1.00 /tmp/sbt_b9131bfb sample 22112.371 ms/op [info] HotScalacBenchmark.compile:·gc.alloc.rate /tmp/sbt_b9131bfb sample 18 284.115 ± 6.036 MB/sec [info] HotScalacBenchmark.compile:·gc.alloc.rate.norm /tmp/sbt_b9131bfb sample 18 6474818679.556 ± 42551265.360 B/op [info] HotScalacBenchmark.compile:·gc.churn.PS_Eden_Space /tmp/sbt_b9131bfb sample 18 283.385 ± 23.147 MB/sec [info] HotScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm /tmp/sbt_b9131bfb sample 18 6455703779.556 ± 483463770.519 B/op [info] HotScalacBenchmark.compile:·gc.churn.PS_Old_Gen /tmp/sbt_b9131bfb sample 18 12.857 ± 12.406 MB/sec [info] HotScalacBenchmark.compile:·gc.churn.PS_Old_Gen.norm /tmp/sbt_b9131bfb sample 18 297978002.222 ± 287556197.389 B/op [info] HotScalacBenchmark.compile:·gc.churn.PS_Survivor_Space /tmp/sbt_b9131bfb sample 18 6.901 ± 2.092 MB/sec [info] HotScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm /tmp/sbt_b9131bfb sample 18 158212212.444 ± 50375116.805 B/op [info] HotScalacBenchmark.compile:·gc.count /tmp/sbt_b9131bfb sample 18 105.000 counts [info] HotScalacBenchmark.compile:·gc.time /tmp/sbt_b9131bfb sample 18 21814.000 ms [info] WarmScalacBenchmark.compile 
/tmp/sbt_b9131bfb sample 3 55924.053 ± 16257.754 ms/op [info] WarmScalacBenchmark.compile:compile·p0.00 /tmp/sbt_b9131bfb sample 54895.051 ms/op [info] WarmScalacBenchmark.compile:compile·p0.50 /tmp/sbt_b9131bfb sample 56438.555 ms/op [info] WarmScalacBenchmark.compile:compile·p0.90 /tmp/sbt_b9131bfb sample 56438.555 ms/op [info] WarmScalacBenchmark.compile:compile·p0.95 /tmp/sbt_b9131bfb sample 56438.555 ms/op [info] WarmScalacBenchmark.compile:compile·p0.99 /tmp/sbt_b9131bfb sample 56438.555 ms/op [info] WarmScalacBenchmark.compile:compile·p0.999 /tmp/sbt_b9131bfb sample 56438.555 ms/op [info] WarmScalacBenchmark.compile:compile·p0.9999 /tmp/sbt_b9131bfb sample 56438.555 ms/op [info] WarmScalacBenchmark.compile:compile·p1.00 /tmp/sbt_b9131bfb sample 56438.555 ms/op [info] WarmScalacBenchmark.compile:·gc.alloc.rate /tmp/sbt_b9131bfb sample 3 117.417 ± 27.439 MB/sec [info] WarmScalacBenchmark.compile:·gc.alloc.rate.norm /tmp/sbt_b9131bfb sample 3 6999695530.667 ± 608845574.720 B/op [info] WarmScalacBenchmark.compile:·gc.churn.PS_Eden_Space /tmp/sbt_b9131bfb sample 3 111.263 ± 90.263 MB/sec [info] WarmScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm /tmp/sbt_b9131bfb sample 3 6633605792.000 ± 5698534573.516 B/op [info] WarmScalacBenchmark.compile:·gc.churn.PS_Old_Gen /tmp/sbt_b9131bfb sample 3 0.001 ± 0.040 MB/sec [info] WarmScalacBenchmark.compile:·gc.churn.PS_Old_Gen.norm /tmp/sbt_b9131bfb sample 3 74741.333 ± 2361755.471 B/op [info] WarmScalacBenchmark.compile:·gc.churn.PS_Survivor_Space /tmp/sbt_b9131bfb sample 3 2.478 ± 7.592 MB/sec [info] WarmScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm /tmp/sbt_b9131bfb sample 3 147881869.333 ± 475964254.946 B/op [info] WarmScalacBenchmark.compile:·gc.count /tmp/sbt_b9131bfb sample 3 73.000 counts [info] WarmScalacBenchmark.compile:·gc.time /tmp/sbt_b9131bfb sample 3 9581.000 ms [info] ColdScalacBenchmark.compile /tmp/sbt_b9131bfb ss 10 45562.453 ± 836.977 ms/op [info] 
ColdScalacBenchmark.compile:·gc.alloc.rate /tmp/sbt_b9131bfb ss 10 147.126 ± 2.229 MB/sec [info] ColdScalacBenchmark.compile:·gc.alloc.rate.norm /tmp/sbt_b9131bfb ss 10 7163351651.200 ± 57993163.779 B/op [info] ColdScalacBenchmark.compile:·gc.churn.PS_Eden_Space /tmp/sbt_b9131bfb ss 10 137.407 ± 6.810 MB/sec [info] ColdScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm /tmp/sbt_b9131bfb ss 10 6692512710.400 ± 429243418.572 B/op [info] ColdScalacBenchmark.compile:·gc.churn.PS_Survivor_Space /tmp/sbt_b9131bfb ss 10 2.647 ± 0.168 MB/sec [info] ColdScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm /tmp/sbt_b9131bfb ss 10 128840603.200 ± 7324571.862 B/op [info] ColdScalacBenchmark.compile:·gc.count /tmp/sbt_b9131bfb ss 10 245.000 counts [info] ColdScalacBenchmark.compile:·gc.time /tmp/sbt_b9131bfb ss 10 29462.000 ms [success] Total time: 1595 s, completed Feb 26, 2017 1:42:55 AM [success] Total time: 0 s, completed Feb 26, 2017 1:42:55 AM ``` AFTER ``` [info] Benchmark (_tempDir) Mode Cnt Score Error Units [info] HotScalacBenchmark.compile /tmp/sbt_c8a4806b sample 18 20757.144 ± 519.221 ms/op [info] HotScalacBenchmark.compile:compile·p0.00 /tmp/sbt_c8a4806b sample 19931.333 ms/op [info] HotScalacBenchmark.compile:compile·p0.50 /tmp/sbt_c8a4806b sample 20786.971 ms/op [info] HotScalacBenchmark.compile:compile·p0.90 /tmp/sbt_c8a4806b sample 21615.765 ms/op [info] HotScalacBenchmark.compile:compile·p0.95 /tmp/sbt_c8a4806b sample 21676.163 ms/op [info] HotScalacBenchmark.compile:compile·p0.99 /tmp/sbt_c8a4806b sample 21676.163 ms/op [info] HotScalacBenchmark.compile:compile·p0.999 /tmp/sbt_c8a4806b sample 21676.163 ms/op [info] HotScalacBenchmark.compile:compile·p0.9999 /tmp/sbt_c8a4806b sample 21676.163 ms/op [info] HotScalacBenchmark.compile:compile·p1.00 /tmp/sbt_c8a4806b sample 21676.163 ms/op [info] HotScalacBenchmark.compile:·gc.alloc.rate /tmp/sbt_c8a4806b sample 18 290.476 ± 7.069 MB/sec [info] HotScalacBenchmark.compile:·gc.alloc.rate.norm 
/tmp/sbt_c8a4806b sample 18 6476081869.778 ± 18700713.424 B/op [info] HotScalacBenchmark.compile:·gc.churn.PS_Eden_Space /tmp/sbt_c8a4806b sample 18 290.409 ± 20.336 MB/sec [info] HotScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm /tmp/sbt_c8a4806b sample 18 6478102528.000 ± 468310673.653 B/op [info] HotScalacBenchmark.compile:·gc.churn.PS_Old_Gen /tmp/sbt_c8a4806b sample 18 13.261 ± 12.790 MB/sec [info] HotScalacBenchmark.compile:·gc.churn.PS_Old_Gen.norm /tmp/sbt_c8a4806b sample 18 301324965.333 ± 290518111.715 B/op [info] HotScalacBenchmark.compile:·gc.churn.PS_Survivor_Space /tmp/sbt_c8a4806b sample 18 6.735 ± 2.338 MB/sec [info] HotScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm /tmp/sbt_c8a4806b sample 18 150953349.778 ± 54074639.209 B/op [info] HotScalacBenchmark.compile:·gc.count /tmp/sbt_c8a4806b sample 18 101.000 counts [info] HotScalacBenchmark.compile:·gc.time /tmp/sbt_c8a4806b sample 18 21267.000 ms [info] WarmScalacBenchmark.compile /tmp/sbt_c8a4806b sample 3 54380.549 ± 24064.367 ms/op [info] WarmScalacBenchmark.compile:compile·p0.00 /tmp/sbt_c8a4806b sample 53552.873 ms/op [info] WarmScalacBenchmark.compile:compile·p0.50 /tmp/sbt_c8a4806b sample 53687.091 ms/op [info] WarmScalacBenchmark.compile:compile·p0.90 /tmp/sbt_c8a4806b sample 55901.684 ms/op [info] WarmScalacBenchmark.compile:compile·p0.95 /tmp/sbt_c8a4806b sample 55901.684 ms/op [info] WarmScalacBenchmark.compile:compile·p0.99 /tmp/sbt_c8a4806b sample 55901.684 ms/op [info] WarmScalacBenchmark.compile:compile·p0.999 /tmp/sbt_c8a4806b sample 55901.684 ms/op [info] WarmScalacBenchmark.compile:compile·p0.9999 /tmp/sbt_c8a4806b sample 55901.684 ms/op [info] WarmScalacBenchmark.compile:compile·p1.00 /tmp/sbt_c8a4806b sample 55901.684 ms/op [info] WarmScalacBenchmark.compile:·gc.alloc.rate /tmp/sbt_c8a4806b sample 3 120.159 ± 52.914 MB/sec [info] WarmScalacBenchmark.compile:·gc.alloc.rate.norm /tmp/sbt_c8a4806b sample 3 6963979373.333 ± 137408036.138 B/op [info] 
WarmScalacBenchmark.compile:·gc.churn.PS_Eden_Space /tmp/sbt_c8a4806b sample 3 113.755 ± 135.915 MB/sec [info] WarmScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm /tmp/sbt_c8a4806b sample 3 6588595392.000 ± 5170161565.753 B/op [info] WarmScalacBenchmark.compile:·gc.churn.PS_Old_Gen /tmp/sbt_c8a4806b sample 3 0.002 ± 0.048 MB/sec [info] WarmScalacBenchmark.compile:·gc.churn.PS_Old_Gen.norm /tmp/sbt_c8a4806b sample 3 90400.000 ± 2856554.534 B/op [info] WarmScalacBenchmark.compile:·gc.churn.PS_Survivor_Space /tmp/sbt_c8a4806b sample 3 2.623 ± 7.378 MB/sec [info] WarmScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm /tmp/sbt_c8a4806b sample 3 151896768.000 ± 399915676.894 B/op [info] WarmScalacBenchmark.compile:·gc.count /tmp/sbt_c8a4806b sample 3 73.000 counts [info] WarmScalacBenchmark.compile:·gc.time /tmp/sbt_c8a4806b sample 3 10070.000 ms [info] ColdScalacBenchmark.compile /tmp/sbt_c8a4806b ss 10 45613.670 ± 1724.291 ms/op [info] ColdScalacBenchmark.compile:·gc.alloc.rate /tmp/sbt_c8a4806b ss 10 147.106 ± 4.973 MB/sec [info] ColdScalacBenchmark.compile:·gc.alloc.rate.norm /tmp/sbt_c8a4806b ss 10 7165665000.000 ± 68500786.134 B/op [info] ColdScalacBenchmark.compile:·gc.churn.PS_Eden_Space /tmp/sbt_c8a4806b ss 10 138.633 ± 12.612 MB/sec [info] ColdScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm /tmp/sbt_c8a4806b ss 10 6749057403.200 ± 438983252.418 B/op [info] ColdScalacBenchmark.compile:·gc.churn.PS_Survivor_Space /tmp/sbt_c8a4806b ss 10 2.716 ± 0.298 MB/sec [info] ColdScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm /tmp/sbt_c8a4806b ss 10 132216236.800 ± 11751803.094 B/op [info] ColdScalacBenchmark.compile:·gc.count /tmp/sbt_c8a4806b ss 10 247.000 counts [info] ColdScalacBenchmark.compile:·gc.time /tmp/sbt_c8a4806b ss 10 29965.000 ms [success] Total time: 1593 s, completed Feb 26, 2017 11:54:01 AM [success] Total time: 0 s, completed Feb 26, 2017 11:54:01 AM ``` Machine info: ``` jvican in /data/rw/code/scala/zinc [22:24:47] > $ uname -a 
[±as-seen-from ●▴▾] Linux tribox 4.9.11-1-ARCH #1 SMP PREEMPT Sun Feb 19 13:45:52 UTC 2017 x86_64 GNU/Linux jvican in /data/rw/code/scala/zinc [23:15:57] > $ cpupower frequency-info [±as-seen-from ●▴▾] analyzing CPU 0: driver: intel_pstate CPUs which run at the same hardware frequency: 0 CPUs which need to have their frequency coordinated by software: 0 maximum transition latency: Cannot determine or is not supported. hardware limits: 400 MHz - 3.40 GHz available cpufreq governors: performance powersave current policy: frequency should be within 3.20 GHz and 3.20 GHz. The governor "performance" may decide which speed to use within this range. current CPU frequency: Unable to call hardware current CPU frequency: 3.32 GHz (asserted by call to kernel) boost state support: Supported: yes Active: yes jvican in /data/rw/code/scala/zinc [23:16:14] > $ cat /proc/meminfo [±as-seen-from ●▴▾] MemTotal: 20430508 kB MemFree: 9890712 kB MemAvailable: 13490908 kB Buffers: 3684 kB Cached: 4052520 kB SwapCached: 0 kB Active: 7831612 kB Inactive: 2337220 kB Active(anon): 6214680 kB Inactive(anon): 151436 kB Active(file): 1616932 kB Inactive(file): 2185784 kB Unevictable: 0 kB Mlocked: 0 kB SwapTotal: 12582908 kB SwapFree: 12582908 kB Dirty: 124 kB Writeback: 0 kB AnonPages: 6099876 kB Mapped: 183096 kB Shmem: 253488 kB Slab: 227436 kB SReclaimable: 152144 kB SUnreclaim: 75292 kB KernelStack: 5152 kB PageTables: 19636 kB NFS_Unstable: 0 kB Bounce: 0 kB WritebackTmp: 0 kB CommitLimit: 22798160 kB Committed_AS: 7685996 kB VmallocTotal: 34359738367 kB VmallocUsed: 0 kB VmallocChunk: 0 kB HardwareCorrupted: 0 kB AnonHugePages: 5511168 kB ShmemHugePages: 0 kB ShmemPmdMapped: 0 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 2048 kB DirectMap4k: 136620 kB DirectMap2M: 4970496 kB DirectMap1G: 15728640 kB jvican in /data/rw/code/scala/zinc [23:16:41] > $ cat /proc/cpuinfo [±as-seen-from ●▴▾] processor : 0 vendor_id : GenuineIntel cpu family : 6 model 
: 78 model name : Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz stepping : 3 microcode : 0x88 cpu MHz : 3297.827 cache size : 4096 KB physical id : 0 siblings : 4 core id : 0 cpu cores : 2 apicid : 0 initial apicid : 0 fpu : yes fpu_exception : yes cpuid level : 22 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp bugs : bogomips : 5618.00 clflush size : 64 cache_alignment : 64 address sizes : 39 bits physical, 48 bits virtual power management: processor : 1 vendor_id : GenuineIntel cpu family : 6 model : 78 model name : Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz stepping : 3 microcode : 0x88 cpu MHz : 3296.459 cache size : 4096 KB physical id : 0 siblings : 4 core id : 1 cpu cores : 2 apicid : 2 initial apicid : 2 fpu : yes fpu_exception : yes cpuid level : 22 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt 
xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp bugs : bogomips : 5620.22 clflush size : 64 cache_alignment : 64 address sizes : 39 bits physical, 48 bits virtual power management: processor : 2 vendor_id : GenuineIntel cpu family : 6 model : 78 model name : Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz stepping : 3 microcode : 0x88 cpu MHz : 3399.853 cache size : 4096 KB physical id : 0 siblings : 4 core id : 0 cpu cores : 2 apicid : 1 initial apicid : 1 fpu : yes fpu_exception : yes cpuid level : 22 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp bugs : bogomips : 5621.16 clflush size : 64 cache_alignment : 64 address sizes : 39 bits physical, 48 bits virtual power management: processor : 3 vendor_id : GenuineIntel cpu family : 6 model : 78 model name : Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz stepping : 3 microcode : 0x88 cpu MHz : 3210.327 cache size : 4096 KB physical id : 0 siblings : 4 core id : 1 cpu cores : 2 apicid : 3 initial apicid : 3 fpu : yes fpu_exception : yes cpuid level : 22 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est 
tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp bugs : bogomips : 5620.33 clflush size : 64 cache_alignment : 64 address sizes : 39 bits physical, 48 bits virtual power management:
```

In comparison with df30872, the new changes improve the mean running time of Zinc by about half a second in the hot benchmark (21228.771 to 20757.144 ms/op) and by about 1.5 seconds in the warm benchmark (55924.053 to 54380.549 ms/op, although with only 3 samples the error margin there is very large). The cold benchmark is about 50 ms slower (45562.453 to 45613.670 ms/op), which is well within the reported error and seems to be a product of run-to-run variation.

This is a success considering that we now traverse more types and symbols than before: these changes let us do more work and still decrease the running time of Zinc. They are likely to have a bigger effect on huge industrial codebases in which the ratio of types is very high and which contain many rich types, such as poly types, method types, refinements and existential types, that have lots of constraints.
jvican referenced this pull request in scalacenter/zinc on Feb 26, 2017
This commit speeds up the analysis of type dependencies as much as possible.

In both `ExtractUsedNames` and `Dependency`, we have a cache function associated with a source symbol. This source symbol is the "key" of the cache in the sense that it determines how a dependency should be tracked.

`Dependency`, for instance, adds a dependency from `X` to `Y`, where `X` is the origin symbol and `Y` is the destination symbol. However, only `X` determines how a dependency should be added (and on which data structure). The same happens for `ExtractAPI`, whose case is simpler because there is no destination symbol: only the origin symbol is needed as the cache key, since we keep a set of names for a given symbol.

Our previous type analysis had a type cache, but this cache only lasted one type traversal. The algorithm was very pessimistic: we cleared the `visited` cache with `reinitializeVisited` after every traversal so that members would be correctly recognized if the origin symbol changed.

However, the origin symbol usually stays the same, especially when traversing the bodies of methods and variables, which contain a high proportion of types. Taking this into account, we arrive at the conclusion that we can keep type caches around as long as the `currentOwner` doesn't change, because dependencies are only registered for top-level classes in both cases (`ExtractAPI` and `Dependency`).

The introduced solution allows every phase to implement its own `TypeTraverser` and override the function that takes care of adding a dependency. This is necessary because the functions that add dependencies depend on the context (the origin symbol, among other things), which ultimately varies between `ExtractAPI` and `Dependency`.

The following benchmark was obtained with the same procedure as in the commit mentioned before, and benchmarks the compilation of the Scala standard library.
BEFORE ``` [info] Benchmark (_tempDir) Mode Cnt Score Error Units [info] HotScalacBenchmark.compile /tmp/sbt_b9131bfb sample 18 21228.771 ± 521.207 ms/op [info] HotScalacBenchmark.compile:compile·p0.00 /tmp/sbt_b9131bfb sample 20199.768 ms/op [info] HotScalacBenchmark.compile:compile·p0.50 /tmp/sbt_b9131bfb sample 21256.733 ms/op [info] HotScalacBenchmark.compile:compile·p0.90 /tmp/sbt_b9131bfb sample 21931.177 ms/op [info] HotScalacBenchmark.compile:compile·p0.95 /tmp/sbt_b9131bfb sample 22112.371 ms/op [info] HotScalacBenchmark.compile:compile·p0.99 /tmp/sbt_b9131bfb sample 22112.371 ms/op [info] HotScalacBenchmark.compile:compile·p0.999 /tmp/sbt_b9131bfb sample 22112.371 ms/op [info] HotScalacBenchmark.compile:compile·p0.9999 /tmp/sbt_b9131bfb sample 22112.371 ms/op [info] HotScalacBenchmark.compile:compile·p1.00 /tmp/sbt_b9131bfb sample 22112.371 ms/op [info] HotScalacBenchmark.compile:·gc.alloc.rate /tmp/sbt_b9131bfb sample 18 284.115 ± 6.036 MB/sec [info] HotScalacBenchmark.compile:·gc.alloc.rate.norm /tmp/sbt_b9131bfb sample 18 6474818679.556 ± 42551265.360 B/op [info] HotScalacBenchmark.compile:·gc.churn.PS_Eden_Space /tmp/sbt_b9131bfb sample 18 283.385 ± 23.147 MB/sec [info] HotScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm /tmp/sbt_b9131bfb sample 18 6455703779.556 ± 483463770.519 B/op [info] HotScalacBenchmark.compile:·gc.churn.PS_Old_Gen /tmp/sbt_b9131bfb sample 18 12.857 ± 12.406 MB/sec [info] HotScalacBenchmark.compile:·gc.churn.PS_Old_Gen.norm /tmp/sbt_b9131bfb sample 18 297978002.222 ± 287556197.389 B/op [info] HotScalacBenchmark.compile:·gc.churn.PS_Survivor_Space /tmp/sbt_b9131bfb sample 18 6.901 ± 2.092 MB/sec [info] HotScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm /tmp/sbt_b9131bfb sample 18 158212212.444 ± 50375116.805 B/op [info] HotScalacBenchmark.compile:·gc.count /tmp/sbt_b9131bfb sample 18 105.000 counts [info] HotScalacBenchmark.compile:·gc.time /tmp/sbt_b9131bfb sample 18 21814.000 ms [info] WarmScalacBenchmark.compile 
/tmp/sbt_b9131bfb sample 3 55924.053 ± 16257.754 ms/op [info] WarmScalacBenchmark.compile:compile·p0.00 /tmp/sbt_b9131bfb sample 54895.051 ms/op [info] WarmScalacBenchmark.compile:compile·p0.50 /tmp/sbt_b9131bfb sample 56438.555 ms/op [info] WarmScalacBenchmark.compile:compile·p0.90 /tmp/sbt_b9131bfb sample 56438.555 ms/op [info] WarmScalacBenchmark.compile:compile·p0.95 /tmp/sbt_b9131bfb sample 56438.555 ms/op [info] WarmScalacBenchmark.compile:compile·p0.99 /tmp/sbt_b9131bfb sample 56438.555 ms/op [info] WarmScalacBenchmark.compile:compile·p0.999 /tmp/sbt_b9131bfb sample 56438.555 ms/op [info] WarmScalacBenchmark.compile:compile·p0.9999 /tmp/sbt_b9131bfb sample 56438.555 ms/op [info] WarmScalacBenchmark.compile:compile·p1.00 /tmp/sbt_b9131bfb sample 56438.555 ms/op [info] WarmScalacBenchmark.compile:·gc.alloc.rate /tmp/sbt_b9131bfb sample 3 117.417 ± 27.439 MB/sec [info] WarmScalacBenchmark.compile:·gc.alloc.rate.norm /tmp/sbt_b9131bfb sample 3 6999695530.667 ± 608845574.720 B/op [info] WarmScalacBenchmark.compile:·gc.churn.PS_Eden_Space /tmp/sbt_b9131bfb sample 3 111.263 ± 90.263 MB/sec [info] WarmScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm /tmp/sbt_b9131bfb sample 3 6633605792.000 ± 5698534573.516 B/op [info] WarmScalacBenchmark.compile:·gc.churn.PS_Old_Gen /tmp/sbt_b9131bfb sample 3 0.001 ± 0.040 MB/sec [info] WarmScalacBenchmark.compile:·gc.churn.PS_Old_Gen.norm /tmp/sbt_b9131bfb sample 3 74741.333 ± 2361755.471 B/op [info] WarmScalacBenchmark.compile:·gc.churn.PS_Survivor_Space /tmp/sbt_b9131bfb sample 3 2.478 ± 7.592 MB/sec [info] WarmScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm /tmp/sbt_b9131bfb sample 3 147881869.333 ± 475964254.946 B/op [info] WarmScalacBenchmark.compile:·gc.count /tmp/sbt_b9131bfb sample 3 73.000 counts [info] WarmScalacBenchmark.compile:·gc.time /tmp/sbt_b9131bfb sample 3 9581.000 ms [info] ColdScalacBenchmark.compile /tmp/sbt_b9131bfb ss 10 45562.453 ± 836.977 ms/op [info] 
ColdScalacBenchmark.compile:·gc.alloc.rate /tmp/sbt_b9131bfb ss 10 147.126 ± 2.229 MB/sec [info] ColdScalacBenchmark.compile:·gc.alloc.rate.norm /tmp/sbt_b9131bfb ss 10 7163351651.200 ± 57993163.779 B/op [info] ColdScalacBenchmark.compile:·gc.churn.PS_Eden_Space /tmp/sbt_b9131bfb ss 10 137.407 ± 6.810 MB/sec [info] ColdScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm /tmp/sbt_b9131bfb ss 10 6692512710.400 ± 429243418.572 B/op [info] ColdScalacBenchmark.compile:·gc.churn.PS_Survivor_Space /tmp/sbt_b9131bfb ss 10 2.647 ± 0.168 MB/sec [info] ColdScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm /tmp/sbt_b9131bfb ss 10 128840603.200 ± 7324571.862 B/op [info] ColdScalacBenchmark.compile:·gc.count /tmp/sbt_b9131bfb ss 10 245.000 counts [info] ColdScalacBenchmark.compile:·gc.time /tmp/sbt_b9131bfb ss 10 29462.000 ms [success] Total time: 1595 s, completed Feb 26, 2017 1:42:55 AM [success] Total time: 0 s, completed Feb 26, 2017 1:42:55 AM ``` AFTER ``` [info] Benchmark (_tempDir) Mode Cnt Score Error Units [info] HotScalacBenchmark.compile /tmp/sbt_c8a4806b sample 18 20757.144 ± 519.221 ms/op [info] HotScalacBenchmark.compile:compile·p0.00 /tmp/sbt_c8a4806b sample 19931.333 ms/op [info] HotScalacBenchmark.compile:compile·p0.50 /tmp/sbt_c8a4806b sample 20786.971 ms/op [info] HotScalacBenchmark.compile:compile·p0.90 /tmp/sbt_c8a4806b sample 21615.765 ms/op [info] HotScalacBenchmark.compile:compile·p0.95 /tmp/sbt_c8a4806b sample 21676.163 ms/op [info] HotScalacBenchmark.compile:compile·p0.99 /tmp/sbt_c8a4806b sample 21676.163 ms/op [info] HotScalacBenchmark.compile:compile·p0.999 /tmp/sbt_c8a4806b sample 21676.163 ms/op [info] HotScalacBenchmark.compile:compile·p0.9999 /tmp/sbt_c8a4806b sample 21676.163 ms/op [info] HotScalacBenchmark.compile:compile·p1.00 /tmp/sbt_c8a4806b sample 21676.163 ms/op [info] HotScalacBenchmark.compile:·gc.alloc.rate /tmp/sbt_c8a4806b sample 18 290.476 ± 7.069 MB/sec [info] HotScalacBenchmark.compile:·gc.alloc.rate.norm 
/tmp/sbt_c8a4806b sample 18 6476081869.778 ± 18700713.424 B/op [info] HotScalacBenchmark.compile:·gc.churn.PS_Eden_Space /tmp/sbt_c8a4806b sample 18 290.409 ± 20.336 MB/sec [info] HotScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm /tmp/sbt_c8a4806b sample 18 6478102528.000 ± 468310673.653 B/op [info] HotScalacBenchmark.compile:·gc.churn.PS_Old_Gen /tmp/sbt_c8a4806b sample 18 13.261 ± 12.790 MB/sec [info] HotScalacBenchmark.compile:·gc.churn.PS_Old_Gen.norm /tmp/sbt_c8a4806b sample 18 301324965.333 ± 290518111.715 B/op [info] HotScalacBenchmark.compile:·gc.churn.PS_Survivor_Space /tmp/sbt_c8a4806b sample 18 6.735 ± 2.338 MB/sec [info] HotScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm /tmp/sbt_c8a4806b sample 18 150953349.778 ± 54074639.209 B/op [info] HotScalacBenchmark.compile:·gc.count /tmp/sbt_c8a4806b sample 18 101.000 counts [info] HotScalacBenchmark.compile:·gc.time /tmp/sbt_c8a4806b sample 18 21267.000 ms [info] WarmScalacBenchmark.compile /tmp/sbt_c8a4806b sample 3 54380.549 ± 24064.367 ms/op [info] WarmScalacBenchmark.compile:compile·p0.00 /tmp/sbt_c8a4806b sample 53552.873 ms/op [info] WarmScalacBenchmark.compile:compile·p0.50 /tmp/sbt_c8a4806b sample 53687.091 ms/op [info] WarmScalacBenchmark.compile:compile·p0.90 /tmp/sbt_c8a4806b sample 55901.684 ms/op [info] WarmScalacBenchmark.compile:compile·p0.95 /tmp/sbt_c8a4806b sample 55901.684 ms/op [info] WarmScalacBenchmark.compile:compile·p0.99 /tmp/sbt_c8a4806b sample 55901.684 ms/op [info] WarmScalacBenchmark.compile:compile·p0.999 /tmp/sbt_c8a4806b sample 55901.684 ms/op [info] WarmScalacBenchmark.compile:compile·p0.9999 /tmp/sbt_c8a4806b sample 55901.684 ms/op [info] WarmScalacBenchmark.compile:compile·p1.00 /tmp/sbt_c8a4806b sample 55901.684 ms/op [info] WarmScalacBenchmark.compile:·gc.alloc.rate /tmp/sbt_c8a4806b sample 3 120.159 ± 52.914 MB/sec [info] WarmScalacBenchmark.compile:·gc.alloc.rate.norm /tmp/sbt_c8a4806b sample 3 6963979373.333 ± 137408036.138 B/op [info] 
WarmScalacBenchmark.compile:·gc.churn.PS_Eden_Space /tmp/sbt_c8a4806b sample 3 113.755 ± 135.915 MB/sec [info] WarmScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm /tmp/sbt_c8a4806b sample 3 6588595392.000 ± 5170161565.753 B/op [info] WarmScalacBenchmark.compile:·gc.churn.PS_Old_Gen /tmp/sbt_c8a4806b sample 3 0.002 ± 0.048 MB/sec [info] WarmScalacBenchmark.compile:·gc.churn.PS_Old_Gen.norm /tmp/sbt_c8a4806b sample 3 90400.000 ± 2856554.534 B/op [info] WarmScalacBenchmark.compile:·gc.churn.PS_Survivor_Space /tmp/sbt_c8a4806b sample 3 2.623 ± 7.378 MB/sec [info] WarmScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm /tmp/sbt_c8a4806b sample 3 151896768.000 ± 399915676.894 B/op [info] WarmScalacBenchmark.compile:·gc.count /tmp/sbt_c8a4806b sample 3 73.000 counts [info] WarmScalacBenchmark.compile:·gc.time /tmp/sbt_c8a4806b sample 3 10070.000 ms [info] ColdScalacBenchmark.compile /tmp/sbt_c8a4806b ss 10 45613.670 ± 1724.291 ms/op [info] ColdScalacBenchmark.compile:·gc.alloc.rate /tmp/sbt_c8a4806b ss 10 147.106 ± 4.973 MB/sec [info] ColdScalacBenchmark.compile:·gc.alloc.rate.norm /tmp/sbt_c8a4806b ss 10 7165665000.000 ± 68500786.134 B/op [info] ColdScalacBenchmark.compile:·gc.churn.PS_Eden_Space /tmp/sbt_c8a4806b ss 10 138.633 ± 12.612 MB/sec [info] ColdScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm /tmp/sbt_c8a4806b ss 10 6749057403.200 ± 438983252.418 B/op [info] ColdScalacBenchmark.compile:·gc.churn.PS_Survivor_Space /tmp/sbt_c8a4806b ss 10 2.716 ± 0.298 MB/sec [info] ColdScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm /tmp/sbt_c8a4806b ss 10 132216236.800 ± 11751803.094 B/op [info] ColdScalacBenchmark.compile:·gc.count /tmp/sbt_c8a4806b ss 10 247.000 counts [info] ColdScalacBenchmark.compile:·gc.time /tmp/sbt_c8a4806b ss 10 29965.000 ms [success] Total time: 1593 s, completed Feb 26, 2017 11:54:01 AM [success] Total time: 0 s, completed Feb 26, 2017 11:54:01 AM ``` Machine info: ``` jvican in /data/rw/code/scala/zinc [22:24:47] > $ uname -a 
[±as-seen-from ●▴▾] Linux tribox 4.9.11-1-ARCH #1 SMP PREEMPT Sun Feb 19 13:45:52 UTC 2017 x86_64 GNU/Linux jvican in /data/rw/code/scala/zinc [23:15:57] > $ cpupower frequency-info [±as-seen-from ●▴▾] analyzing CPU 0: driver: intel_pstate CPUs which run at the same hardware frequency: 0 CPUs which need to have their frequency coordinated by software: 0 maximum transition latency: Cannot determine or is not supported. hardware limits: 400 MHz - 3.40 GHz available cpufreq governors: performance powersave current policy: frequency should be within 3.20 GHz and 3.20 GHz. The governor "performance" may decide which speed to use within this range. current CPU frequency: Unable to call hardware current CPU frequency: 3.32 GHz (asserted by call to kernel) boost state support: Supported: yes Active: yes jvican in /data/rw/code/scala/zinc [23:16:14] > $ cat /proc/meminfo [±as-seen-from ●▴▾] MemTotal: 20430508 kB MemFree: 9890712 kB MemAvailable: 13490908 kB Buffers: 3684 kB Cached: 4052520 kB SwapCached: 0 kB Active: 7831612 kB Inactive: 2337220 kB Active(anon): 6214680 kB Inactive(anon): 151436 kB Active(file): 1616932 kB Inactive(file): 2185784 kB Unevictable: 0 kB Mlocked: 0 kB SwapTotal: 12582908 kB SwapFree: 12582908 kB Dirty: 124 kB Writeback: 0 kB AnonPages: 6099876 kB Mapped: 183096 kB Shmem: 253488 kB Slab: 227436 kB SReclaimable: 152144 kB SUnreclaim: 75292 kB KernelStack: 5152 kB PageTables: 19636 kB NFS_Unstable: 0 kB Bounce: 0 kB WritebackTmp: 0 kB CommitLimit: 22798160 kB Committed_AS: 7685996 kB VmallocTotal: 34359738367 kB VmallocUsed: 0 kB VmallocChunk: 0 kB HardwareCorrupted: 0 kB AnonHugePages: 5511168 kB ShmemHugePages: 0 kB ShmemPmdMapped: 0 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 2048 kB DirectMap4k: 136620 kB DirectMap2M: 4970496 kB DirectMap1G: 15728640 kB jvican in /data/rw/code/scala/zinc [23:16:41] > $ cat /proc/cpuinfo [±as-seen-from ●▴▾] processor : 0 vendor_id : GenuineIntel cpu family : 6 model 
: 78 model name : Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz stepping : 3 microcode : 0x88 cpu MHz : 3297.827 cache size : 4096 KB physical id : 0 siblings : 4 core id : 0 cpu cores : 2 apicid : 0 initial apicid : 0 fpu : yes fpu_exception : yes cpuid level : 22 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp bugs : bogomips : 5618.00 clflush size : 64 cache_alignment : 64 address sizes : 39 bits physical, 48 bits virtual power management: processor : 1 vendor_id : GenuineIntel cpu family : 6 model : 78 model name : Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz stepping : 3 microcode : 0x88 cpu MHz : 3296.459 cache size : 4096 KB physical id : 0 siblings : 4 core id : 1 cpu cores : 2 apicid : 2 initial apicid : 2 fpu : yes fpu_exception : yes cpuid level : 22 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt 
xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp bugs : bogomips : 5620.22 clflush size : 64 cache_alignment : 64 address sizes : 39 bits physical, 48 bits virtual power management: processor : 2 vendor_id : GenuineIntel cpu family : 6 model : 78 model name : Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz stepping : 3 microcode : 0x88 cpu MHz : 3399.853 cache size : 4096 KB physical id : 0 siblings : 4 core id : 0 cpu cores : 2 apicid : 1 initial apicid : 1 fpu : yes fpu_exception : yes cpuid level : 22 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp bugs : bogomips : 5621.16 clflush size : 64 cache_alignment : 64 address sizes : 39 bits physical, 48 bits virtual power management: processor : 3 vendor_id : GenuineIntel cpu family : 6 model : 78 model name : Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz stepping : 3 microcode : 0x88 cpu MHz : 3210.327 cache size : 4096 KB physical id : 0 siblings : 4 core id : 1 cpu cores : 2 apicid : 3 initial apicid : 3 fpu : yes fpu_exception : yes cpuid level : 22 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est 
tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp bugs : bogomips : 5620.33 clflush size : 64 cache_alignment : 64 address sizes : 39 bits physical, 48 bits virtual power management:
```

In comparison with df30872, the new changes improve the mean running time of Zinc by about half a second in the hot benchmark (21228.771 to 20757.144 ms/op) and by about 1.5 seconds in the warm benchmark (55924.053 to 54380.549 ms/op, although with only 3 samples the error margin there is very large). The cold benchmark is about 50 ms slower (45562.453 to 45613.670 ms/op), which is well within the reported error and seems to be a product of run-to-run variation.

This is a success considering that we now traverse more types and symbols than before: these changes let us do more work and still decrease the running time of Zinc. They are likely to have a bigger effect on huge industrial codebases in which the ratio of types is very high and which contain many rich types, such as poly types, method types, refinements and existential types, that have lots of constraints.
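The caching idea described in the commit message can be illustrated with a small, self-contained sketch. It does not use scalac's or Zinc's real API: `Sym`, `Tpe`, `CachingTypeTraverser`, `setOrigin`, `visitCount` and `DependencyTraverser` are hypothetical names made up for this example, and `setOrigin` merely stands in for a change of `currentOwner`. The only point is to show a `visited` cache that survives across traversals while the origin stays the same, with each phase overriding the function that adds a dependency.

```scala
import scala.collection.mutable

// Hypothetical stand-ins for compiler symbols and types (not scalac's API).
final case class Sym(name: String)
final case class Tpe(sym: Sym, args: List[Tpe] = Nil)

// Keeps the set of visited types for as long as the origin symbol is unchanged.
abstract class CachingTypeTraverser {
  private var origin: Option[Sym] = None
  private val visited = mutable.HashSet.empty[Tpe]

  // Number of types actually traversed (cache misses), kept for illustration.
  var visitCount: Int = 0

  // Each phase overrides this to record a dependency in its own data structure.
  protected def addDependency(origin: Sym, target: Sym): Unit

  // A different origin invalidates the cache; the same origin keeps it.
  def setOrigin(newOrigin: Sym): Unit =
    if (!origin.contains(newOrigin)) {
      origin = Some(newOrigin)
      visited.clear()
    }

  // Traverses a type and its arguments, skipping anything already visited.
  def traverse(tpe: Tpe): Unit =
    origin.foreach { from =>
      if (visited.add(tpe)) {
        visitCount += 1
        addDependency(from, tpe.sym)
        tpe.args.foreach(traverse)
      }
    }
}

// Example phase: records (origin, target) pairs.
final class DependencyTraverser extends CachingTypeTraverser {
  val dependencies = mutable.LinkedHashSet.empty[(Sym, Sym)]
  protected def addDependency(origin: Sym, target: Sym): Unit =
    dependencies += ((origin, target))
}

object CachingTypeTraverserDemo {
  def main(args: Array[String]): Unit = {
    val listOfInt = Tpe(Sym("List"), List(Tpe(Sym("Int"))))
    val traverser = new DependencyTraverser
    traverser.setOrigin(Sym("A"))
    traverser.traverse(listOfInt) // visits List[Int] and Int
    traverser.traverse(listOfInt) // cache hit: nothing is traversed again
    traverser.setOrigin(Sym("B")) // new origin: the cache is cleared
    traverser.traverse(listOfInt) // visits both types again for B
    println(traverser.visitCount)
    println(traverser.dependencies)
  }
}
```

In this sketch the second traversal for origin `A` is free, which mirrors the case the commit message singles out: method and variable bodies where many types are seen under the same owner. Clearing the set on every traversal, as the previous pessimistic approach did, would repeat that work each time.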
jvican referenced this pull request in scalacenter/zinc on Feb 26, 2017
This commit speeds up the analysis of type dependencies as much as possible.

In both `ExtractUsedNames` and `Dependency`, we have a cache function associated with a source symbol. This source symbol is the "key" of the cache in the sense that it determines how a dependency should be tracked.

`Dependency`, for instance, adds a dependency from `X` to `Y`, where `X` is the origin symbol and `Y` is the destination symbol. However, only `X` determines how a dependency should be added (and on which data structure). The same happens for `ExtractAPI`, whose case is simpler because there is no destination symbol: only the origin symbol is needed as the cache key, since we keep a set of names for a given symbol.

Our previous type analysis had a type cache, but this cache only lasted one type traversal. The algorithm was very pessimistic: we cleared the `visited` cache with `reinitializeVisited` after every traversal so that members would be correctly recognized if the origin symbol changed.

However, the origin symbol usually stays the same, especially when traversing the bodies of methods and variables, which contain a high proportion of types. Taking this into account, we arrive at the conclusion that we can keep type caches around as long as the `currentOwner` doesn't change, because dependencies are only registered for top-level classes in both cases (`ExtractAPI` and `Dependency`).

The introduced solution allows every phase to implement its own `TypeTraverser` and override the function that takes care of adding a dependency. This is necessary because the functions that add dependencies depend on the context (the origin symbol, among other things), which ultimately varies between `ExtractAPI` and `Dependency`.

The following benchmark was obtained with the same procedure as in the commit mentioned before, and benchmarks the compilation of the Scala standard library.
BEFORE

```
[info] Benchmark (_tempDir) Mode Cnt Score Error Units
[info] HotScalacBenchmark.compile /tmp/sbt_b9131bfb sample 18 21228.771 ± 521.207 ms/op
[info] HotScalacBenchmark.compile:compile·p0.00 /tmp/sbt_b9131bfb sample 20199.768 ms/op
[info] HotScalacBenchmark.compile:compile·p0.50 /tmp/sbt_b9131bfb sample 21256.733 ms/op
[info] HotScalacBenchmark.compile:compile·p0.90 /tmp/sbt_b9131bfb sample 21931.177 ms/op
[info] HotScalacBenchmark.compile:compile·p0.95 /tmp/sbt_b9131bfb sample 22112.371 ms/op
[info] HotScalacBenchmark.compile:compile·p0.99 /tmp/sbt_b9131bfb sample 22112.371 ms/op
[info] HotScalacBenchmark.compile:compile·p0.999 /tmp/sbt_b9131bfb sample 22112.371 ms/op
[info] HotScalacBenchmark.compile:compile·p0.9999 /tmp/sbt_b9131bfb sample 22112.371 ms/op
[info] HotScalacBenchmark.compile:compile·p1.00 /tmp/sbt_b9131bfb sample 22112.371 ms/op
[info] HotScalacBenchmark.compile:·gc.alloc.rate /tmp/sbt_b9131bfb sample 18 284.115 ± 6.036 MB/sec
[info] HotScalacBenchmark.compile:·gc.alloc.rate.norm /tmp/sbt_b9131bfb sample 18 6474818679.556 ± 42551265.360 B/op
[info] HotScalacBenchmark.compile:·gc.churn.PS_Eden_Space /tmp/sbt_b9131bfb sample 18 283.385 ± 23.147 MB/sec
[info] HotScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm /tmp/sbt_b9131bfb sample 18 6455703779.556 ± 483463770.519 B/op
[info] HotScalacBenchmark.compile:·gc.churn.PS_Old_Gen /tmp/sbt_b9131bfb sample 18 12.857 ± 12.406 MB/sec
[info] HotScalacBenchmark.compile:·gc.churn.PS_Old_Gen.norm /tmp/sbt_b9131bfb sample 18 297978002.222 ± 287556197.389 B/op
[info] HotScalacBenchmark.compile:·gc.churn.PS_Survivor_Space /tmp/sbt_b9131bfb sample 18 6.901 ± 2.092 MB/sec
[info] HotScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm /tmp/sbt_b9131bfb sample 18 158212212.444 ± 50375116.805 B/op
[info] HotScalacBenchmark.compile:·gc.count /tmp/sbt_b9131bfb sample 18 105.000 counts
[info] HotScalacBenchmark.compile:·gc.time /tmp/sbt_b9131bfb sample 18 21814.000 ms
[info] WarmScalacBenchmark.compile /tmp/sbt_b9131bfb sample 3 55924.053 ± 16257.754 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.00 /tmp/sbt_b9131bfb sample 54895.051 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.50 /tmp/sbt_b9131bfb sample 56438.555 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.90 /tmp/sbt_b9131bfb sample 56438.555 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.95 /tmp/sbt_b9131bfb sample 56438.555 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.99 /tmp/sbt_b9131bfb sample 56438.555 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.999 /tmp/sbt_b9131bfb sample 56438.555 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.9999 /tmp/sbt_b9131bfb sample 56438.555 ms/op
[info] WarmScalacBenchmark.compile:compile·p1.00 /tmp/sbt_b9131bfb sample 56438.555 ms/op
[info] WarmScalacBenchmark.compile:·gc.alloc.rate /tmp/sbt_b9131bfb sample 3 117.417 ± 27.439 MB/sec
[info] WarmScalacBenchmark.compile:·gc.alloc.rate.norm /tmp/sbt_b9131bfb sample 3 6999695530.667 ± 608845574.720 B/op
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Eden_Space /tmp/sbt_b9131bfb sample 3 111.263 ± 90.263 MB/sec
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm /tmp/sbt_b9131bfb sample 3 6633605792.000 ± 5698534573.516 B/op
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Old_Gen /tmp/sbt_b9131bfb sample 3 0.001 ± 0.040 MB/sec
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Old_Gen.norm /tmp/sbt_b9131bfb sample 3 74741.333 ± 2361755.471 B/op
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Survivor_Space /tmp/sbt_b9131bfb sample 3 2.478 ± 7.592 MB/sec
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm /tmp/sbt_b9131bfb sample 3 147881869.333 ± 475964254.946 B/op
[info] WarmScalacBenchmark.compile:·gc.count /tmp/sbt_b9131bfb sample 3 73.000 counts
[info] WarmScalacBenchmark.compile:·gc.time /tmp/sbt_b9131bfb sample 3 9581.000 ms
[info] ColdScalacBenchmark.compile /tmp/sbt_b9131bfb ss 10 45562.453 ± 836.977 ms/op
[info] ColdScalacBenchmark.compile:·gc.alloc.rate /tmp/sbt_b9131bfb ss 10 147.126 ± 2.229 MB/sec
[info] ColdScalacBenchmark.compile:·gc.alloc.rate.norm /tmp/sbt_b9131bfb ss 10 7163351651.200 ± 57993163.779 B/op
[info] ColdScalacBenchmark.compile:·gc.churn.PS_Eden_Space /tmp/sbt_b9131bfb ss 10 137.407 ± 6.810 MB/sec
[info] ColdScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm /tmp/sbt_b9131bfb ss 10 6692512710.400 ± 429243418.572 B/op
[info] ColdScalacBenchmark.compile:·gc.churn.PS_Survivor_Space /tmp/sbt_b9131bfb ss 10 2.647 ± 0.168 MB/sec
[info] ColdScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm /tmp/sbt_b9131bfb ss 10 128840603.200 ± 7324571.862 B/op
[info] ColdScalacBenchmark.compile:·gc.count /tmp/sbt_b9131bfb ss 10 245.000 counts
[info] ColdScalacBenchmark.compile:·gc.time /tmp/sbt_b9131bfb ss 10 29462.000 ms
[success] Total time: 1595 s, completed Feb 26, 2017 1:42:55 AM
[success] Total time: 0 s, completed Feb 26, 2017 1:42:55 AM
```

AFTER

```
[info] Benchmark (_tempDir) Mode Cnt Score Error Units
[info] HotScalacBenchmark.compile /tmp/sbt_c8a4806b sample 18 20757.144 ± 519.221 ms/op
[info] HotScalacBenchmark.compile:compile·p0.00 /tmp/sbt_c8a4806b sample 19931.333 ms/op
[info] HotScalacBenchmark.compile:compile·p0.50 /tmp/sbt_c8a4806b sample 20786.971 ms/op
[info] HotScalacBenchmark.compile:compile·p0.90 /tmp/sbt_c8a4806b sample 21615.765 ms/op
[info] HotScalacBenchmark.compile:compile·p0.95 /tmp/sbt_c8a4806b sample 21676.163 ms/op
[info] HotScalacBenchmark.compile:compile·p0.99 /tmp/sbt_c8a4806b sample 21676.163 ms/op
[info] HotScalacBenchmark.compile:compile·p0.999 /tmp/sbt_c8a4806b sample 21676.163 ms/op
[info] HotScalacBenchmark.compile:compile·p0.9999 /tmp/sbt_c8a4806b sample 21676.163 ms/op
[info] HotScalacBenchmark.compile:compile·p1.00 /tmp/sbt_c8a4806b sample 21676.163 ms/op
[info] HotScalacBenchmark.compile:·gc.alloc.rate /tmp/sbt_c8a4806b sample 18 290.476 ± 7.069 MB/sec
[info] HotScalacBenchmark.compile:·gc.alloc.rate.norm /tmp/sbt_c8a4806b sample 18 6476081869.778 ± 18700713.424 B/op
[info] HotScalacBenchmark.compile:·gc.churn.PS_Eden_Space /tmp/sbt_c8a4806b sample 18 290.409 ± 20.336 MB/sec
[info] HotScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm /tmp/sbt_c8a4806b sample 18 6478102528.000 ± 468310673.653 B/op
[info] HotScalacBenchmark.compile:·gc.churn.PS_Old_Gen /tmp/sbt_c8a4806b sample 18 13.261 ± 12.790 MB/sec
[info] HotScalacBenchmark.compile:·gc.churn.PS_Old_Gen.norm /tmp/sbt_c8a4806b sample 18 301324965.333 ± 290518111.715 B/op
[info] HotScalacBenchmark.compile:·gc.churn.PS_Survivor_Space /tmp/sbt_c8a4806b sample 18 6.735 ± 2.338 MB/sec
[info] HotScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm /tmp/sbt_c8a4806b sample 18 150953349.778 ± 54074639.209 B/op
[info] HotScalacBenchmark.compile:·gc.count /tmp/sbt_c8a4806b sample 18 101.000 counts
[info] HotScalacBenchmark.compile:·gc.time /tmp/sbt_c8a4806b sample 18 21267.000 ms
[info] WarmScalacBenchmark.compile /tmp/sbt_c8a4806b sample 3 54380.549 ± 24064.367 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.00 /tmp/sbt_c8a4806b sample 53552.873 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.50 /tmp/sbt_c8a4806b sample 53687.091 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.90 /tmp/sbt_c8a4806b sample 55901.684 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.95 /tmp/sbt_c8a4806b sample 55901.684 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.99 /tmp/sbt_c8a4806b sample 55901.684 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.999 /tmp/sbt_c8a4806b sample 55901.684 ms/op
[info] WarmScalacBenchmark.compile:compile·p0.9999 /tmp/sbt_c8a4806b sample 55901.684 ms/op
[info] WarmScalacBenchmark.compile:compile·p1.00 /tmp/sbt_c8a4806b sample 55901.684 ms/op
[info] WarmScalacBenchmark.compile:·gc.alloc.rate /tmp/sbt_c8a4806b sample 3 120.159 ± 52.914 MB/sec
[info] WarmScalacBenchmark.compile:·gc.alloc.rate.norm /tmp/sbt_c8a4806b sample 3 6963979373.333 ± 137408036.138 B/op
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Eden_Space /tmp/sbt_c8a4806b sample 3 113.755 ± 135.915 MB/sec
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm /tmp/sbt_c8a4806b sample 3 6588595392.000 ± 5170161565.753 B/op
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Old_Gen /tmp/sbt_c8a4806b sample 3 0.002 ± 0.048 MB/sec
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Old_Gen.norm /tmp/sbt_c8a4806b sample 3 90400.000 ± 2856554.534 B/op
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Survivor_Space /tmp/sbt_c8a4806b sample 3 2.623 ± 7.378 MB/sec
[info] WarmScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm /tmp/sbt_c8a4806b sample 3 151896768.000 ± 399915676.894 B/op
[info] WarmScalacBenchmark.compile:·gc.count /tmp/sbt_c8a4806b sample 3 73.000 counts
[info] WarmScalacBenchmark.compile:·gc.time /tmp/sbt_c8a4806b sample 3 10070.000 ms
[info] ColdScalacBenchmark.compile /tmp/sbt_c8a4806b ss 10 45613.670 ± 1724.291 ms/op
[info] ColdScalacBenchmark.compile:·gc.alloc.rate /tmp/sbt_c8a4806b ss 10 147.106 ± 4.973 MB/sec
[info] ColdScalacBenchmark.compile:·gc.alloc.rate.norm /tmp/sbt_c8a4806b ss 10 7165665000.000 ± 68500786.134 B/op
[info] ColdScalacBenchmark.compile:·gc.churn.PS_Eden_Space /tmp/sbt_c8a4806b ss 10 138.633 ± 12.612 MB/sec
[info] ColdScalacBenchmark.compile:·gc.churn.PS_Eden_Space.norm /tmp/sbt_c8a4806b ss 10 6749057403.200 ± 438983252.418 B/op
[info] ColdScalacBenchmark.compile:·gc.churn.PS_Survivor_Space /tmp/sbt_c8a4806b ss 10 2.716 ± 0.298 MB/sec
[info] ColdScalacBenchmark.compile:·gc.churn.PS_Survivor_Space.norm /tmp/sbt_c8a4806b ss 10 132216236.800 ± 11751803.094 B/op
[info] ColdScalacBenchmark.compile:·gc.count /tmp/sbt_c8a4806b ss 10 247.000 counts
[info] ColdScalacBenchmark.compile:·gc.time /tmp/sbt_c8a4806b ss 10 29965.000 ms
[success] Total time: 1593 s, completed Feb 26, 2017 11:54:01 AM
[success] Total time: 0 s, completed Feb 26, 2017 11:54:01 AM
```

Machine info:

```
jvican in /data/rw/code/scala/zinc [22:24:47]
> $ uname -a [±as-seen-from ●▴▾]
Linux tribox 4.9.11-1-ARCH #1 SMP PREEMPT Sun Feb 19 13:45:52 UTC 2017 x86_64 GNU/Linux

jvican in /data/rw/code/scala/zinc [23:15:57]
> $ cpupower frequency-info [±as-seen-from ●▴▾]
analyzing CPU 0:
  driver: intel_pstate
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency: Cannot determine or is not supported.
  hardware limits: 400 MHz - 3.40 GHz
  available cpufreq governors: performance powersave
  current policy: frequency should be within 3.20 GHz and 3.20 GHz.
                  The governor "performance" may decide which speed to use
                  within this range.
  current CPU frequency: Unable to call hardware
  current CPU frequency: 3.32 GHz (asserted by call to kernel)
  boost state support:
    Supported: yes
    Active: yes

jvican in /data/rw/code/scala/zinc [23:16:14]
> $ cat /proc/meminfo [±as-seen-from ●▴▾]
MemTotal: 20430508 kB
MemFree: 9890712 kB
MemAvailable: 13490908 kB
Buffers: 3684 kB
Cached: 4052520 kB
SwapCached: 0 kB
Active: 7831612 kB
Inactive: 2337220 kB
Active(anon): 6214680 kB
Inactive(anon): 151436 kB
Active(file): 1616932 kB
Inactive(file): 2185784 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 12582908 kB
SwapFree: 12582908 kB
Dirty: 124 kB
Writeback: 0 kB
AnonPages: 6099876 kB
Mapped: 183096 kB
Shmem: 253488 kB
Slab: 227436 kB
SReclaimable: 152144 kB
SUnreclaim: 75292 kB
KernelStack: 5152 kB
PageTables: 19636 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 22798160 kB
Committed_AS: 7685996 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
HardwareCorrupted: 0 kB
AnonHugePages: 5511168 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 136620 kB
DirectMap2M: 4970496 kB
DirectMap1G: 15728640 kB

jvican in /data/rw/code/scala/zinc [23:16:41]
> $ cat /proc/cpuinfo [±as-seen-from ●▴▾]
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 78
model name : Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
stepping : 3
microcode : 0x88
cpu MHz : 3297.827
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
bugs :
bogomips : 5618.00
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 78
model name : Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
stepping : 3
microcode : 0x88
cpu MHz : 3296.459
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 2
apicid : 2
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
bugs :
bogomips : 5620.22
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:

processor : 2
vendor_id : GenuineIntel
cpu family : 6
model : 78
model name : Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
stepping : 3
microcode : 0x88
cpu MHz : 3399.853
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 2
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
bugs :
bogomips : 5621.16
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:

processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 78
model name : Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
stepping : 3
microcode : 0x88
cpu MHz : 3210.327
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 2
apicid : 3
initial apicid : 3
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
bugs :
bogomips : 5620.33
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
```

In comparison with df30872, the new changes improve the running time of Zinc by half a second in the hot and warm benchmarks and by 100 ms in the cold benchmarks; the latter seems to be a product of run-to-run variation, given the total number of ms/op. This is a success, taking into account that we now traverse more types and symbols than before: these changes let us do more work and still decrease the running time of Zinc. They are likely to have a bigger effect on huge industrial codebases in which the ratio of types is very high and which contain many rich types, such as poly types, method types, refinements and existential types, that carry lots of constraints.
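As a rough aid for reading the BEFORE and AFTER tables, the following sketch subtracts the mean `compile` scores of the two runs and checks whether each difference is smaller than the sum of the two reported errors. The `Score` and `BenchDelta` names and the overlap check are assumptions made for illustration only; they are not part of Zinc or JMH, and the check is a crude heuristic rather than a proper significance test. The numbers are copied from the tables.

```scala
// Mean score and reported error (both in ms/op), copied from the JMH tables.
final case class Score(mean: Double, error: Double)

object BenchDelta {
  // A positive delta means the AFTER run was faster than the BEFORE run.
  def delta(before: Score, after: Score): Double = before.mean - after.mean

  // Crude heuristic (an assumption of this sketch, not a JMH feature):
  // the difference is "within noise" when it does not exceed the sum of
  // the two reported errors, i.e. the mean ± error intervals overlap.
  def withinNoise(before: Score, after: Score): Boolean =
    math.abs(delta(before, after)) <= before.error + after.error

  // (benchmark, BEFORE, AFTER) for the top-level compile rows.
  val runs: Seq[(String, Score, Score)] = Seq(
    ("hot", Score(21228.771, 521.207), Score(20757.144, 519.221)),
    ("warm", Score(55924.053, 16257.754), Score(54380.549, 24064.367)),
    ("cold", Score(45562.453, 836.977), Score(45613.670, 1724.291))
  )

  def main(args: Array[String]): Unit =
    runs.foreach { case (name, before, after) =>
      val d = delta(before, after)
      println(f"$name%-4s delta = $d%9.3f ms/op, within noise: ${withinNoise(before, after)}")
    }
}
```

Running `BenchDelta.main` prints one line per benchmark; the warm rows are based on only three samples, so their error bars are wide and the comparison is correspondingly weak.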
jvican
added a commit
that referenced
this pull request
Mar 3, 2017
eed3si9n
referenced
this pull request
in eed3si9n/zinc
Dec 15, 2017
Use IO.getModified over importing the method
Now it compiles on my machine, given a previously locally-published version of `ivyProj` that contains the changes introduced by sbt/sbt#2106. This is because this repository contains the changes made to `ComponentCompiler`. These changes depend on changes made to `ivyProj`, which is not included in this repository, and those changes have not been published as of 0.13.9-M3 (see here in the diff).

What should I do about these changes? Roll them back until `ivyProj` gets republished with the changes introduced in sbt/sbt#2106?