Fix enode memoization #238
Conversation
…mporarily. This breaks unit tests as it uncovers bugs in node memoization.
… haskey() and getindex()).
…d unit tests pass.
New benchmark results:

```julia
julia> run(SUITE)
5-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "prop_logic" => 4-element BenchmarkTools.BenchmarkGroup:
      tags: ["egraph", "logic"]
      "freges_theorem" => Trial(1.023 ms)
      "prove1" => Trial(33.271 ms)
      "demorgan" => Trial(90.200 μs)
      "rewrite" => Trial(111.400 μs)
  "basic_maths" => 2-element BenchmarkTools.BenchmarkGroup:
      tags: ["egraphs"]
      "simpl2" => Trial(17.517 ms)
      "simpl1" => Trial(4.194 ms)
  "while_superinterpreter" => 1-element BenchmarkTools.BenchmarkGroup:
      tags: ["egraph"]
      "while_10" => Trial(17.129 ms)
  "egraph" => 2-element BenchmarkTools.BenchmarkGroup:
      tags: ["egraphs"]
      "addexpr" => Trial(4.934 ms)
      "constructor" => Trial(500.000 ns)
  "calc_logic" => 2-element BenchmarkTools.BenchmarkGroup:
      tags: ["egraph", "logic"]
      "freges_theorem" => Trial(32.804 ms)
      "demorgan" => Trial(75.000 μs)
```

Compare to #225 (comment) (on the same machine). Note that runtime has improved. Let me know if there is a more comprehensive benchmark set that I should try, to see whether caching of node hash values is beneficial.
While the unit tests pass (with `check_memo` active), indicating that the bug is fixed, I still need to double-check that enodes are not updated after they have been added to memo.
Can you merge master to update the GitHub Actions conditions? It should then run the benchmarks against master and post them here.
```julia
v_unset_hash!(n)
@label ret
v_hash!(n)
```
Isn't the hash going to mutate as well? What is the difference from caching it?
Yes, you are right; the main issue is that a `VecExpr` must not be updated after it has been added to memo. Caching of hash values is an independent concern.
I did a quick analysis in which I checked how often the cached values are actually used, and I saw only a small usage factor. I'll redo the analysis more carefully and post the result here.
The cached value can make up 15% to 20% of the memory required for a `VecExpr`.
Using the annotations in https://github.com/gkronber/Metatheory.jl/tree/count_vecexpr_hash_calls
(gkronber@593d9e2)
```julia
julia> using Metatheory
Precompiling Metatheory
  1 dependency successfully precompiled in 2 seconds. 6 already precompiled.

julia> include("benchmark/benchmarks.jl")
Benchmark(evals=1, seconds=5.0, samples=10000)

julia> run(SUITE)
[...]

julia> Metatheory.VecExprModule.vexpr_created
12355173

julia> Metatheory.VecExprModule.v_copy_calls
5247759

julia> Metatheory.VecExprModule.v_new_calls
3975935

julia> Metatheory.VecExprModule.unset_hash_calls
41556639

julia> Metatheory.VecExprModule.hash_calls
93167599

julia> Metatheory.VecExprModule.cached_hash_computation
40712170

julia> Metatheory.VecExprModule.cached_hash_access
1205819

julia> Metatheory.EGraphs.memo_lookups
31998547

julia> Metatheory.EGraphs.memo_add
15708295
```
In this run of the benchmarks:
- 12 million `VecExpr` objects are constructed (some via `v_new`, some via `copy`, most via direct constructor calls)
- 32 million lookups are done in memo (note that most of them call the hash function twice via `haskey(n) && g.memo[n]`)
- 15 million times `g.memo[n]` is set (`memo_add`)
- memo lookups and adding to memo cause 93 million calls to `hash(n::VecExpr)`
- `v_hash!` is called 42 million times (40.7 million times the hash value is calculated, 1.2 million times the cached value is returned) -> only a bit more than half of the hash calls can use the cached value
- if instead of `haskey(n) && g.memo[n]` we use `get`, we should be able to reduce the number of hash calls significantly (exact number to be added in a later edit)
- after improving memo lookups we have only 64 million calls to `hash(n::VecExpr)`. The other numbers are unchanged. We still need to calculate the hash 40.7 million times. (65346c7)
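The double-lookup point can be illustrated with a plain `Dict` (an illustrative sketch only; `memo` here is a stand-in for `g.memo`, not Metatheory's actual e-graph field, and the key/value data is made up):

```julia
# A plain Dict standing in for g.memo: node key => e-class id (hypothetical data).
memo = Dict([1, 2] => 7)
n = [1, 2]

# Pattern criticized above: n is hashed twice (once in haskey, once in getindex).
id1 = haskey(memo, n) ? memo[n] : 0

# Single lookup: get hashes n only once and falls back to the default.
id2 = get(memo, n, 0)

@assert id1 == id2 == 7
@assert get(memo, [9, 9], 0) == 0   # missing key falls through to the default
```

Both forms return the same result; `get` simply avoids the second hash computation on the hit path.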
> VecExpr must not be updated after they have been added to memo.

I think this happens as well in egg and is part of the algorithm. We should ask the egg community how they are doing it.
In egg they are careful to clone nodes before adding them to memo.
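The defensive-copy idea can be sketched with a plain `Dict` in Julia (illustrative only; egg's and Metatheory's actual node types differ):

```julia
# Insert a copy of the node, egg-style, so later mutation cannot corrupt the key.
memo = Dict{Vector{Int},Int}()
n = [1, 2, 3]

memo[copy(n)] = 1   # the dictionary owns an independent copy
push!(n, 4)         # mutating n afterwards is now harmless

@assert memo[[1, 2, 3]] == 1   # the stored key still hashes to its old content
@assert !haskey(memo, n)       # the mutated node is simply a different key
```

If `n` itself had been inserted, the `push!` would have changed its hash while the entry stayed in the bucket chosen by the old hash, and later lookups would miss it.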
Again, I should have fixed at least CI.
@gkronber so, after this change you made here, how is this situation handled? Now the hash is just recomputed on the fly; if an enode (already in memo) is mutated, isn't its hash table location still going to be the old one?
Codecov Report

Attention: Patch coverage is

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files:

```
@@           Coverage Diff            @@
##           ale/3.0     #238   +/- ##
==========================================
  Coverage         ?   79.70%
==========================================
  Files            ?       19
  Lines            ?     1488
  Branches         ?        0
==========================================
  Hits             ?     1186
  Misses           ?      302
  Partials         ?        0
```

☔ View full report in Codecov by Sentry.
```yaml
on:
  pull_request:
    branches:
      - master
```
This is also changed in master, I think.
Hey hey, I just looked into why the benchmark workflows are not working. Quote from here:
We could do the following (quote from here):
(I am not saying that we should do this here, I just wanted to leave it in a comment so it does not get lost. :)
Thank you for carefully checking the changes. You are correct, my changes for removing the cached hash value are a separate concern. There seems to be a different issue that triggers the problem, namely that a node is added to memo and then changed later.

See Metatheory.jl/src/EGraphs/egraph.jl, line 435 in 7a8cb82, where an entry in `g.memo` is only updated (but not inserted if there is no existing one) in line 437. This differs from the corresponding code in egg, https://github.com/egraphs-good/egg/blob/b3a53e9e575dc67ce1e8320e4e488f2873a67482/src/egraph.rs#L1390, where an entry is always created. I fixed this in 2677542. This change and 4d24031 seem to fix the problem (the unit tests pass with `check_memo` active).

I think it would be best to create a new branch / PR which contains only the improvements / fixes for the lookup in `g.memo`. Afterwards we can test the effect of caching of hash values separately.
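The difference between update-only and always-insert memo semantics can be sketched with a plain `Dict` (a stand-in for `g.memo`, not the actual Metatheory code):

```julia
memo = Dict{Vector{Int},Int}()
n = [1, 2, 3]

# Update-only (the buggy pattern): touches memo only when the key already
# exists, so a node that was never inserted silently stays missing.
haskey(memo, n) && (memo[n] = 42)
@assert !haskey(memo, n)

# Always-insert (egg-style): setindex! creates the entry if it is absent.
memo[n] = 42
@assert memo[n] == 42
```

With update-only semantics, a canonicalized node can fall through all lookups even though the algorithm assumes it was recorded.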
I started to work on this in https://github.com/gkronber/Metatheory.jl/tree/fix_enode_memo_2 |
Nice thanks! Waiting for the PR, and then I'll run the benchmarks |
Closing this PR because a part of it was re-done in #239. |
Metatheory uses caching of hash values for enodes (in `VecExpr`), presumably to reduce the number of hash value calculations. At the same time we find code such as `haskey(g.memo, n) ? g.memo[n] : 0`, which does two lookups in the `memo` dictionary where only one is necessary.

Much more critical is that nodes are mutated after they have been added to memo, whereby the hash value changes. This is a bug, because the lookup of a node (`g.memo[n]`) will fail even if the node is contained in the dictionary: the lookup uses the new hash value to find the node, while the location of the node in the hash table is based on the old hash value. This bug is detected by `check_memo()`.

This bug was introduced by myself in #229, but even before that the memoization was incorrect, because of hash collisions and because hash values of updated nodes were not being updated in memo.

I believe it will be difficult, if not impossible, to combine caching of hash values with mutable nodes in the memo dictionary. But I'm open to suggestions.
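The interaction between hash caching and mutation can be sketched as follows. This is a minimal hypothetical `Node` type; `v_hash!` and `v_unset_hash!` merely mirror the names seen in the diff and are not Metatheory's actual `VecExpr` implementation:

```julia
# Hypothetical cached-hash node, for illustration only.
mutable struct Node
    data::Vector{Int}
    hash::UInt           # cached hash; zero(UInt) means "not computed yet"
end

v_unset_hash!(n::Node) = (n.hash = zero(UInt); n)

function v_hash!(n::Node)
    iszero(n.hash) && (n.hash = hash(n.data))
    n.hash
end

n = Node([1, 2, 3], zero(UInt))
@assert v_hash!(n) == hash([1, 2, 3])

push!(n.data, 4)   # every mutation must be followed by v_unset_hash!,
v_unset_hash!(n)   # otherwise a Dict keyed on this hash probes a stale bucket
@assert v_hash!(n) == hash([1, 2, 3, 4])
```

The fragility described above is exactly the window between the mutation and the cache invalidation: any dictionary operation in that window uses the stale cached hash.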