
Fix enode memoization #238

Closed
wants to merge 11 commits

Conversation

gkronber
Collaborator

Metatheory caches hash values for e-nodes (in `VecExpr`), presumably to reduce the number of hash computations. At the same time we find code such as `haskey(g.memo, n) ? g.memo[n] : 0`, which performs two lookups in the memo dictionary where only one is necessary.
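The double lookup can be avoided with `get`. A minimal sketch using a plain `Dict` with vector keys (the names `memo` and `n` are illustrative stand-ins, not Metatheory's actual types):

```julia
# `haskey` hashes `n` and probes the table; `memo[n]` hashes and probes again.
memo = Dict{Vector{UInt64},Int}()
n = UInt64[1, 2, 3]
memo[n] = 42
id_twice = haskey(memo, n) ? memo[n] : 0   # two hash computations

# `get` hashes and probes only once.
id_once = get(memo, n, 0)

@assert id_twice == id_once == 42
```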

Much more critical is that nodes are mutated after they have been added to memo, which changes their hash value. This is a bug because the lookup of a node (`g.memo[n]`) will fail even though the node is contained in the dictionary: the lookup uses the new hash value to find the node, while the node's location in the hash table is based on the old hash value. This bug is detected by `check_memo()`.

This bug was introduced by myself in #229, but even before that the memoization was incorrect, because of hash collisions and because the hash values of updated nodes were not being updated in memo.

I believe it will be difficult, if not impossible, to combine caching of hash values with mutable nodes in the memo dictionary, but I'm open to suggestions.
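The failure mode can be reproduced with any mutable key type. A minimal sketch with a plain `Dict` and a vector standing in for memo and a `VecExpr`:

```julia
memo = Dict{Vector{Int},Int}()
n = [1, 2, 3]
memo[n] = 1     # stored in the slot determined by hash([1, 2, 3])

n[1] = 99       # mutate the key in place; its hash changes

# The lookup probes the slot for the *new* hash and finds nothing:
@assert !haskey(memo, n)
# The original contents are unreachable too: the old slot holds the mutated
# key, which is no longer isequal to [1, 2, 3]:
@assert !haskey(memo, [1, 2, 3])
# The stale entry still pollutes the dictionary:
@assert length(memo) == 1
```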

gkronber (Collaborator, Author) commented Aug 18, 2024

New benchmark results:

julia> run(SUITE)
5-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "prop_logic" => 4-element BenchmarkTools.BenchmarkGroup:
          tags: ["egraph", "logic"]
          "freges_theorem" => Trial(1.023 ms)
          "prove1" => Trial(33.271 ms)
          "demorgan" => Trial(90.200 μs)
          "rewrite" => Trial(111.400 μs)
  "basic_maths" => 2-element BenchmarkTools.BenchmarkGroup:
          tags: ["egraphs"]
          "simpl2" => Trial(17.517 ms)
          "simpl1" => Trial(4.194 ms)
  "while_superinterpreter" => 1-element BenchmarkTools.BenchmarkGroup:
          tags: ["egraph"]
          "while_10" => Trial(17.129 ms)
  "egraph" => 2-element BenchmarkTools.BenchmarkGroup:
          tags: ["egraphs"]
          "addexpr" => Trial(4.934 ms)
          "constructor" => Trial(500.000 ns)
  "calc_logic" => 2-element BenchmarkTools.BenchmarkGroup:
          tags: ["egraph", "logic"]
          "freges_theorem" => Trial(32.804 ms)
          "demorgan" => Trial(75.000 μs)

Compare to #225 (comment) (on the same machine). Note that the runtimes have improved.

Let me know if there is a more comprehensive benchmark set that I should try to see whether caching of node hash values is beneficial.

gkronber (Collaborator, Author) commented Aug 18, 2024

While the unit tests pass (with check_memo active), indicating that the bug is fixed, I still need to double-check that e-nodes are not updated after they have been added to memo.

@gkronber gkronber mentioned this pull request Aug 20, 2024
0x0f0f0f (Member)

Can you merge master to update the GitHub Actions conditions? It should then run the benchmarks against master and post them here.

Comment on lines -247 to -249
v_unset_hash!(n)
@label ret
v_hash!(n)
Member

Isn't the hash going to mutate as well? What is the difference from caching it?

gkronber (Collaborator, Author)

Yes, you are right; the main issue is that a `VecExpr` must not be updated after it has been added to memo. Caching of hash values is an independent concern.
I did a quick analysis to check how often the cached values are actually used and saw only a small usage factor. I'll redo the analysis more carefully and post the result here.

The cached value can make up 15% to 20% of the memory required for a `VecExpr`.

gkronber (Collaborator, Author) commented Aug 21, 2024

Using the annotations in https://github.com/gkronber/Metatheory.jl/tree/count_vecexpr_hash_calls
(gkronber@593d9e2)

julia> using Metatheory
Precompiling Metatheory
  1 dependency successfully precompiled in 2 seconds. 6 already precompiled.

julia> include("benchmark/benchmarks.jl")
Benchmark(evals=1, seconds=5.0, samples=10000)

julia> run(SUITE)
[...]

julia> Metatheory.VecExprModule.vexpr_created
12355173
julia> Metatheory.VecExprModule.v_copy_calls
5247759
julia> Metatheory.VecExprModule.v_new_calls
3975935
julia> Metatheory.VecExprModule.unset_hash_calls
41556639
julia> Metatheory.VecExprModule.hash_calls
93167599
julia> Metatheory.VecExprModule.cached_hash_computation
40712170
julia> Metatheory.VecExprModule.cached_hash_access
1205819
julia> Metatheory.EGraphs.memo_lookups
31998547
julia> Metatheory.EGraphs.memo_add
15708295

In this run of the benchmarks:

  • 12 million VecExpr objects are constructed (some via v_new, some via copy, most via direct constructor calls).
  • 32 million lookups are done in memo (note that most of them call the hash function twice, via haskey(g.memo, n) followed by g.memo[n]).
  • 15 million times g.memo[n] is set (memo_add).
  • Memo lookups and additions to memo cause 93 million calls to hash(n::VecExpr).
  • v_hash! is called 42 million times (40.7 million times the hash value is calculated; 1.2 million times the cached value is returned).
  • -> only a bit more than half of the hash calls can use the cached value.
  • If we use get instead of haskey(g.memo, n) followed by g.memo[n], we should be able to reduce the number of hash calls significantly (exact number to be added in a later edit).
  • With improved memo lookups we have only 64 million calls to hash(n::VecExpr). The other numbers are unchanged; we still need to calculate the hash 40.7 million times. (65346c7)
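The counts above come from the instrumented branch linked at the top of this comment; the same kind of measurement can be sketched with a wrapper key type that counts `Base.hash` invocations (illustrative only, not the actual instrumentation):

```julia
const HASH_CALLS = Ref(0)

struct CountedKey
  v::Vector{UInt64}
end
Base.:(==)(a::CountedKey, b::CountedKey) = a.v == b.v
function Base.hash(k::CountedKey, h::UInt)
  HASH_CALLS[] += 1   # count every hash computation on this key type
  hash(k.v, h)
end

memo = Dict{CountedKey,Int}()
k = CountedKey(UInt64[1, 2])
memo[k] = 7

HASH_CALLS[] = 0
_ = haskey(memo, k) ? memo[k] : 0   # haskey hashes once, getindex hashes again
@assert HASH_CALLS[] == 2

HASH_CALLS[] = 0
_ = get(memo, k, 0)                 # a single hash computation
@assert HASH_CALLS[] == 1
```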

Member

VecExpr must not be updated after they have been added to memo.

I think this happens as well in egg and is part of the algorithm. We should ask the egg community how they are doing it.

gkronber (Collaborator, Author)

In egg they are careful to clone nodes before adding them to memo.
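That discipline can be mimicked in Julia by inserting a copy, so later in-place mutation of the working node cannot corrupt the table (a sketch, not the actual egg or Metatheory code):

```julia
memo = Dict{Vector{Int},Int}()
n = [1, 2, 3]
memo[copy(n)] = 1   # the dictionary owns an independent key

n[1] = 99           # mutating the working node is now harmless

@assert haskey(memo, [1, 2, 3])   # the stored entry is still reachable
@assert !haskey(memo, n)          # the mutated node is simply a different key
```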

0x0f0f0f (Member)

Can you merge master to update the GitHub Actions conditions? It should then run the benchmarks against master and post them here.

Again, I should at least have fixed CI.

0x0f0f0f (Member)

Much more critical is that nodes are mutated after they are added to memo, whereby the hash value is changed. This is a bug because the lookup of a node (g.memo[n]) will fail even if the node is contained in the dictionary. The reason is that the lookup will use the new hash value to find the node while the location of the node in the hashtable is based on the old hash value. This bug is detected by check_memo()

@gkronber so, after the change you made here, how is this situation handled? Now the hash is just recomputed on the fly; if an enode (already in memo) is mutated, isn't its hash-table location still going to be the old one?

codecov-commenter commented Aug 22, 2024


Codecov Report

Attention: Patch coverage is 86.38373% with 154 lines in your changes missing coverage. Please review.

Please upload report for BASE (ale/3.0@c43b0fb). Learn more about missing BASE report.

Files Patch % Lines
src/EGraphs/egraph.jl 82.14% 30 Missing ⚠️
src/EGraphs/Schedulers.jl 43.47% 26 Missing ⚠️
src/utils.jl 7.69% 24 Missing ⚠️
src/Syntax.jl 88.66% 17 Missing ⚠️
src/Patterns.jl 78.72% 10 Missing ⚠️
src/EGraphs/saturation.jl 95.17% 7 Missing ⚠️
src/Rules.jl 83.33% 7 Missing ⚠️
ext/Plotting.jl 0.00% 6 Missing ⚠️
src/Rewriters.jl 53.84% 6 Missing ⚠️
src/ematch_compiler.jl 95.83% 6 Missing ⚠️
... and 6 more


Additional details and impacted files
@@            Coverage Diff             @@
##             ale/3.0     #238   +/-   ##
==========================================
  Coverage           ?   79.70%           
==========================================
  Files              ?       19           
  Lines              ?     1488           
  Branches           ?        0           
==========================================
  Hits               ?     1186           
  Misses             ?      302           
  Partials           ?        0           


Comment on lines -3 to -6
on:
pull_request:
branches:
- master
Member

This is also changed in master, I think.

nmheim commented Aug 22, 2024

Hey hey, I just looked into why the benchmark workflows are not working. Quote from here:

In public repositories this action does not work in pull_request workflows when triggered by forks. Any attempt will be met with the error, Resource not accessible by integration. This is due to token restrictions put in place by GitHub Actions. Private repositories can be configured to enable workflows from forks to run without restriction. See here for further explanation. Alternatively, use the pull_request_target event to comment on pull requests.

We could do the following (quote from here):

Use a repo scoped Personal Access Token (PAT) created on an account that has write access to the repository that pull requests are being created in. This is the standard workaround and recommended by GitHub. However, the PAT cannot be scoped to a specific repository so the token becomes a very sensitive secret. If this is a concern, the PAT can instead be created for a dedicated machine account that has collaborator access to the repository. Also note that because the account that owns the PAT will be the creator of pull requests, that user account will be unable to perform actions such as request changes or approve the pull request.

(I am not saying that we should do this here; I just wanted to leave it in a comment so it does not get lost. :)

gkronber (Collaborator, Author) commented Aug 22, 2024

Much more critical is that nodes are mutated after they are added to memo, whereby the hash value is changed. This is a bug because the lookup of a node (g.memo[n]) will fail even if the node is contained in the dictionary. The reason is that the lookup will use the new hash value to find the node while the location of the node in the hashtable is based on the old hash value. This bug is detected by check_memo()

@gkronber so, after the change you made here, how is this situation handled? Now the hash is just recomputed on the fly; if an enode (already in memo) is mutated, isn't its hash-table location still going to be the old one?

Thank you for carefully checking the changes.
Sorry, I was confused, and therefore my comments were not completely correct.

You are correct: my changes for removing the cached hash value are a separate concern. A different issue triggers the problem that a node is added to memo and then changed later.
The issue seems to be this line:

if haskey(g.memo, node)

where an entry in g.memo is only updated (but not inserted if there is no existing one) in line 437. This differs from the corresponding code in egg
https://github.com/egraphs-good/egg/blob/b3a53e9e575dc67ce1e8320e4e488f2873a67482/src/egraph.rs#L1390
where an entry is always created.
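The difference can be sketched as follows (illustrative helper names, not the actual Metatheory code):

```julia
# Buggy variant: only updates an existing entry, silently drops new nodes.
update_only!(memo, node, id) = (haskey(memo, node) && (memo[node] = id); memo)

# egg-style variant: always creates or overwrites the entry.
always_insert!(memo, node, id) = (memo[node] = id; memo)

memo = Dict{Vector{Int},Int}()
update_only!(memo, [1], 5)
@assert isempty(memo)       # the new node was never recorded

always_insert!(memo, [1], 5)
@assert memo[[1]] == 5      # the node is now memoized
```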

I fixed this in 2677542. This change and 4d24031 seem to fix the problem (the unit tests pass with check_memo activated).
Edit: after thinking more about this, it seems that nodes in g.memo are updated but then re-added with the new hash. This means that when we iterate over all entries in check_memo(), we always find a matching entry, even for the outdated entries whose location in the hash table no longer matches their hash. The old entries just pollute g.memo.

I think it would be best to create a new branch / PR which contains only the improvements / fixes for the lookup in g.memo. Afterwards we can test the effect of caching hash values separately.

gkronber (Collaborator, Author)

I think it would be best to create a new branch / PR which contains only the improvements / fixes for the lookup in g.memo. Afterwards we can test the effect of caching hash values separately.

I started to work on this in https://github.com/gkronber/Metatheory.jl/tree/fix_enode_memo_2

0x0f0f0f (Member)

I think it would be best to create a new branch / PR which contains only the improvements / fixes for the lookup in g.memo. Afterwards we can test the effect of caching hash values separately.

I started to work on this in https://github.com/gkronber/Metatheory.jl/tree/fix_enode_memo_2

Nice, thanks! Waiting for the PR, and then I'll run the benchmarks.

gkronber (Collaborator, Author)

Closing this PR because a part of it was re-done in #239.
The caching of VecExpr hash values can be considered in a different PR.

gkronber closed this Aug 28, 2024