
Fix enode memoization #238

Closed
wants to merge 11 commits

Conversation

gkronber
Collaborator

Metatheory caches hash values for e-nodes (in `VecExpr`), presumably to reduce the number of hash computations. At the same time we find code such as `haskey(g.memo, n) ? g.memo[n] : 0`, which performs two lookups in the memo dictionary where only one is necessary.
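The double lookup can be avoided with `get`. A minimal sketch using a plain `Dict` with vector keys (the names `memo` and `n` are illustrative stand-ins, not Metatheory's actual types):

```julia
# `haskey` hashes `n` and probes the table; `memo[n]` hashes and probes again.
memo = Dict{Vector{UInt64},Int}()
n = UInt64[1, 2, 3]
memo[n] = 42
id_twice = haskey(memo, n) ? memo[n] : 0   # two hash computations

# `get` hashes and probes only once.
id_once = get(memo, n, 0)

@assert id_twice == id_once == 42
```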

Much more critical is that nodes are mutated after they have been added to memo, which changes their hash value. This is a bug because the lookup of a node (`g.memo[n]`) will fail even though the node is contained in the dictionary: the lookup uses the new hash value to find the node, while the node's location in the hash table is based on the old hash value. This bug is detected by `check_memo()`.

This bug was introduced by myself in #229, but even before that the memoization was incorrect, because of hash collisions and because the hash values of updated nodes were not being updated in memo.

I believe it will be difficult, if not impossible, to combine caching of hash values with mutable nodes in the memo dictionary, but I'm open to suggestions.
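The failure mode can be reproduced with any mutable key type. A minimal sketch with a plain `Dict` and a vector standing in for memo and a `VecExpr`:

```julia
memo = Dict{Vector{Int},Int}()
n = [1, 2, 3]
memo[n] = 1     # stored in the slot determined by hash([1, 2, 3])

n[1] = 99       # mutate the key in place; its hash changes

# The lookup probes the slot for the *new* hash and finds nothing:
@assert !haskey(memo, n)
# The original contents are unreachable too: the old slot holds the mutated
# key, which is no longer isequal to [1, 2, 3]:
@assert !haskey(memo, [1, 2, 3])
# The stale entry still pollutes the dictionary:
@assert length(memo) == 1
```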

gkronber (Collaborator, Author) commented Aug 18, 2024

New benchmark results:

julia> run(SUITE)
5-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "prop_logic" => 4-element BenchmarkTools.BenchmarkGroup:
          tags: ["egraph", "logic"]
          "freges_theorem" => Trial(1.023 ms)
          "prove1" => Trial(33.271 ms)
          "demorgan" => Trial(90.200 μs)
          "rewrite" => Trial(111.400 μs)
  "basic_maths" => 2-element BenchmarkTools.BenchmarkGroup:
          tags: ["egraphs"]
          "simpl2" => Trial(17.517 ms)
          "simpl1" => Trial(4.194 ms)
  "while_superinterpreter" => 1-element BenchmarkTools.BenchmarkGroup:
          tags: ["egraph"]
          "while_10" => Trial(17.129 ms)
  "egraph" => 2-element BenchmarkTools.BenchmarkGroup:
          tags: ["egraphs"]
          "addexpr" => Trial(4.934 ms)
          "constructor" => Trial(500.000 ns)
  "calc_logic" => 2-element BenchmarkTools.BenchmarkGroup:
          tags: ["egraph", "logic"]
          "freges_theorem" => Trial(32.804 ms)
          "demorgan" => Trial(75.000 μs)

Compare to #225 (comment) (on the same machine). Note that the runtimes have improved.

Let me know if there is a more comprehensive benchmark set that I should try to see whether caching of node hash values is beneficial.

gkronber (Collaborator, Author) commented Aug 18, 2024

While the unit tests pass (with check_memo active), indicating that the bug is fixed, I still need to double-check that e-nodes are not updated after they have been added to memo.

@gkronber gkronber mentioned this pull request Aug 20, 2024
0x0f0f0f (Member)

Can you merge master to update the GitHub Actions conditions? It should then run the benchmarks against master and post them here.

Comment on lines -247 to -249
v_unset_hash!(n)
@label ret
v_hash!(n)
Member

Isn't the hash going to mutate as well? What is the difference from caching it?

gkronber (Collaborator, Author)

Yes, you are right; the main issue is that a `VecExpr` must not be updated after it has been added to memo. Caching of hash values is an independent concern.
I did a quick analysis to check how often the cached values are actually used and saw only a small usage factor. I'll redo the analysis more carefully and post the result here.

The cached value can make up 15% to 20% of the memory required for a `VecExpr`.

gkronber (Collaborator, Author) commented Aug 21, 2024

Using the annotations in https://github.com/gkronber/Metatheory.jl/tree/count_vecexpr_hash_calls
(gkronber@593d9e2)

julia> using Metatheory
Precompiling Metatheory
  1 dependency successfully precompiled in 2 seconds. 6 already precompiled.

julia> include("benchmark/benchmarks.jl")
Benchmark(evals=1, seconds=5.0, samples=10000)

julia> run(SUITE)
[...]

julia> Metatheory.VecExprModule.vexpr_created
12355173
julia> Metatheory.VecExprModule.v_copy_calls
5247759
julia> Metatheory.VecExprModule.v_new_calls
3975935
julia> Metatheory.VecExprModule.unset_hash_calls
41556639
julia> Metatheory.VecExprModule.hash_calls
93167599
julia> Metatheory.VecExprModule.cached_hash_computation
40712170
julia> Metatheory.VecExprModule.cached_hash_access
1205819
julia> Metatheory.EGraphs.memo_lookups
31998547
julia> Metatheory.EGraphs.memo_add
15708295

In this run of the benchmarks:

  • 12 million VecExpr objects are constructed (some via v_new, some via copy, most via direct constructor calls).
  • 32 million lookups are done in memo (note that most of them call the hash function twice, via haskey(g.memo, n) followed by g.memo[n]).
  • 15 million times g.memo[n] is set (memo_add).
  • Memo lookups and additions to memo cause 93 million calls to hash(n::VecExpr).
  • v_hash! is called 42 million times (40.7 million times the hash value is calculated; 1.2 million times the cached value is returned).
  • -> only a bit more than half of the hash calls can use the cached value.
  • If we use get instead of haskey(g.memo, n) followed by g.memo[n], we should be able to reduce the number of hash calls significantly (exact number to be added in a later edit).
  • With improved memo lookups we have only 64 million calls to hash(n::VecExpr). The other numbers are unchanged; we still need to calculate the hash 40.7 million times. (65346c7)
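The counts above come from the instrumented branch linked at the top of this comment; the same kind of measurement can be sketched with a wrapper key type that counts `Base.hash` invocations (illustrative only, not the actual instrumentation):

```julia
const HASH_CALLS = Ref(0)

struct CountedKey
  v::Vector{UInt64}
end
Base.:(==)(a::CountedKey, b::CountedKey) = a.v == b.v
function Base.hash(k::CountedKey, h::UInt)
  HASH_CALLS[] += 1   # count every hash computation on this key type
  hash(k.v, h)
end

memo = Dict{CountedKey,Int}()
k = CountedKey(UInt64[1, 2])
memo[k] = 7

HASH_CALLS[] = 0
_ = haskey(memo, k) ? memo[k] : 0   # haskey hashes once, getindex hashes again
@assert HASH_CALLS[] == 2

HASH_CALLS[] = 0
_ = get(memo, k, 0)                 # a single hash computation
@assert HASH_CALLS[] == 1
```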

Member

VecExpr must not be updated after they have been added to memo.

I think this happens as well in egg and is part of the algorithm. We should ask the egg community how they are doing it.

gkronber (Collaborator, Author)

In egg they are careful to clone nodes before adding them to memo.
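That discipline can be mimicked in Julia by inserting a copy, so later in-place mutation of the working node cannot corrupt the table (a sketch, not the actual egg or Metatheory code):

```julia
memo = Dict{Vector{Int},Int}()
n = [1, 2, 3]
memo[copy(n)] = 1   # the dictionary owns an independent key

n[1] = 99           # mutating the working node is now harmless

@assert haskey(memo, [1, 2, 3])   # the stored entry is still reachable
@assert !haskey(memo, n)          # the mutated node is simply a different key
```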

0x0f0f0f (Member)

Can you merge master to update the GitHub Actions conditions? It should then run the benchmarks against master and post them here.

Again, I should at least have fixed CI.

0x0f0f0f (Member)

Much more critical is that nodes are mutated after they are added to memo, whereby the hash value is changed. This is a bug because the lookup of a node (g.memo[n]) will fail even if the node is contained in the dictionary. The reason is that the lookup will use the new hash value to find the node while the location of the node in the hashtable is based on the old hash value. This bug is detected by check_memo()

@gkronber so, after the change you made here, how is this situation handled? Now the hash is just recomputed on the fly; if an enode (already in memo) is mutated, isn't its hash-table location still going to be the old one?

codecov-commenter commented Aug 22, 2024


Codecov Report

Attention: Patch coverage is 86.38373% with 154 lines in your changes missing coverage. Please review.

Please upload report for BASE (ale/3.0@c43b0fb). Learn more about missing BASE report.

Files Patch % Lines
src/EGraphs/egraph.jl 82.14% 30 Missing ⚠️
src/EGraphs/Schedulers.jl 43.47% 26 Missing ⚠️
src/utils.jl 7.69% 24 Missing ⚠️
src/Syntax.jl 88.66% 17 Missing ⚠️
src/Patterns.jl 78.72% 10 Missing ⚠️
src/EGraphs/saturation.jl 95.17% 7 Missing ⚠️
src/Rules.jl 83.33% 7 Missing ⚠️
ext/Plotting.jl 0.00% 6 Missing ⚠️
src/Rewriters.jl 53.84% 6 Missing ⚠️
src/ematch_compiler.jl 95.83% 6 Missing ⚠️
... and 6 more


Additional details and impacted files
@@            Coverage Diff             @@
##             ale/3.0     #238   +/-   ##
==========================================
  Coverage           ?   79.70%           
==========================================
  Files              ?       19           
  Lines              ?     1488           
  Branches           ?        0           
==========================================
  Hits               ?     1186           
  Misses             ?      302           
  Partials           ?        0           


Comment on lines -3 to -6
on:
pull_request:
branches:
- master
Member

This is also changed in master, I think.

nmheim commented Aug 22, 2024

Hey hey, I just looked into why the benchmark workflows are not working. Quote from here:

In public repositories this action does not work in pull_request workflows when triggered by forks. Any attempt will be met with the error, Resource not accessible by integration. This is due to token restrictions put in place by GitHub Actions. Private repositories can be configured to enable workflows from forks to run without restriction. See here for further explanation. Alternatively, use the pull_request_target event to comment on pull requests.

We could do the following (quote from here):

Use a repo scoped Personal Access Token (PAT) created on an account that has write access to the repository that pull requests are being created in. This is the standard workaround and recommended by GitHub. However, the PAT cannot be scoped to a specific repository so the token becomes a very sensitive secret. If this is a concern, the PAT can instead be created for a dedicated machine account that has collaborator access to the repository. Also note that because the account that owns the PAT will be the creator of pull requests, that user account will be unable to perform actions such as request changes or approve the pull request.

(I am not saying that we should do this here; I just wanted to leave it in a comment so it does not get lost. :)

gkronber (Collaborator, Author) commented Aug 22, 2024

Much more critical is that nodes are mutated after they are added to memo, whereby the hash value is changed. This is a bug because the lookup of a node (g.memo[n]) will fail even if the node is contained in the dictionary. The reason is that the lookup will use the new hash value to find the node while the location of the node in the hashtable is based on the old hash value. This bug is detected by check_memo()

@gkronber so, after the change you made here, how is this situation handled? Now the hash is just recomputed on the fly; if an enode (already in memo) is mutated, isn't its hash-table location still going to be the old one?

Thank you for carefully checking the changes.
Sorry, I was confused, and therefore my comments were not completely correct.

You are correct: my changes for removing the cached hash value are a separate concern. A different issue triggers the problem that a node is added to memo and then changed later.
The issue seems to be this line:

if haskey(g.memo, node)

where an entry in g.memo is only updated (but not inserted if there is no existing one) in line 437. This differs from the corresponding code in egg
https://github.com/egraphs-good/egg/blob/b3a53e9e575dc67ce1e8320e4e488f2873a67482/src/egraph.rs#L1390
where an entry is always created.
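The difference can be sketched as follows (illustrative helper names, not the actual Metatheory code):

```julia
# Buggy variant: only updates an existing entry, silently drops new nodes.
update_only!(memo, node, id) = (haskey(memo, node) && (memo[node] = id); memo)

# egg-style variant: always creates or overwrites the entry.
always_insert!(memo, node, id) = (memo[node] = id; memo)

memo = Dict{Vector{Int},Int}()
update_only!(memo, [1], 5)
@assert isempty(memo)       # the new node was never recorded

always_insert!(memo, [1], 5)
@assert memo[[1]] == 5      # the node is now memoized
```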

I fixed this in 2677542. This change and 4d24031 seem to fix the problem (the unit tests pass with check_memo activated).
Edit: after thinking more about this, it seems that nodes in g.memo are updated but then re-added with the new hash. This means that when we iterate over all entries in check_memo(), we always find a matching entry, even for the outdated entries whose location in the hash table no longer matches their hash. The old entries just pollute g.memo.

I think it would be best to create a new branch / PR which contains only the improvements / fixes for the lookup in g.memo. Afterwards we can test the effect of caching hash values separately.

gkronber (Collaborator, Author)

I think it would be best to create a new branch / PR which contains only the improvements / fixes for the lookup in g.memo. Afterwards we can test the effect of caching hash values separately.

I started to work on this in https://github.com/gkronber/Metatheory.jl/tree/fix_enode_memo_2

0x0f0f0f (Member)

I think it would be best to create a new branch / PR which contains only the improvements / fixes for the lookup in g.memo. Afterwards we can test the effect of caching hash values separately.

I started to work on this in https://github.com/gkronber/Metatheory.jl/tree/fix_enode_memo_2

Nice, thanks! Waiting for the PR, and then I'll run the benchmarks.

gkronber (Collaborator, Author)

Closing this PR because a part of it was re-done in #239.
The caching of VecExpr hash values can be considered in a different PR.

gkronber closed this Aug 28, 2024