Commit

more docstrings
olynch committed Aug 16, 2024
1 parent 9c90c94 commit 107da67
Showing 2 changed files with 65 additions and 17 deletions.
63 changes: 47 additions & 16 deletions src/EGraphs/egraph.jl
@@ -1,9 +1,9 @@
# Functional implementation of https://egraphs-good.github.io/
# https://dl.acm.org/doi/10.1145/3434304

##############################################
# Interface to implement for custom analyses #
##############################################
# ==============================================================
# Interface to implement for custom analyses
# ==============================================================

"""
modify!(eclass::EClass{Analysis})
@@ -31,9 +31,9 @@ Given an e-node `n`, `make` should return the corresponding analysis value.
"""
function make end

############
# EClasses #
############
# ==============================================================
# EClasses
# ==============================================================

"""
EClass{D}
@@ -132,13 +132,17 @@ not necessarily very informative, but you can access the terms of each e-node
via `Metatheory.to_expr`.
See the [egg paper](https://dl.acm.org/doi/pdf/10.1145/3434304)
for implementation details.
for implementation details. Of special note are the e-graph invariants,
and when they do or do not hold. One of the main innovations of `egg` was to
"batch" the maintenance of the e-graph invariants. We use the `clean` field
on this struct to keep track of whether there is pending work to do in order
to re-establish the e-graph invariants.
"""
mutable struct EGraph{ExpressionType,Analysis}
"""
stores the equality relations over e-class ids
The `(potentially non-root id) --> (root id)` mapping.
More specifically, the `(potentially non-root id) --> (root id)` mapping.
"""
uf::UnionFind

@@ -170,12 +174,27 @@ mutable struct EGraph{ExpressionType,Analysis}
pending::Vector{Pair{VecExpr,Id}}

"""
When an e-node is added to an e-graph for the first time, we add analysis data to the
newly-created e-class by calling [`make`]() on the head of the e-node and the analysis
data for the arguments to that e-node. However, the analysis data for the arguments to
that e-node could get updated at some point, as e-classes are merged.
This is a queue for e-nodes which have had the analysis of some of their arguments
updated, but have not updated the analysis of their parent e-class yet.
"""
analysis_pending::UniqueQueue{Pair{VecExpr,Id}}

"""
The Id of the e-class that we have built this e-graph to simplify.
"""
root::Id

"a cache mapping signatures (function symbols and their arity) to e-classes that contain e-nodes with that function symbol."
classes_by_op::Dict{IdKey,Vector{Id}}

"Whether the e-graph invariants currently hold. When `false`, there is pending work to do in order to re-establish them."
clean::Bool

"If we use global buffers we may need to lock. Defaults to false."
needslock::Bool
lock::ReentrantLock
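
Review note on the `clean` field documented in this hunk: the batching idea can be sketched with a toy structure. This is a minimal illustration under stated assumptions, not the Metatheory.jl implementation. `ToyGraph`, `toy_find`, `toy_union!`, and `toy_rebuild!` are hypothetical names, and the sketch assumes that `clean == true` means no repair work is pending.

```julia
# Toy illustration only: none of these names exist in Metatheory.jl.
mutable struct ToyGraph
  parents::Vector{Int}   # union-find parent pointers, one per e-class id
  pending::Vector{Int}   # ids that were merged away and still need repair
  clean::Bool            # assumed meaning: true when no repair work is pending
end

ToyGraph(n::Int) = ToyGraph(collect(1:n), Int[], true)

# Follow parent pointers until a root is reached.
function toy_find(g::ToyGraph, id::Int)
  while g.parents[id] != id
    id = g.parents[id]
  end
  id
end

# Merging is cheap: it records the work to do and marks the graph dirty.
function toy_union!(g::ToyGraph, a::Int, b::Int)
  ra, rb = toy_find(g, a), toy_find(g, b)
  ra == rb && return false
  g.parents[rb] = ra
  push!(g.pending, rb)
  g.clean = false
  true
end

# All deferred repairs happen in one batch. A real e-graph would
# re-canonicalize and re-hash the affected e-nodes here; the toy
# version only drains the queue.
function toy_rebuild!(g::ToyGraph)
  empty!(g.pending)
  g.clean = true
  g
end
```

In this sketch several calls to `toy_union!` can be issued back to back, and the cost of restoring the invariants is paid once, in `toy_rebuild!`, rather than after every merge.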
@@ -220,6 +239,8 @@ EGraph(e; kwargs...) = EGraph{typeof(e),Nothing}(e; kwargs...)
@inline get_constant(@nospecialize(g::EGraph), hash::UInt64) = g.constants[hash]
@inline has_constant(@nospecialize(g::EGraph), hash::UInt64)::Bool = haskey(g.constants, hash)

# Why does one of these use `get!` and the other use `setindex!`?

@inline function add_constant!(@nospecialize(g::EGraph), @nospecialize(c))::Id
h = hash(c)
get!(g.constants, h, c)
@@ -286,13 +307,17 @@ Returns the canonical e-class id for a given e-class.
# new_n
# end

"""
Make sure all of the arguments of `n` point to root nodes in the unionfind
data structure for `g`.
"""
function canonicalize!(g::EGraph, n::VecExpr)
v_isexpr(n) || @goto ret
for i in (VECEXPR_META_LENGTH + 1):length(n)
@inbounds n[i] = find(g, n[i])
if v_isexpr(n)
for i in (VECEXPR_META_LENGTH + 1):length(n)
@inbounds n[i] = find(g, n[i])
end
v_unset_hash!(n)
end
v_unset_hash!(n)
@label ret
v_hash!(n)
n
end
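
Review note on `canonicalize!`: the behaviour described by the new docstring can be sketched on plain vectors. This is a rough, self-contained illustration, not the library code. `TOY_META_LENGTH`, `toy_root`, and `toy_canonicalize!` are hypothetical names, the union-find is a bare parent vector, and the number of metadata slots (4) is an assumption taken from the `VecExpr` docstring.

```julia
# Hypothetical stand-in for VECEXPR_META_LENGTH (assumed to be 4 metadata slots).
const TOY_META_LENGTH = 4

# Follow parent pointers in a bare union-find vector until a root is reached.
function toy_root(parents::Vector{UInt64}, id::UInt64)
  while parents[id] != id
    id = parents[id]
  end
  id
end

# Point every child slot of `n` at the root of its e-class, then refresh
# the hash cached in slot 1 so that it reflects the updated children.
function toy_canonicalize!(parents::Vector{UInt64}, n::Vector{UInt64})
  for i in (TOY_META_LENGTH + 1):length(n)
    n[i] = toy_root(parents, n[i])
  end
  n[1] = hash(view(n, 2:length(n)))
  n
end
```

A node with no child slots passes through the loop untouched, which mirrors the `v_isexpr(n)` guard in the real function: only the cached hash is recomputed.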
@@ -391,8 +416,9 @@ function addexpr!(g::EGraph, se)::Id
end

"""
Given an [`EGraph`](@ref) and two e-class ids, set
the two e-classes as equal.
Given an [`EGraph`](@ref) and two e-class ids, merge the two corresponding e-classes.
This includes merging the analysis data of the e-classes.
"""
function Base.union!(
g::EGraph{ExpressionType,AnalysisType},
@@ -435,6 +461,9 @@ function Base.union!(
return true
end

"""
Returns whether all of `ids...` refer to the same e-class in `g`.
"""
function in_same_class(g::EGraph, ids::Id...)::Bool
nids = length(ids)
nids == 1 && return true
@@ -563,7 +592,9 @@ end

# Thanks to Max Willsey and Yihong Zhang


"""
Look up a grounded pattern.
"""
function lookup_pat(g::EGraph{ExpressionType}, p::PatExpr)::Id where {ExpressionType}
@assert isground(p)

19 changes: 18 additions & 1 deletion src/vecexpr.jl
@@ -34,12 +34,29 @@ const Id = UInt64
end
An e-node is represented by `Vector{Id}` where:
* Position 1 stores the hash of the `VecExpr`.
* Position 1 stores the hash of the rest of the `VecExpr`.
* Position 2 stores the bit flags (`isexpr` or `iscall`).
* Position 3 stores the signature
* Position 4 stores the hash of the `head` (if `isexpr`) or node value in the e-graph constants.
* The rest of the positions store the e-class ids of the children nodes.
The meaning of the bitflags `isexpr` and `iscall` is best understood by looking at
the source for `to_expr(g::EGraph, n::VecExpr)` in `src/EGraphs/egraph.jl`. Namely,
e-nodes for which `isexpr` is false have no arguments; their only "data" is their head.
E-nodes for which `isexpr` is true and `iscall` is also true correspond to
`Expr(:call, head, args...)` expressions, and e-nodes for which `isexpr` is true but
`iscall` is false correspond to `Expr(head, args...)` expressions. There should
not be `VecExpr`s with `isexpr = false` but `iscall = true`.
The "signature" of an expression seems, in practice, to be computed as the hash of the head combined
with the number of arguments (the arity). See: [`addexpr!`]() in `src/EGraphs/egraph.jl`.
Perhaps in the future, signatures could also involve type information, e.g. to disambiguate
overloaded heads? Signatures are used in the `classes_by_op` dictionary in an e-graph,
so that when you are matching for `(a + b)` you can iterate over all of the e-classes
that have some e-node with `(+, 2)` as its signature.
It also seems like the signature of a constant is `0`.
The expression is represented as an array of integers to improve performance.
The hash value for the VecExpr is cached in the first position for faster lookup performance in dictionaries.
"""

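Review note on the `VecExpr` docstring above: the layout it describes can be sketched as follows. This is an illustrative sketch only. `ToyId`, `TOY_ISEXPR`, `TOY_ISCALL`, `toy_enode`, `toy_constant`, and the accessors are hypothetical names, the flag bit values are invented, and the signature formula (hash of the head combined with the arity) follows the docstring's own tentative description rather than confirmed library behaviour.

```julia
const ToyId = UInt64
const TOY_ISEXPR = ToyId(1)  # hypothetical bit value for the `isexpr` flag
const TOY_ISCALL = ToyId(2)  # hypothetical bit value for the `iscall` flag

# Build [hash, flags, signature, head hash, children...] for an expression node.
function toy_enode(head, children::Vector{ToyId}; iscall::Bool = true)
  flags = TOY_ISEXPR | (iscall ? TOY_ISCALL : ToyId(0))
  signature = hash(head, hash(length(children)))  # assumed: head combined with arity
  n = ToyId[0, flags, signature, hash(head)]
  append!(n, children)
  n[1] = hash(view(n, 2:length(n)))  # slot 1 caches the hash of the rest
  n
end

# A constant has no flags set, signature 0, and no children.
function toy_constant(value)
  n = ToyId[0, 0, 0, hash(value)]
  n[1] = hash(view(n, 2:length(n)))
  n
end

toy_isexpr(n) = (n[2] & TOY_ISEXPR) != 0
toy_iscall(n) = (n[2] & TOY_ISCALL) != 0
toy_children(n) = n[5:end]
```

For example, `toy_enode(:+, ToyId[7, 9])` yields a six-element vector whose last two slots are the child e-class ids, and any two `:+` nodes with two children share the same value in slot 3. That shared value is what makes a per-signature index such as `classes_by_op` possible.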