
Allow use of local buffers #170

Merged: 6 commits, Oct 28, 2023
Conversation

@rayegun (Contributor) commented Oct 21, 2023

This is a basic attempt to address #156. It changes nothing about the default behavior, but allows local buffers (e.g. ones local to a generated function) to be supplied to saturate!. To save space it may be possible (if frowned upon) to use thread-local or scoped buffers within a generated function, but I did not test this (future compiler threading could make that impossible).

Basic benchmarking indicates no performance impact, but an expert should weigh in here.

I didn't use ConcurrentCollections because spawning tasks inside a generated function is questionable anyway, so threading should be disabled there.

Review comment on src/utils.jl, lines 3 to 21 (outdated):
function lockbuffer!(f, params)
    if params.threaded
        lock(params.buffer_lock) do
            return f()
        end
    else
        return f()
    end
end

function lockmergesbuffer!(f, params)
    if params.threaded
        lock(params.merges_buffer_lock) do
            return f()
        end
    else
        return f()
    end
end
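For reference, here is a minimal runnable sketch of the locking pattern under discussion. `Params` is a stand-in for SaturationParams, not the real type from this PR; only the conditional-lock shape matches the code above.

```julia
# `Params` is an illustrative stand-in for SaturationParams.
struct Params
    threaded::Bool
    buffer_lock::ReentrantLock
end

# Run f() under the lock only when the params say we may be threaded;
# otherwise skip the lock entirely (the single-threaded fast path).
function lockbuffer!(f, params)
    if params.threaded
        lock(params.buffer_lock) do
            return f()
        end
    else
        return f()
    end
end

buf = Int[]
p = Params(true, ReentrantLock())
lockbuffer!(() -> push!(buf, 1), p)                             # push under the lock
lockbuffer!(() -> push!(buf, 2), Params(false, p.buffer_lock))  # lock skipped
```

Either branch returns the result of `f()`, so call sites behave identically whether or not the lock is taken.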

Member:
Why lock the buffer only if threaded?

Contributor Author:

Do we need to lock if not threaded? The docs are fairly clear that you shouldn't take locks in generated functions (although this is likely to be problematic mostly for globals, I was told it's probably best to avoid them even when passed in).

Contributor Author:

I switched the name of this parameter in the EGraph to needslock. We don't need to lock within a generated function, so I'd like to keep that functionality. But that separates it out from threading.

It's also not clear to me what params.threaded does? It doesn't seem to be used anywhere.

Review comment on src/EGraphs/saturation.jl (outdated, resolved).
@0x0f0f0f (Member)

Thanks a lot :)

@willow-ahrens (Contributor)

Thanks Raye!

@rayegun (Contributor Author) commented Oct 24, 2023

@0x0f0f0f I am a little confused about the actual uses of the locks. Does anything internal actually access the global buffers simultaneously? Or is it just to support user threading? The only threading appears to be in Rewriters.

If it's just users then probably threaded is not the right field so much as needslock?

@0x0f0f0f (Member)

> @0x0f0f0f I am a little confused about the actual uses of the locks. Does anything internal actually access the global buffers simultaneously? Or is it just to support user threading? The only threading appears to be in Rewriters.
>
> If it's just users then probably threaded is not the right field so much as needslock?

E-matching search can be done in parallel, since it doesn't mutate the e-graph. It was parallelized in previous versions and can be reintroduced.

  • The equality saturation search step produces Bindings in the BUFFER (this can be parallelized, and should be on very large e-graphs).
  • Eqsat application consumes the bindings from the buffer, creates new terms by applying the rewrites, inserts them in the e-graph, then produces pairs of e-class ids that should be merged in the MERGE_BUF.
  • Eqsat application then consumes all of MERGE_BUF and merges the e-classes.
  • Eqsat rebuild does not touch buffers.
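The phases above can be sketched schematically, with plain Vectors standing in for the global BUFFER and MERGE_BUF; all names and payloads here are illustrative, not the real Metatheory.jl internals.

```julia
buffer = Tuple{Symbol,Int}[]   # bindings produced by the search step
merge_buf = Tuple{Int,Int}[]   # e-class id pairs scheduled for merging

# search: e-matching pushes bindings (parallelizable, no e-graph mutation)
push!(buffer, (:some_rule, 1))
push!(buffer, (:some_rule, 2))

# apply: consume bindings, instantiate rewrites, schedule merges
while !isempty(buffer)
    (rule, id) = popfirst!(buffer)
    push!(merge_buf, (id, id + 100))   # stand-in for the new term's e-class id
end

# merge: drain MERGE_BUF serially, since merging mutates the e-graph
merged = Tuple{Int,Int}[]
while !isempty(merge_buf)
    push!(merged, popfirst!(merge_buf))   # stand-in for merge!(g, l, r)
end
```

The key structural point is that each phase only pushes or only pops from a given buffer, which is what makes the search phase a candidate for parallelism while the merge phase stays serial.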

@rayegun (Contributor Author) commented Oct 24, 2023

Okay so can we keep graph.needslock as user defined (but defaulting to true) until e-graph threading is re-introduced?

Once it is we can be sure threading is disabled if needslock in the graph is false? This would move determination of threading from the SaturationParams to the graph. But I believe that would be necessary regardless if we keep params out of the ematcher functions.

We can't thread within generated functions anyway, so it would be disabled then and we would want to avoid locking.

Perhaps canthread is a better name? And then params.threaded && graph.canthread determines if threading is done.
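The proposed flag interaction is simple enough to state directly; `canthread` is a name suggested in this thread, not an existing EGraph field, and these structs are illustrative only.

```julia
# Illustrative stand-ins for SaturationParams and EGraph.
struct ParamFlags
    threaded::Bool
end
struct GraphFlags
    canthread::Bool
end

# Threading happens only when the params request it AND the graph allows it
# (e.g. a graph used inside a generated function would set canthread = false).
should_thread(p::ParamFlags, g::GraphFlags) = p.threaded && g.canthread
```

Under this scheme, `SaturationParams(threaded = true)` alone is not enough: a graph built inside a generated function would opt out, and saturation would silently fall back to the serial, lock-free path.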

@willow-ahrens (Contributor)

Rereading, I wonder if I've misunderstood the purpose of the buffer locks. It seems that they do more than just lock buffers for push or pop; there are several statements and even entire functions inside the locked regions. Do the locks need to guard so much code, or is it possible to move the locks to surround just the buffers? If the locks need to guard such large regions of code, perhaps concurrent collections will not become useful.

@willow-ahrens (Contributor) commented Oct 25, 2023

I was originally suggesting concurrent collections, as they would allow an interface where

push!(buffer::ConcurrentQueue, x)

would be roughly equivalent to something like

lock(buffer.lock) do
    push!(buffer.queue, x)
end

So that the queues could be responsible for their own locks. Then, if you wanted to thread the code you would use a thread-safe buffer. If you wanted to run serial you would use a regular buffer.

Then you could select buffer = ConcurrentQueue() or buffer = Vector(), etc.
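The buffer-owns-its-lock idea above can be sketched as a small wrapper type; `LockedQueue` is illustrative and is not part of ConcurrentCollections.jl or this PR.

```julia
# A queue that carries its own lock, so call sites stay identical whether
# the buffer is a plain Vector or a thread-safe wrapper.
struct LockedQueue{T}
    lock::ReentrantLock
    queue::Vector{T}
end
LockedQueue{T}() where {T} = LockedQueue{T}(ReentrantLock(), T[])

# push!/popfirst! take the queue's own lock internally.
Base.push!(q::LockedQueue, x) = lock(() -> push!(q.queue, x), q.lock)
Base.popfirst!(q::LockedQueue) = lock(() -> popfirst!(q.queue), q.lock)

q = LockedQueue{Int}()
Threads.@threads for i in 1:100
    push!(q, i)   # safe from any thread
end
```

Swapping `LockedQueue{Int}()` for `Int[]` recovers the serial, lock-free behavior without touching any push!/popfirst! call sites, which is exactly the dispatch-based selection suggested above.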

@willow-ahrens (Contributor)

So long as each parallel phase of the algorithm either only pushes or only pops, I think we can move the locks to surround just the buffers and perhaps even make the code more parallel. Is there a phase of the algorithm where one thread might push to a buffer while another thread pops or reads from that same buffer?

@willow-ahrens (Contributor)

Ah, I see. I think that merge! is not thread-safe:

function Base.merge!(g::EGraph, a::EClassId, b::EClassId)::EClassId

and so we need to run the merging in serial here:

lock(MERGES_BUF_LOCK) do
    while !isempty(MERGES_BUF[])
        (l, r) = popfirst!(MERGES_BUF[])
        merge!(g, l, r)
    end
end

In this instance, MERGES_BUF_LOCK is functioning more as a lock on the egraph.

@willow-ahrens (Contributor) commented Oct 25, 2023

@0x0f0f0f Why does every egraph need to share the same buffer? This adds complexity, and I'm not sure what the benefits are:

Pros:

  • Saving memory allocation costs (unlikely to be the dominant cost of the algorithm).

Cons:

  • Multiple threads by default cannot call saturate! at the same time
  • The locks add complexity to the code
  • It's harder to parallelize saturation because it's unclear that we have exclusive access to our buffers

Could we just make all buffers local? This wouldn't preclude egraph threading later, and you could still reuse the buffers by passing the same buffers into multiple egraphs.
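The per-graph-buffers proposal can be sketched as follows; `Graph` and its fields are illustrative stand-ins, not the actual EGraph struct from this PR.

```julia
# Each graph owns its buffers by default; sharing is explicit and opt-in.
mutable struct Graph
    buffer::Vector{Any}      # bindings from the search phase
    merge_buf::Vector{Any}   # scheduled e-class merges
end
Graph() = Graph(Any[], Any[])   # default: fresh, private buffers

shared = Any[]
g1 = Graph(shared, Any[])       # opt-in: two graphs reuse one binding buffer
g2 = Graph(shared, Any[])
push!(g1.buffer, :binding)      # visible through g2.buffer as well
```

With private buffers as the default, two threads can each call saturate! on their own graph without any lock, and buffer reuse for allocation savings remains possible by passing the same vectors into multiple graphs.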

@0x0f0f0f (Member)

> Okay so can we keep graph.needslock as user defined (but defaulting to true) until e-graph threading is re-introduced?

Yes, would default to false because needslock is true only if we push to the buffer concurrently.

> Once it is we can be sure threading is disabled if needslock in the graph is false? This would move determination of threading from the SaturationParams to the graph. But I believe that would be necessary regardless if we keep params out of the ematcher functions.

SaturationParams.threaded == true should set egraph.needslock to true

> We can't thread within generated functions anyway, so it would be disabled then and we would want to avoid locking.

That's alright.

> Perhaps canthread is a better name? And then params.threaded && graph.canthread determines if threading is done.

Exactly, but params.threaded should set egraph.needslock

@0x0f0f0f (Member)

> @0x0f0f0f Why does every egraph need to share the same buffer? This adds complexity, and I'm not sure what the benefits are:
>
> Pros:
>
> * Saving memory allocation costs (unlikely to be the dominant cost of the algorithm).
>
> Cons:
>
> * Multiple threads by default cannot call saturate! at the same time
> * The locks add complexity to the code
> * It's harder to parallelize saturation because it's unclear that we have exclusive access to our buffers
>
> Could we just make all buffers local? This wouldn't preclude egraph threading later, and you could still reuse the buffers by passing the same buffers into multiple egraphs.

@rayegun I'll make a new branch and try just using vectors for buffers. Let's compare the performance.

@0x0f0f0f (Member)

Performance example with vector buffers:

julia> ex = rewrite(:(((p ⟹ q) && (r ⟹ s) && (p || r)) ⟹ (q || s)), impl)
:(!((!p || q) && ((!r || s) && (p || r))) || (q || s))

julia> g = EGraph(ex);

julia> params = SaturationParams(
         timeout = 10,
         eclasslimit = 5000,
         # scheduler=Schedulers.ScoredScheduler, schedulerparams=(1000,5, Schedulers.exprsize))
         scheduler = Schedulers.BackoffScheduler,
         schedulerparams = (6000, 5),
       )
SaturationParams(10, 0x0000000000000000, 5000, 15000, nothing, Metatheory.EGraphs.var"#30#32"(), Metatheory.EGraphs.Schedulers.BackoffScheduler, (6000, 5), false, true)

julia> saturate!(g, t, SaturationParams(timeout=14))
SaturationReport
=================
        Stop Reason: eclasslimit
        Iterations: 10
        EGraph Size: 5817 eclasses, 70640 nodes
 ────────────────────────────────────────────────────────────────────
                            Time                    Allocations      
                   ───────────────────────   ────────────────────────
 Tot / % measured:      1.23s / 100.0%            441MiB / 100.0%    

 Section   ncalls     time    %tot     avg     alloc    %tot      avg
 ────────────────────────────────────────────────────────────────────
 Rebuild       10    970ms   79.2%  97.0ms    302MiB   68.6%  30.2MiB
 Apply         10    181ms   14.7%  18.1ms   78.5MiB   17.8%  7.85MiB
 Search        10   74.5ms    6.1%  7.45ms   59.8MiB   13.6%  5.98MiB
   6           10   21.0ms    1.7%  2.10ms   3.17MiB    0.7%   325KiB
   14          10   10.4ms    0.8%  1.04ms   13.8MiB    3.1%  1.38MiB
   13          10   8.25ms    0.7%   825μs   10.3MiB    2.3%  1.03MiB
   12          10   3.51ms    0.3%   351μs   3.47MiB    0.8%   356KiB
   18          10   3.13ms    0.3%   313μs   4.04MiB    0.9%   414KiB
   11          10   2.74ms    0.2%   274μs   2.73MiB    0.6%   279KiB
   20          10   2.56ms    0.2%   256μs   2.29MiB    0.5%   235KiB
   1           10   2.40ms    0.2%   240μs   2.65MiB    0.6%   271KiB
   17          10   2.02ms    0.2%   202μs   2.84MiB    0.6%   291KiB
   16          10   1.98ms    0.2%   198μs   2.45MiB    0.6%   251KiB
   7           10   1.68ms    0.1%   168μs   1.32MiB    0.3%   135KiB
   21          10   1.54ms    0.1%   154μs    881KiB    0.2%  88.1KiB
   10          10   1.28ms    0.1%   128μs    851KiB    0.2%  85.1KiB
   15          10   1.14ms    0.1%   114μs   1.08MiB    0.2%   110KiB
   19          10   1.02ms    0.1%   102μs    821KiB    0.2%  82.1KiB
   27          10   1.02ms    0.1%   102μs    539KiB    0.1%  53.9KiB
   2           10    879μs    0.1%  87.9μs   0.96MiB    0.2%  98.1KiB
   9           10    864μs    0.1%  86.4μs    732KiB    0.2%  73.2KiB
   8           10    708μs    0.1%  70.8μs    530KiB    0.1%  53.0KiB
   29          10    672μs    0.1%  67.2μs    413KiB    0.1%  41.3KiB
   5           10    619μs    0.1%  61.9μs    593KiB    0.1%  59.3KiB
   30          10    616μs    0.1%  61.6μs    392KiB    0.1%  39.2KiB
   22          10    597μs    0.0%  59.7μs    379KiB    0.1%  37.9KiB
   28          10    595μs    0.0%  59.5μs    379KiB    0.1%  37.9KiB
   26          10    591μs    0.0%  59.1μs    379KiB    0.1%  37.9KiB
   23          10    591μs    0.0%  59.1μs    379KiB    0.1%  37.9KiB
   25          10    590μs    0.0%  59.0μs    379KiB    0.1%  37.9KiB
   24          10    571μs    0.0%  57.1μs    379KiB    0.1%  37.9KiB
   4           10    495μs    0.0%  49.5μs    480KiB    0.1%  48.0KiB
   3           10    373μs    0.0%  37.3μs    288KiB    0.1%  28.8KiB
 ────────────────────────────────────────────────────────────────────

@0x0f0f0f (Member)

Compared to concurrent deques:

julia> saturate!(g, t, SaturationParams(timeout=14))
SaturationReport
=================
        Stop Reason: eclasslimit
        Iterations: 10
        EGraph Size: 5817 eclasses, 70640 nodes
 ────────────────────────────────────────────────────────────────────
                            Time                    Allocations      
                   ───────────────────────   ────────────────────────
 Tot / % measured:      1.19s / 100.0%            438MiB / 100.0%    

 Section   ncalls     time    %tot     avg     alloc    %tot      avg
 ────────────────────────────────────────────────────────────────────
 Rebuild       10    945ms   79.6%  94.5ms    302MiB   69.0%  30.2MiB
 Apply         10    168ms   14.2%  16.8ms   76.9MiB   17.6%  7.69MiB
 Search        10   73.9ms    6.2%  7.39ms   59.0MiB   13.5%  5.90MiB
   14          10   13.2ms    1.1%  1.32ms   13.4MiB    3.0%  1.34MiB
   12          10   10.6ms    0.9%  1.06ms   3.47MiB    0.8%   356KiB
   13          10   9.38ms    0.8%   938μs   10.3MiB    2.4%  1.03MiB
   1           10   3.90ms    0.3%   390μs   2.65MiB    0.6%   271KiB
   6           10   3.86ms    0.3%   386μs   3.15MiB    0.7%   323KiB
   18          10   3.67ms    0.3%   367μs   4.04MiB    0.9%   414KiB
   11          10   2.88ms    0.2%   288μs   2.50MiB    0.6%   256KiB
   17          10   2.50ms    0.2%   250μs   2.84MiB    0.6%   291KiB
   16          10   2.48ms    0.2%   248μs   2.45MiB    0.6%   251KiB
   20          10   2.24ms    0.2%   224μs   2.29MiB    0.5%   235KiB
   2           10   1.76ms    0.1%   176μs    913KiB    0.2%  91.3KiB
   7           10   1.69ms    0.1%   169μs   1.32MiB    0.3%   135KiB
   21          10   1.39ms    0.1%   139μs    881KiB    0.2%  88.1KiB
   15          10   1.36ms    0.1%   136μs   1.08MiB    0.2%   110KiB
   5           10   1.20ms    0.1%   120μs    593KiB    0.1%  59.3KiB
   27          10   1.09ms    0.1%   109μs    539KiB    0.1%  53.9KiB
   10          10   1.06ms    0.1%   106μs    851KiB    0.2%  85.1KiB
   19          10   1.01ms    0.1%   101μs    821KiB    0.2%  82.1KiB
   4           10    931μs    0.1%  93.1μs    480KiB    0.1%  48.0KiB
   9           10    895μs    0.1%  89.5μs    732KiB    0.2%  73.2KiB
   3           10    754μs    0.1%  75.4μs    288KiB    0.1%  28.8KiB
   29          10    752μs    0.1%  75.2μs    413KiB    0.1%  41.3KiB
   8           10    749μs    0.1%  74.9μs    530KiB    0.1%  53.0KiB
   30          10    655μs    0.1%  65.5μs    392KiB    0.1%  39.2KiB
   26          10    653μs    0.1%  65.3μs    379KiB    0.1%  37.9KiB
   25          10    645μs    0.1%  64.5μs    379KiB    0.1%  37.9KiB
   24          10    640μs    0.1%  64.0μs    379KiB    0.1%  37.9KiB
   22          10    634μs    0.1%  63.4μs    379KiB    0.1%  37.9KiB
   23          10    621μs    0.1%  62.1μs    379KiB    0.1%  37.9KiB
   28          10    617μs    0.1%  61.7μs    379KiB    0.1%  37.9KiB
 ────────────────────────────────────────────────────────────────────

@0x0f0f0f mentioned this pull request on Oct 28, 2023
@0x0f0f0f merged commit 1453a44 into JuliaSymbolics:master on Oct 28, 2023
5 of 10 checks passed