Functions constructed using SymbolicUtils.jl exhibit lower computational performance when calculating gradients. #684

chooron opened this issue Dec 23, 2024 · 0 comments

chooron commented Dec 23, 2024

Hello, I would like to ask how to address the relatively low performance I see when computing gradients of a function built with SymbolicUtils.jl. Below is a code sample for reference:

using SymbolicUtils
using SymbolicUtils.Code
using Symbolics
using BenchmarkTools
using Zygote

@variables a b c d
@variables p1 p2 p3

# Intermediate assignments: c and d are computed from the inputs and parameters
assign_list = [
    Assignment(c, a * p1 + b * p2),
    Assignment(d, c * p1 + b * p3),
]

# Collect the two outputs into a plain Vector
flux_output_array = MakeArray([c, d], Vector)

# Build a function (inputs, parameters) -> [c, d] and compile it with eval
func1 = Func([DestructuredArgs([a, b]), DestructuredArgs([p1, p2, p3])], [], Let(assign_list, flux_output_array, false))
test_func1 = eval(toexpr(func1))
# Hand-written reference implementation of the same computation
test_func2(i, p) = begin
    a = i[1]
    b = i[2]
    p1 = p[1]
    p2 = p[2]
    p3 = p[3]
    c = a * p1 + b * p2
    d = c * p1 + b * p3
    [c, d]
end
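
As a sanity check (my own addition, not part of the original benchmark), the two definitions can be compared for equality, and the expression that test_func1 is compiled from can be inspected directly:

# Confirm both implementations compute the same result
test_func1([2, 3], [2, 3, 4]) == test_func2([2, 3], [2, 3, 4])   # expected: true

# Inspect the generated expression; its exact form depends on the
# SymbolicUtils version, and it shows how the output vector is constructed
toexpr(func1)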

For a single evaluation, the timings of the function call and of the backward gradient computation are as follows:

@btime test_func1([2,3],[2,3,4])
@btime test_func2([2,3],[2,3,4])
# 42.525 ns (6 allocations: 240 bytes)
# 26.004 ns (4 allocations: 176 bytes)
@btime gradient((p)->sum(test_func1([2,3], p)), [2,3,4])
@btime gradient((p)->sum(test_func2([2,3], p)), [2,3,4])
# 4.343 μs (93 allocations: 5.38 KiB)
# 74.486 ns (11 allocations: 416 bytes)
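
To narrow down whether the overhead sits in the forward pass or in the pullback, the two phases can also be timed separately with Zygote.pullback (a rough sketch using only standard Zygote API; numbers will vary by machine):

# Split the reverse-mode computation into forward pass and backward pass
y1, back1 = Zygote.pullback(p -> sum(test_func1([2, 3], p)), [2, 3, 4])
y2, back2 = Zygote.pullback(p -> sum(test_func2([2, 3], p)), [2, 3, 4])

@btime $back1(1.0)   # backward pass of the generated function
@btime $back2(1.0)   # backward pass of the hand-written function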

The results show that test_func1, generated with SymbolicUtils.jl, is less efficient than the hand-written test_func2, and the gap is much larger for gradient computation with Zygote.jl (about 4.3 μs versus 74 ns).

For a larger-scale experiment, I broadcast the functions over the columns of a matrix and then computed the gradient:

input = ones(2,10000)
params = [2,3,4]
@btime test_func1.(eachslice(input,dims=2), Ref(params));
@btime test_func2.(eachslice(input,dims=2), Ref(params));
# 152.900 μs (20008 allocations: 859.56 KiB)
# 162.300 μs (20021 allocations: 860.09 KiB)

@btime gradient((p)->sum(sum(test_func1.(eachslice(input,dims=2), Ref(p)))), [2,3,4])
@btime gradient((p)->sum(sum(test_func2.(eachslice(input,dims=2), Ref(p)))), [2,3,4])
# 29.549 ms (870129 allocations: 26.86 MiB)
# 1.903 ms (100106 allocations: 7.33 MiB)
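
One caveat worth noting (my assumption about the benchmark setup, not something reported above): input is a non-constant global here, so interpolating it into @btime with $ rules out global-variable overhead when comparing the two gradients:

# Interpolate the global matrix so only the gradient computation is measured
@btime gradient((p) -> sum(sum(test_func1.(eachslice($input, dims=2), Ref(p)))), [2, 3, 4])
@btime gradient((p) -> sum(sum(test_func2.(eachslice($input, dims=2), Ref(p)))), [2, 3, 4])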

The results indicate that for large-scale data, the gradient computation of test_func1 is roughly 15× more expensive than that of test_func2 (29.5 ms versus 1.9 ms), with far more allocations.

Therefore, I would like to ask why functions constructed with SymbolicUtils.jl are less efficient, especially for gradient computation, and whether there is room to improve their performance.
