CMB quadratic estimator related calculation. Mostly it concerns normalization of quadratic estimators, though it’s not limited to that. The core of this code is a symbolic factorization of double l-space summation of quadratic terms in wigner 3j symbols. Such terms are most often seen in the calculation of lensing normalization.
In Julia REPL, first type ]
to enter package management mode. First install
its dependent library
add https://github.com/guanyilun/Wignerd.jl
then install Symlen.jl with
add https://github.com/guanyilun/laughing-succotash
Note that the installation is only tested on Julia 1.7, so I recommend trying this version first in case you encounter any installation problem. If problem still resides please let me know.
The workflow for building a calculator is the following. Consider the estimator
for source in CMB. One first need to provide an analytic expression, of the
quantity that we want to calculate as a sum of ℓ₁ and ℓ₂. For example, in
the below example, we are interested in R
.
using Symlens
# provide an analytic expression: here R is what we are interested to
# sum over ℓ₁ and ℓ₂
@syms ℓ ℓ₂ ℓ₁ y(ℓ) f(ℓ)
W = ((2ℓ+1)*(2ℓ₁+1)*(2ℓ₂+1)/4π)^(1/2)*w3j(ℓ₁,ℓ₂,ℓ,0,0,0)*y(ℓ₁)*y(ℓ₂)
R = (W^2*f(ℓ₁)*f(ℓ₂))/(2*(2*ℓ+1))
Then one can easily build a calculator for it with
build_l12sum_calculator
function. To use this function, we
first provide our target expression which is R
, and then we provide
a function name, which we can use to call the built function, here we
named it source_ext
. The third parameter is a dictionary that tells
our code how we can map our symbolic quantities to actual variables in
the function, and subsequently, in the fourth parameter, tells the
code how these parameters are to be passed in, i.e., in what
order. The last parameter tells the code whether one should return the
function as an expression, which can then be saved to a text file and
loaded later, or to turn it into a function reachable in our scope.
@syms fl yl
build_l12sum_calculator(
R, "source_est",
Dict(f(ℓ)=>fl, y(ℓ)=>yl),
[fl, yl],
evaluate=false
) |> println
The output of this code will look like
function source_est(lmax, fl, yl)
npoints = ((max(lmax, length.([fl, yl])...) * 3 + 1) / 2 |> round) |> Int
glq = glquad(npoints)
ℓ = collect(0:max(length.([fl, yl])...) - 1)
zeta_1 = @__dot__(fl*(yl^2))
zeta_1 = cf_from_cl(glq, 0, 0, zeta_1; prefactor = true)
res = cl_from_cf(glq, 0, 0, lmax, @__dot__(6.283185307179585(zeta_1^2))) |> (x->begin
x .*= @__dot__(0.5)
x
end)
res
end
which is an expression of the function. One can see that it places our
input variables in the defined order. The additional parameter lmax
is added by default which tells the code what is the maximum ℓ to
calculate. The cf_from_cl
calls are converting our ℓ space
quantities to angular space, and cl_from_cf
calls are the
opposite. This shows how our code factors ℓ space convolution into a
real space multiplication to speed-up the calculation. The result is
returned in the res
variable by default. Knowing this is important
because sometimes one may need to do something to the result before
returning. For example, we may want to return 1 / res
instead of
res
. To do that, there is an option in build_l12sum_calculator
called post
which allows one to pass a Vector of arbitrary
expression. We can then write the following
@syms fl yl
build_l12sum_calculator(
R, "source_est",
Dict(f(ℓ)=>fl, y(ℓ)=>yl),
[fl, yl],
post=[:(res .^= -1)],
evaluate=false,
) |> println
Output:
function source_est(lmax, fl, yl)
npoints = ((max(lmax, length.([fl, yl])...) * 3 + 1) / 2 |> round) |> Int
glq = glquad(npoints)
ℓ = collect(0:max(length.([fl, yl])...) - 1)
zeta_1 = @__dot__(fl*(yl^2))
zeta_1 = cf_from_cl(glq, 0, 0, zeta_1; prefactor = true)
res = cl_from_cf(glq, 0, 0, lmax, @__dot__(6.283185307179585(zeta_1^2))) |> (x->begin
x .*= @__dot__(0.5)
x
end)
res .^= -1
res
end
As you can see, the line we specified in post
is inserted right
before returning. Similarly there is also a pre
option to add
statements before computing the zeta
variables, which work in a
similar way so we will not demonstrate again.
Next we can actually evaluate the function,
@syms fl yl
build_l12sum_calculator(
R, "source_est",
Dict(f(ℓ)=>fl, y(ℓ)=>yl),
[fl, yl],
post = [:(res .^= -1)],
evaluate=true
)
Output:
source_est (generic function with 1 method)
This tells us that the calculator is successfully evaluated and
inserted to our scope. We can compare it to a similar calculator
implemented in tempura
which is a hardcoded fortran calculator,
using PyCall, BenchmarkTools, Plots
@pyimport numpy as np
@pyimport pytempura as tp
# load cmb power spectrum
cls = np.loadtxt("data/cosmo2017_10K_acc3_lensedCls.dat")
# make a dummy noise model for testing
lmax = 3000
l = collect(0:lmax)
nltt = @. 10*(1+l/1000)^(3) # dummy
cltt = [0,0,cls[1:3000-1,2]...]
ocltt = nltt + cltt
# tempura call
ucl = Dict("TT" => cltt)
tcl = Dict("TT" => ocltt);
res_py = tp.get_norms(["src"], ucl, tcl, 2, 3000,3000)["src"]
# our dynamically built function
yl = one.(l)
fl = 1 ./ ocltt
res_sym = source_est(3000, fl, yl)
# compare the results
plot(l, [res_py res_sym], labels=["tempura" "Symlens"], xaxis=:log10, xlim=(2,3000), title="source TT")
Output:
This shows that our calculator is in an excellent agreement with
tempura
, without us manually writing fortran code! How is the
performance of our dynamically build calculator compared to Fortran
code?
@btime tp.get_norms(["src"], $ucl, $tcl, 0, 3000,3000)["src"];
Output:
805.793 ms (75 allocations: 26.67 KiB)
@btime source_est(3000, $fl, $yl);
Output:
14.334 ms (9073 allocations: 750.70 KiB)
This shows that our new calculator is ~ 60 times faster than the
previous code. Note that the performance gain is not due to us
building the function dynamically, nor due to performance of julia
versus fortran. It is mostly coming from the wigner d calculator which
I implemented in a separate repo. It is implemented based on an
iteration-free algorithm, thanks to the FastGaussianQuadrature.jl
library, that solves the quadrature weights in O(1) complexity. This
is much faster (~200x) than the Newton’s method approach implemented
in tempura
. The wigner d recursive calculation itself is also about
a factor of 2-4 faster due to SIMD optimization thanks to the
LoopVectorization.jl
package.