Type instability (allocations) regression in Julia v1.10-alpha1 #50562
Comments
Is `secondary_indices` known at compile time? |
No, it is only known at runtime |
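To illustrate the distinction this question is getting at, here is a minimal sketch (hypothetical names, not from Trixi.jl): when the tuple is a compile-time constant the compiler can fold the membership test away, while a value that is only known at runtime has to go through the generic `Base.sym_in` path discussed below.

const CONST_INDICES = (:a, :b, :c)         # value visible to the compiler
runtime_store = Ref((:a, :b, :c))          # value only known at runtime

g_const() = :i_backward in CONST_INDICES        # may constant-fold to `false`
g_runtime(store) = :i_backward in store[]       # generic runtime path, as in this issue

g_const()                  # false
g_runtime(runtime_store)   # false, but reaches Base.sym_in at runtime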
Can you please give more details about the code? I can't confirm any allocation difference between the code snippets you shared above:

julia> global secondary_indices::Tuple{Symbol,Symbol,Symbol} = (:a, :b, :c)
(:a, :b, :c)
julia> f1() = :i_backward in secondary_indices
f1 (generic function with 1 method)
julia> f2() = any(==(:i_backward), secondary_indices)
f2 (generic function with 1 method)
julia> @benchmark f2()
[ Info: Loading BenchmarkTools...
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
Range (min … max): 1.708 ns … 11.125 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 1.833 ns ┊ GC (median): 0.00%
Time (mean ± σ): 1.842 ns ± 0.213 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
█ ▃
▂▁▁▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▆▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▄ ▂
1.71 ns Histogram: frequency by time 1.92 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark f1()
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
Range (min … max): 4.084 ns … 14.917 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 4.250 ns ┊ GC (median): 0.00%
Time (mean ± σ): 4.261 ns ± 0.324 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▂ █ ▆
▂▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▃ ▂
4.08 ns Histogram: frequency by time 4.33 ns <
Memory estimate: 0 bytes, allocs estimate: 0. |
An MWE seems to be

julia> v = Vector{NTuple{2, Symbol}}(undef, 100);
julia> for i in eachindex(v)
values = (:i, :i_backward, :j, :j_backward)
v[i] = (rand(values), rand(values))
end
julia> function f1(v)
res = 0
for secondary_indices in v
if :i_backward in secondary_indices
res += 1
else
res += 2
end
end
return res
end
f1 (generic function with 1 method)
julia> function f2(v)
res = 0
for secondary_indices in v
if any(==(:i_backward), secondary_indices)
res += 1
else
res += 2
end
end
return res
end
f2 (generic function with 1 method)
julia> using BenchmarkTools
julia> f1(v)
162
julia> f2(v)
162
julia> @benchmark f1(v)
BenchmarkTools.Trial: 10000 samples with 152 evaluations.
Range (min … max): 672.553 ns … 6.856 μs ┊ GC (min … max): 0.00% … 83.65%
Time (median): 736.796 ns ┊ GC (median): 0.00%
Time (mean ± σ): 794.691 ns ± 429.613 ns ┊ GC (mean ± σ): 5.35% ± 8.51%
█▇▂▁▂ ▂
████████▅▅▄▁▃▄▆█▃▃▄▁▁▃▁▁▁▁▃▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅ █
673 ns Histogram: log(frequency) by time 4.64 μs <
Memory estimate: 3.12 KiB, allocs estimate: 100.
julia> @benchmark f2(v)
BenchmarkTools.Trial: 10000 samples with 965 evaluations.
Range (min … max): 81.077 ns … 273.202 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 81.771 ns ┊ GC (median): 0.00%
Time (mean ± σ): 83.106 ns ± 7.061 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▅█▄ ▁ ▁
███▄▆▇▃▃▁▁▁▁▁▃▁▁▃▁▁▁▁▃▁▁▃▅█▇▆▄▇▇▇▆▆▄▃▁▃▃▃▃▃▁▁▁▁▁▁▁▁▃▄▃▄▇▇▅▇█ █
81.1 ns Histogram: log(frequency) by time 113 ns <
Memory estimate: 0 bytes, allocs estimate: 0.

That's with Julia v1.10.0-alpha1. With the current stable release Julia v1.9.2, I get

julia> @benchmark f1(v)
BenchmarkTools.Trial: 10000 samples with 961 evaluations.
Range (min … max): 78.634 ns … 204.733 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 86.562 ns ┊ GC (median): 0.00%
Time (mean ± σ): 87.574 ns ± 6.437 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▁ ▇█ ▁ ▁
█▇▁▁▁▁▁▃▁▁▁██▇██▅▁▅▃▁▄▃▄▁▁▃▃▁▄▃▅█▇▅▇█▆▅▆▄▃▃▄▃▁▁▃▃▁▁▁▄▆▄▄▇▇▆█ █
78.6 ns Histogram: log(frequency) by time 118 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark f2(v)
BenchmarkTools.Trial: 10000 samples with 961 evaluations.
Range (min … max): 80.238 ns … 267.342 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 86.915 ns ┊ GC (median): 0.00%
Time (mean ± σ): 88.485 ns ± 7.884 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▂█ ▂ ▁ ▁
▇▁▃▁▁▁▁██▅█▅▄▁▁▁▃▃▁▁▃▁▁▄█▆▆▇▆▇▅▁▁▄▄▁▁▃▄▁▅▆▄▇▇█▇▆▅▄▃▄▁▁▃▄▃▄▄▄ █
80.2 ns Histogram: log(frequency) by time 129 ns <
Memory estimate: 0 bytes, allocs estimate: 0. |
Yes, thanks. I could reproduce it with this MRE also:

julia> f1(rt) = Base.sym_in(:i_backward, rt[]);
julia> f2(rt) = @inline Base.sym_in(:i_backward, rt[]);
julia> rt = Ref((:b_back, :foofakgka, :i_backw));
julia> f1(rt); @allocated f1(rt)
32
julia> f2(rt); @allocated f2(rt)
0

So this is caused by the `@noinline` annotation on `Base.sym_in` (base/operators.jl, lines 609 to 610 at d215d91). As the MRE above shows, we can eliminate the allocation if we remove it. Alternatively, we can achieve the same result if we remove the … |
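For package code hitting this on v1.10-alpha1, a hedged workaround sketch (illustrative names, not Trixi.jl's actual code) based on the MRE above: either force inlining of `Base.sym_in` at the call site, or side-step it with `any(==(x), tup)`, which did not allocate in the benchmarks above.

has_backward_inline(t) = @inline Base.sym_in(:i_backward, t)   # call-site inlining, as in f2 above
has_backward_any(t) = any(==(:i_backward), t)                  # avoids sym_in entirely

rt = Ref((:b_back, :foofakgka, :i_backw))
has_backward_inline(rt[])   # false
has_backward_any(rt[])      # false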
The problem is that inlining these functions destroys the effect assumptions, so they can't be constproped, even if irinterp later finds the constants. What object is being allocated here? Can we improve the codegen to avoid that happening? |
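One way to look at the effect assumptions being discussed is the reflection helper `Base.infer_effects`; this is only a sketch, the helper is internal and its output format differs between Julia versions.

Base.infer_effects(Base.sym_in, (Symbol, Tuple{Vararg{Symbol}}))   # effects of the generic method
Base.infer_effects(in, (Symbol, NTuple{3, Symbol}))                # effects seen at a concrete call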
It appears that the allocation is triggered by the construction of a heap-allocated (boxed) `Tuple{Symbol, Symbol, Symbol}` that is passed to the non-specialized `invoke` of `Base.sym_in`:

julia> @code_typed f1(rt)
CodeInfo(
1 ─ %1 = Base.getfield(rt, :x)::Tuple{Symbol, Symbol, Symbol}
│ %2 = invoke Base.sym_in(:i_backward::Symbol, %1::Tuple{Vararg{Symbol}})::Bool
└── return %2
) => Bool
julia> @code_llvm f1(rt)
; @ REPL[1]:1 within `f1`
define i8 @julia_f1_1641({}* noundef nonnull align 8 dereferenceable(24) %"rt::RefValue") #0 {
top:
%gcframe13 = alloca [5 x {}*], align 16
%gcframe13.sub = getelementptr inbounds [5 x {}*], [5 x {}*]* %gcframe13, i64 0, i64 0
%0 = bitcast [5 x {}*]* %gcframe13 to i8*
call void @llvm.memset.p0i8.i64(i8* align 16 %0, i8 0, i64 40, i1 true)
%1 = call {}*** inttoptr (i64 6855994896 to {}*** (i64)*)(i64 261) #9
; ┌ @ refvalue.jl:59 within `getindex`
; │┌ @ Base.jl:37 within `getproperty`
%2 = bitcast [5 x {}*]* %gcframe13 to i64*
store i64 12, i64* %2, align 16
%3 = load {}**, {}*** %1, align 8
%4 = getelementptr inbounds [5 x {}*], [5 x {}*]* %gcframe13, i64 0, i64 1
%5 = bitcast {}** %4 to {}***
store {}** %3, {}*** %5, align 8
%6 = bitcast {}*** %1 to {}***
store {}** %gcframe13.sub, {}*** %6, align 8
%7 = bitcast {}* %"rt::RefValue" to {}**
%.unpack = load {}*, {}** %7, align 8
%.not = icmp eq {}* %.unpack, null
br i1 %.not, label %fail, label %pass
fail: ; preds = %top
call void @ijl_throw({}* inttoptr (i64 5353759904 to {}*))
unreachable
pass: ; preds = %top
%8 = bitcast {}* %"rt::RefValue" to [3 x {}*]*
%.elt2 = getelementptr inbounds [3 x {}*], [3 x {}*]* %8, i64 0, i64 1
%.unpack3 = load {}*, {}** %.elt2, align 8
%.elt4 = getelementptr inbounds [3 x {}*], [3 x {}*]* %8, i64 0, i64 2
%.unpack5 = load {}*, {}** %.elt4, align 8
%9 = getelementptr inbounds [5 x {}*], [5 x {}*]* %gcframe13, i64 0, i64 4
store {}* %.unpack, {}** %9, align 16
%10 = getelementptr inbounds [5 x {}*], [5 x {}*]* %gcframe13, i64 0, i64 3
store {}* %.unpack5, {}** %10, align 8
%11 = getelementptr inbounds [5 x {}*], [5 x {}*]* %gcframe13, i64 0, i64 2
store {}* %.unpack3, {}** %11, align 16
; └└
%ptls_field14 = getelementptr inbounds {}**, {}*** %1, i64 2
%12 = bitcast {}*** %ptls_field14 to i8**
%ptls_load1516 = load i8*, i8** %12, align 8
%box = call noalias nonnull dereferenceable(32) {}* @ijl_gc_pool_alloc(i8* %ptls_load1516, i32 1184, i32 32) #7
%13 = bitcast {}* %box to i64*
%14 = getelementptr inbounds i64, i64* %13, i64 -1
store atomic i64 5280480960, i64* %14 unordered, align 8
%15 = bitcast {}* %box to [3 x {}*]*
%.repack = bitcast {}* %box to {}**
store {}* %.unpack, {}** %.repack, align 8
%.repack8 = getelementptr inbounds [3 x {}*], [3 x {}*]* %15, i64 0, i64 1
store {}* %.unpack3, {}** %.repack8, align 8
%.repack10 = getelementptr inbounds [3 x {}*], [3 x {}*]* %15, i64 0, i64 2
store {}* %.unpack5, {}** %.repack10, align 8
store {}* %box, {}** %11, align 16
%16 = call i8 @j_sym_in_1643({}* inttoptr (i64 4336660832 to {}*), {}* nonnull readonly %box)
%17 = load {}*, {}** %4, align 8
%18 = bitcast {}*** %1 to {}**
store {}* %17, {}** %18, align 8
ret i8 %16
}

In particular, the `%box = call ... @ijl_gc_pool_alloc(...)` call above seems to be the source of the allocation. Alternatively we can remove the … |
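For comparison, a minimal sketch of a hand-written helper (hypothetical, not a Base API) that specializes on the concrete tuple length, so the Symbols never need to be repacked into a boxed tuple for a non-specialized invoke:

function symbol_in(x::Symbol, itr::NTuple{N, Symbol}) where {N}
    # one specialization per tuple length, so the loop works on the unboxed fields
    for y in itr
        y === x && return true
    end
    return false
end

rt = Ref((:b_back, :foofakgka, :i_backw))
symbol_in(:i_backward, rt[])   # false; expected to be allocation-free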
We could potentially consider adjusting the specsig codegen ABI for |
That seems like what the ABI was supposed to be for this. Why is it getting a specsig for the Vararg, when non-specsig should have "done the right thing" instead? |
We have special handling for a vararg signature, but not for tuples of unknown size. |
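To make that distinction concrete, a small sketch of the two signature shapes (illustrative only; it does not reproduce Base's `@nospecialize`/codegen behavior): a true vararg method receives its arguments individually, whereas a `Tuple{Vararg{Symbol}}` argument is a single tuple whose length is not part of the signature.

f_vararg(x::Symbol, xs::Symbol...) = any(==(x), xs)              # vararg signature
f_tuple(x::Symbol, xs::Tuple{Vararg{Symbol}}) = any(==(x), xs)   # tuple of unknown size as one argument

f_vararg(:a, :b, :a, :c)     # true
f_tuple(:a, (:b, :a, :c))    # true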
Are there any plans to fix this in Julia v1.10.0 or do we need to introduce workarounds in our packages? |
A 10x regression seems like it should be on the milestone. |
Thanks for adding it to the milestone! |
For 1.10 we can remove the |
Backported PRs:
- [x] #48625 <!-- add replace(io, str, patterns...) -->
- [x] #48387 <!-- Improve documentation of sort-related functions -->
- [x] #48363 <!-- Revise sort.md and docstrings in sort.jl -->
- [x] #48977 <!-- Update SparseArrays.jl stdlib for SuiteSparse 7 -->
- [x] #50719 <!-- fix `CyclePadding(::DataType)` -->
- [x] #50694 <!-- inference: permit recursive type traits -->
- [x] #50860 <!-- Add `Base.get_extension` to docs/API -->
- [x] #50594 <!-- Disallow non-index Integer types in isassigned -->
- [x] #50802 <!-- Makes IntrusiveLinkedListSynchronized mutable to avoid allocation on poptask -->
- [x] #50858 <!-- Add a `threadpool` parameter to `Channel` constructor -->
- [x] #50874 <!-- Restrict COFF to a single thread when symbol count is high -->
- [x] #50822 <!-- Add default method for setmodifiers! -->
- [x] #50730 <!-- Fix integer overflow in `isapprox` -->
- [x] #50850 <!-- Remove weird Rational dispatch and add pi functions to list -->
- [x] #50809 <!-- Limit type-printing in MethodError -->
- [x] #50915 <!-- Add note the `Task` about sticky bit -->
- [x] #50929 <!-- when widening tuple types in tmerge, only widen the complex parts -->
- [x] #50928 <!-- Bump JuliaSyntax to 0.4.6 -->
- [x] #50959 <!-- Update libssh2 patches -->
- [x] #50823 <!-- Make ranges more robust with unsigned indexes. -->
- [x] #48542 <!-- Add docs on task-specific buffering using multithreading -->
- [x] #50912 <!-- Separate foreign threads into a :foreign threadpool -->
- [x] #51010 <!-- Add ORIGIN to SuiteSparse rpath on Linux/FreeBSD -->
- [x] #50753 <!-- cat: remove unused promote_eltype methods that confuse inference -->
- [x] #51027 <!-- Implement realloc accounting correctly -->
- [x] #51019 <!-- fix a case of potentially use of undefined variable when handling error in distributed message processing -->
- [x] #51039 <!-- Revert "Optimize findall(f, ::AbstractArray{Bool}) (#42202)" -->
- [x] #51036 <!-- add missing invoke edge for nospecialize targets -->
- [x] #51042 <!-- inference: fix return_type_tfunc modeling of concrete functions -->
- [x] #51114 <!-- Workaround upstream FreeBSD issue #272992 -->
- [x] #50892 <!-- Add `JL_DLLIMPORT` to `small_typeof` declaration -->
- [x] #51154 <!-- broadcast: use recursion rather than ntuple to map over a tuple -->
- [x] #51153 <!-- fix debug typo in "add missing invoke edge for nospecialize targets (#51036)" -->
- [x] #51222 <!-- Check again if the tty is open inside the IO lock -->
- [x] #51236 <!-- Add lock around uv_unref during init -->
- [x] #51243 <!-- GMP: Gracefully handle more overflows. -->
- [x] #51254 <!-- Ryu: make sure adding zeros does not overwrite trailing dot -->
- [x] #51175 <!-- shorten stale_age for cachefile lock -->
- [x] #51300 <!-- fix method definition error for bad vararg -->
- [x] #51307 <!-- fix force-throw ctrl-C on Windows -->
- [x] #51303 <!-- ensure revising structs is safe -->
- [x] #51393
- [x] #51403

Need manual backport:
- [x] #51009 <!-- fix #50562, regression in `in` of tuple of Symbols -->
- [x] #51053 <!-- Bump Statistics stdlib -->
- [x] #51013 <!-- fix #50709, type bound error due to free typevar in sparam env -->
- [x] #51305 <!-- reduce test time for rounding and floatfuncs -->

Contains multiple commits, manual intervention needed:
- [ ] #50663 <!-- Fix Expr(:loopinfo) codegen -->
- [ ] #51035 <!-- refactor GC scanning code to reflect jl_binding_t are now first class -->
- [ ] #51092 <!-- inference: fix bad effects for recursion -->
- [x] #51247 <!-- Check if malloc has succeeded before incrementing gc stats -->
- [x] #51294 <!-- use LibGit2_jll for LibGit2 library -->

Non-merged PRs with backport label:
- [ ] #51132 <!-- Handle `AbstractQ` in concatenation -->
- [x] #51029 <!-- testdefs: make sure that if a test set changes the active project, they change it back when they're done -->
- [ ] #50919 <!-- Code loading: do the "skipping mtime check for stdlib" check regardless of the value of `ispath(f)` -->
- [ ] #50824 <!-- Add some aliasing warnings to docstrings for mutating functions -->
- [x] #50385 <!-- Precompile pidlocks: add to NEWS and docs -->
- [ ] #49805 <!-- Limit TimeType subtraction to AbstractDateTime -->
This issue was fixed. Closing. |
We have some code in Trixi.jl that checks `:i_backward in secondary_indices`, where `secondary_indices` is a tuple of `Symbol`s, see https://github.com/trixi-framework/Trixi.jl/blob/07981c3e50943a9bdb1a845f918a4e8831714338/src/solvers/dgsem_p4est/dg_2d.jl#L148

We observed a significantly increased amount of allocations when checking CI with Julia v1.10.0-alpha1 in trixi-framework/Trixi.jl#1562. Locally, I checked that the allocations are gone when modifying the code above to `any(==(:i_backward), secondary_indices)`.
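As a guard against this kind of regression, a hedged sketch of an allocation check that could run in package CI (function names are illustrative): warm the calls up first so compilation is not counted, then compare the two patterns with `@allocated`.

check_in(t) = :i_backward in t                  # the original pattern
check_any(t) = any(==(:i_backward), t)          # the modified pattern

function measure_allocs(f, t)
    f(t)                        # warm up so compilation is not measured
    return @allocated f(t)
end

measure_allocs(check_in, (:a, :b, :c))    # nonzero on the affected alpha (assumption), 0 elsewhere
measure_allocs(check_any, (:a, :b, :c))   # expected 0
@assert measure_allocs(check_any, (:a, :b, :c)) == 0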