performance regression: Multiple allocations with "--check-bounds=no" on map after 1.9 #50110

dpinol · 2023-06-08T16:02:45Z

This always reports 1 allocation on Julia < 1.9:

h(x) = (2,)
f(v) = @time Base.map(h, v)
f([2])
  0.000003 seconds (1 allocation: 64 bytes)
1-element Vector{Tuple{Int64}}:
 (2,)

But on Julia 1.9 or master, when using --check-bounds=no, it reports 6 allocations:

h(x) = (2,)
f(v) = @time Base.map(h, v)
f([2])
  0.000020 seconds (6 allocations: 288 bytes)
1-element Vector{Tuple{Int64}}:
 (2,)
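For anyone reproducing this, here is a minimal sketch that measures the allocation from inside a function, so the warm-up call and global-scope overhead are kept out of the number. The names `g` and `measure` are illustrative and not part of the original report, and no particular byte count is assumed. Run it once with plain `julia` and once with `julia --check-bounds=no` to compare.

```julia
h(x) = (2,)
g(v) = map(h, v)

# Hypothetical helper: measure inside a function so that global-variable
# dispatch does not show up in the result.
function measure(v)
    g(v)                    # warm-up call, so compilation is not counted
    return @allocated g(v)  # bytes allocated by the second call
end

bytes = measure([2])
println("bytes allocated by map: ", bytes)
```

`@allocated` reports bytes rather than an allocation count, so it will not match the `@time` output above digit for digit, but a jump between the two launch modes should still be visible.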

oscardssmith · 2023-06-08T16:10:10Z

This theoretically should be fixed by #50107. That said, see #48245 for why --check-bounds=no is probably a bad idea.

@overlay

In 1.9, `--check-bounds=no` has started causing significant performance regressions (e.g. #50110). This is because we switched a number of functions that used to be `@pure` to the new effects-based infrastructure, which very closely tracks the legality conditions for concrete evaluation. Unfortunately, disabling bounds checking completely invalidates all prior legality analysis, so the only realistic path we have is to completely disable it.

In general, we are learning that these kinds of global make-things-faster-but-unsafe flags are highly problematic for a language for several reasons:

- Code is written with the assumption of a particular mode being chosen, so it is in general either impossible or unsafe to compose libraries (which in a language like Julia is a huge problem).
- Unsafe semantics are often harder for the compiler to reason about, causing unexpected performance issues (although the 1.9 `--check-bounds=no` issues are worse than just disabling concrete eval for things that use bounds checking).

In general, I'd like to remove the `--check-bounds=` option entirely (#48245), but that proposal has encountered major opposition.

This PR implements an alternative proposal: we introduce a new function `Core.should_check_bounds(boundscheck::Bool) = boundscheck`. This function is passed the result of `Expr(:boundscheck)` (which is now purely determined by the inliner based on `@inbounds`, without regard for the command line flag). In this proposal, what the command line flag does is simply redefine this function to either `true` or `false` (unconditionally) depending on the value of the flag.

Of course, this causes massive amounts of recompilation, but I think this can be addressed by adding logic to loading that loads a pkgimage with appropriate definitions to cure the invalidations. The special logic we have now to take account of the `--check-bounds` flag in `.ji` selection would be replaced by automatically injecting the special pkgimage as a dependency of every loaded image. This part isn't implemented in this PR, but I think it's reasonable to do. I think with that, the `--check-bounds` flag remains functional, while having much more well-defined behavior, as it relies on the standard world age mechanisms.

A major benefit of this approach is that it can be scoped appropriately using overlay tables. For example:

```
julia> using CassetteOverlay

julia> @MethodTable AssumeInboundsTable;

julia> @overlay AssumeInboundsTable Core.should_check_bounds(b::Bool) = false;

julia> assume_inbounds = @overlaypass AssumeInboundsTable

julia> assume_inbounds(f, args...) # f(args...) with bounds checking disabled dynamically
```

Similar logic applies to GPUCompiler, which already supports overlay tables.
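To make the redefinition idea concrete, here is a toy sketch in an ordinary user module. It is not the PR's implementation: `BoundsSketch` and `checked_get` are hypothetical names, `should_check_bounds` here is a local stand-in for the proposed `Core.should_check_bounds`, and a literal `true` stands in for the `Expr(:boundscheck)` value that the real proposal would pass.

```julia
module BoundsSketch

# Hypothetical stand-in for the proposed `Core.should_check_bounds`.
should_check_bounds(boundscheck::Bool) = boundscheck

function checked_get(v::AbstractVector, i::Integer)
    # The proposal passes `Expr(:boundscheck)` here; `true` is a stand-in.
    if should_check_bounds(true)
        checkbounds(v, i)
    end
    return @inbounds v[i]
end

end # module

v = [10, 20, 30]

# With the default definition, an out-of-range index raises a BoundsError.
threw = try
    BoundsSketch.checked_get(v, 4)
    false
catch err
    err isa BoundsError
end

# Emulate `--check-bounds=no` by redefining the predicate. Previously compiled
# callers are invalidated and recompiled via the usual world-age mechanism.
BoundsSketch.should_check_bounds(::Bool) = false

# Only in-range indices are safe to use from here on.
x = BoundsSketch.checked_get(v, 2)
```

The point of the sketch is that the behavior change comes from an ordinary method redefinition rather than from a separate compiler mode, which is what would let an overlay table scope it to a single call tree.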


Keno mentioned this issue Jun 20, 2023

RFC: A path forward on --check-bounds #50239

Closed

aplavin mentioned this issue Jun 26, 2023

add regressions to changelog (NEWS.md) #49883

Merged

brenhinkeller added the performance Must go faster label Aug 4, 2023
