Pullback on mean() gives illegal memory access code 700 #1473
Comments
I encounter the same error when trying the above example:
ERROR: WARNING: Error while freeing DeviceBuffer(400 bytes at 0x0000000205200a00):
CUDA.CuError(code=CUDA.cudaError_enum(0x000002bc), details=CUDA.Optional{String}(data=nothing))
Stacktrace:
[1] throw_api_error(res::CUDA.cudaError_enum)
@ CUDA C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\lib\cudadrv\libcuda.jl:27
[2] check
@ C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\lib\cudadrv\libcuda.jl:34 [inlined]
[3] cuMemFreeAsync
@ C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\lib\utils\call.jl:26 [inlined]
[4] #free#2
@ C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\lib\cudadrv\memory.jl:97 [inlined]
[5] free
@ C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\lib\cudadrv\memory.jl:92 [inlined]
[6] #actual_free#1001
@ C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\src\pool.jl:78 [inlined]
[7] actual_free
@ C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\src\pool.jl:75 [inlined]
[8] #_free#1026
@ C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\src\pool.jl:506 [inlined]
[9] _free
@ C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\src\pool.jl:493 [inlined]
[10] macro expansion
@ C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\src\pool.jl:478 [inlined]
[11] macro expansion
@ .\timing.jl:393 [inlined]
[12] #free#1025
@ C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\src\pool.jl:477 [inlined]
[13] free
@ C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\src\pool.jl:466 [inlined]
[14] (::CUDA.var"#1032#1033"{CUDA.Mem.DeviceBuffer, Bool})()
@ CUDA C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\src\array.jl:101
[15] #context!#915
@ C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\lib\cudadrv\state.jl:170 [inlined]
[16] context!
@ C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\lib\cudadrv\state.jl:165 [inlined]
[17] _free_buffer(buf::CUDA.Mem.DeviceBuffer, early::Bool)
@ CUDA C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\src\array.jl:89
[18] release(rc::GPUArrays.RefCounted{CUDA.Mem.DeviceBuffer}, args::Bool)
@ GPUArrays C:\Users\gerharddorn\.julia\packages\GPUArrays\dAUOE\src\host\abstractarray.jl:42
[19] unsafe_free!
@ C:\Users\gerharddorn\.julia\packages\GPUArrays\dAUOE\src\host\abstractarray.jl:90 [inlined]
[20] unsafe_finalize!(xs::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer})
@ CUDA C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\src\array.jl:113
[21] show_exception_stack(io::IOContext{Base.TTY}, stack::Base.ExceptionStack)
@ Base .\errorshow.jl:895
[22] display_error(io::IOContext{Base.TTY}, stack::Base.ExceptionStack)
@ Base .\client.jl:111
[23] #invokelatest#2
@ .\essentials.jl:819 [inlined]
[24] invokelatest
@ .\essentials.jl:816 [inlined]
[25] print_response(errio::IO, response::Any, show_value::Bool, have_color::Bool, specialdisplay::Union{Nothing, AbstractDisplay})
@ REPL C:\Users\gerharddorn\.julia\juliaup\julia-1.9.4+0.x64.w64.mingw32\share\julia\stdlib\v1.9\REPL\src\REPL.jl:300
[26] (::REPL.var"#57#58"{REPL.LineEditREPL, Pair{Any, Bool}, Bool, Bool})(io::Any)
@ REPL C:\Users\gerharddorn\.julia\juliaup\julia-1.9.4+0.x64.w64.mingw32\share\julia\stdlib\v1.9\REPL\src\REPL.jl:287
[27] with_repl_linfo(f::Any, repl::REPL.LineEditREPL)
@ REPL C:\Users\gerharddorn\.julia\juliaup\julia-1.9.4+0.x64.w64.mingw32\share\julia\stdlib\v1.9\REPL\src\REPL.jl:557
[28] print_response(repl::REPL.AbstractREPL, response::Any, show_value::Bool, have_color::Bool)
@ REPL C:\Users\gerharddorn\.julia\juliaup\julia-1.9.4+0.x64.w64.mingw32\share\julia\stdlib\v1.9\REPL\src\REPL.jl:285
[29] (::REPL.var"#do_respond#80"{Bool, Bool, REPL.var"#93#103"{REPL.LineEditREPL, REPL.REPLHistoryProvider}, REPL.LineEditREPL, REPL.LineEdit.Prompt})(s::REPL.LineEdit.MIState, buf::Any, ok::Bool)
@ REPL C:\Users\gerharddorn\.julia\juliaup\julia-1.9.4+0.x64.w64.mingw32\share\julia\stdlib\v1.9\REPL\src\REPL.jl:899
[30] #invokelatest#2
@ .\essentials.jl:819 [inlined]
[31] invokelatest
@ .\essentials.jl:816 [inlined]
[32] run_interface(terminal::REPL.Terminals.TextTerminal, m::REPL.LineEdit.ModalInterface, s::REPL.LineEdit.MIState)
@ REPL.LineEdit C:\Users\gerharddorn\.julia\juliaup\julia-1.9.4+0.x64.w64.mingw32\share\julia\stdlib\v1.9\REPL\src\LineEdit.jl:2647
[33] run_frontend(repl::REPL.LineEditREPL, backend::REPL.REPLBackendRef)
@ REPL C:\Users\gerharddorn\.julia\juliaup\julia-1.9.4+0.x64.w64.mingw32\share\julia\stdlib\v1.9\REPL\src\REPL.jl:1300
[34] (::REPL.var"#62#68"{REPL.LineEditREPL, REPL.REPLBackendRef})()
@ REPL .\task.jl:514
error in running finalizer: CUDA.CuError(code=CUDA.cudaError_enum(0x000002bc), details=CUDA.Optional{String}(data=nothing))
throw_api_error at C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\lib\cudadrv\libcuda.jl:27
check at C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\lib\cudadrv\libcuda.jl:34 [inlined]
cuMemHostUnregister at C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\lib\utils\call.jl:26 [inlined]
unregister at C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\lib\cudadrv\memory.jl:193 [inlined]
#21 at C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\lib\cudadrv\memory.jl:701 [inlined]
#context!#915 at C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\lib\cudadrv\state.jl:170 [inlined]
context! at C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\lib\cudadrv\state.jl:165 [inlined]
macro expansion at C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\lib\cudadrv\memory.jl:700 [inlined]
macro expansion at .\lock.jl:267 [inlined]
__unpin at C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\lib\cudadrv\memory.jl:694
#1081 at C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\src\compiler\execution.jl:180
unknown function (ip: 0000029d478cdc66)
run_finalizer at C:/workdir/src\gc.c:417
jl_gc_run_finalizers_in_list at C:/workdir/src\gc.c:507
run_finalizers at C:/workdir/src\gc.c:553
jl_mutex_unlock at C:/workdir/src\julia_locks.h:81 [inlined]
jl_generate_fptr_impl at C:/workdir/src\jitlayers.cpp:467
jl_compile_method_internal at C:/workdir/src\gf.c:2348
jl_compile_method_internal at C:/workdir/src\gf.c:2241 [inlined]
_jl_invoke at C:/workdir/src\gf.c:2750 [inlined]
ijl_apply_generic at C:/workdir/src\gf.c:2940
show_exception_stack at .\errorshow.jl:895
display_error at .\client.jl:111
jfptr_display_error_47944.clone_1 at C:\Users\gerharddorn\.julia\juliaup\julia-1.9.4+0.x64.w64.mingw32\lib\julia\sys.dll (unknown line)
jl_apply at C:/workdir/src\julia.h:1880 [inlined]
jl_f__call_latest at C:/workdir/src\builtins.c:774
#invokelatest#2 at .\essentials.jl:819 [inlined]
invokelatest at .\essentials.jl:816 [inlined]
print_response at C:\workdir\usr\share\julia\stdlib\v1.9\REPL\src\REPL.jl:300
#57 at C:\workdir\usr\share\julia\stdlib\v1.9\REPL\src\REPL.jl:287
jfptr_YY.57_61235.clone_1 at C:\Users\gerharddorn\.julia\juliaup\julia-1.9.4+0.x64.w64.mingw32\lib\julia\sys.dll (unknown line)
with_repl_linfo at C:\workdir\usr\share\julia\stdlib\v1.9\REPL\src\REPL.jl:557
jfptr_with_repl_linfo_61240.clone_1 at C:\Users\gerharddorn\.julia\juliaup\julia-1.9.4+0.x64.w64.mingw32\lib\julia\sys.dll (unknown line)
print_response at C:\workdir\usr\share\julia\stdlib\v1.9\REPL\src\REPL.jl:285
do_respond at C:\workdir\usr\share\julia\stdlib\v1.9\REPL\src\REPL.jl:899
jfptr_do_respond_61066.clone_1 at C:\Users\gerharddorn\.julia\juliaup\julia-1.9.4+0.x64.w64.mingw32\lib\julia\sys.dll (unknown line)
jl_apply at C:/workdir/src\julia.h:1880 [inlined]
jl_f__call_latest at C:/workdir/src\builtins.c:774
#invokelatest#2 at .\essentials.jl:819 [inlined]
invokelatest at .\essentials.jl:816 [inlined]
run_interface at C:\workdir\usr\share\julia\stdlib\v1.9\REPL\src\LineEdit.jl:2647
jfptr_run_interface_61442.clone_1 at C:\Users\gerharddorn\.julia\juliaup\julia-1.9.4+0.x64.w64.mingw32\lib\julia\sys.dll (unknown line)
run_frontend at C:\workdir\usr\share\julia\stdlib\v1.9\REPL\src\REPL.jl:1300
#62 at .\task.jl:514
jfptr_YY.62_61043.clone_1 at C:\Users\gerharddorn\.julia\juliaup\julia-1.9.4+0.x64.w64.mingw32\lib\julia\sys.dll (unknown line)
jl_apply at C:/workdir/src\julia.h:1880 [inlined]
start_task at C:/workdir/src\task.c:1092
WARNING: Error while freeing DeviceBuffer(400 bytes at 0x0000000205200800):
CUDA.CuError(code=CUDA.cudaError_enum(0x000002bc), details=CUDA.Optional{String}(data=nothing))
Stacktrace:
[1] throw_api_error(res::CUDA.cudaError_enum)
@ CUDA C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\lib\cudadrv\libcuda.jl:27
[2] check
@ C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\lib\cudadrv\libcuda.jl:34 [inlined]
[3] cuMemFreeAsync
@ C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\lib\utils\call.jl:26 [inlined]
[4] #free#2
@ C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\lib\cudadrv\memory.jl:97 [inlined]
[5] free
@ C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\lib\cudadrv\memory.jl:92 [inlined]
[6] #actual_free#1001
@ C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\src\pool.jl:78 [inlined]
[7] actual_free
@ C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\src\pool.jl:75 [inlined]
[8] #_free#1026
@ C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\src\pool.jl:506 [inlined]
[9] _free
@ C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\src\pool.jl:493 [inlined]
[10] macro expansion
@ C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\src\pool.jl:478 [inlined]
[11] macro expansion
@ .\timing.jl:393 [inlined]
[12] #free#1025
@ C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\src\pool.jl:477 [inlined]
[13] free
@ C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\src\pool.jl:466 [inlined]
[14] (::CUDA.var"#1032#1033"{CUDA.Mem.DeviceBuffer, Bool})()
@ CUDA C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\src\array.jl:101
[15] #context!#915
@ C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\lib\cudadrv\state.jl:170 [inlined]
[16] context!
@ C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\lib\cudadrv\state.jl:165 [inlined]
[17] _free_buffer(buf::CUDA.Mem.DeviceBuffer, early::Bool)
@ CUDA C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\src\array.jl:89
[18] release(rc::GPUArrays.RefCounted{CUDA.Mem.DeviceBuffer}, args::Bool)
@ GPUArrays C:\Users\gerharddorn\.julia\packages\GPUArrays\dAUOE\src\host\abstractarray.jl:42
[19] unsafe_free!
@ C:\Users\gerharddorn\.julia\packages\GPUArrays\dAUOE\src\host\abstractarray.jl:90 [inlined]
[20] unsafe_finalize!(xs::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer})
@ CUDA C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\src\array.jl:113
[21] show_exception_stack(io::IOContext{Base.TTY}, stack::Base.ExceptionStack)
@ Base .\errorshow.jl:895
[22] display_error(io::IOContext{Base.TTY}, stack::Base.ExceptionStack)
@ Base .\client.jl:111
[23] #invokelatest#2
@ .\essentials.jl:819 [inlined]
[24] invokelatest
@ .\essentials.jl:816 [inlined]
[25] print_response(errio::IO, response::Any, show_value::Bool, have_color::Bool, specialdisplay::Union{Nothing, AbstractDisplay})
@ REPL C:\Users\gerharddorn\.julia\juliaup\julia-1.9.4+0.x64.w64.mingw32\share\julia\stdlib\v1.9\REPL\src\REPL.jl:300
[26] (::REPL.var"#57#58"{REPL.LineEditREPL, Pair{Any, Bool}, Bool, Bool})(io::Any)
@ REPL C:\Users\gerharddorn\.julia\juliaup\julia-1.9.4+0.x64.w64.mingw32\share\julia\stdlib\v1.9\REPL\src\REPL.jl:287
[27] with_repl_linfo(f::Any, repl::REPL.LineEditREPL)
@ REPL C:\Users\gerharddorn\.julia\juliaup\julia-1.9.4+0.x64.w64.mingw32\share\julia\stdlib\v1.9\REPL\src\REPL.jl:557
[28] print_response(repl::REPL.AbstractREPL, response::Any, show_value::Bool, have_color::Bool)
@ REPL C:\Users\gerharddorn\.julia\juliaup\julia-1.9.4+0.x64.w64.mingw32\share\julia\stdlib\v1.9\REPL\src\REPL.jl:285
[29] (::REPL.var"#do_respond#80"{Bool, Bool, REPL.var"#93#103"{REPL.LineEditREPL, REPL.REPLHistoryProvider}, REPL.LineEditREPL, REPL.LineEdit.Prompt})(s::REPL.LineEdit.MIState, buf::Any, ok::Bool)
@ REPL C:\Users\gerharddorn\.julia\juliaup\julia-1.9.4+0.x64.w64.mingw32\share\julia\stdlib\v1.9\REPL\src\REPL.jl:899
[30] #invokelatest#2
@ .\essentials.jl:819 [inlined]
[31] invokelatest
@ .\essentials.jl:816 [inlined]
[32] run_interface(terminal::REPL.Terminals.TextTerminal, m::REPL.LineEdit.ModalInterface, s::REPL.LineEdit.MIState)
@ REPL.LineEdit C:\Users\gerharddorn\.julia\juliaup\julia-1.9.4+0.x64.w64.mingw32\share\julia\stdlib\v1.9\REPL\src\LineEdit.jl:2647
[33] run_frontend(repl::REPL.LineEditREPL, backend::REPL.REPLBackendRef)
@ REPL C:\Users\gerharddorn\.julia\juliaup\julia-1.9.4+0.x64.w64.mingw32\share\julia\stdlib\v1.9\REPL\src\REPL.jl:1300
[34] (::REPL.var"#62#68"{REPL.LineEditREPL, REPL.REPLBackendRef})()
@ REPL .\task.jl:514
CUDA error: an illegal memory access was encountered (code 700, ERROR_ILLEGAL_ADDRESS)
Stacktrace:
[1] throw_api_error(res::CUDA.cudaError_enum)
@ CUDA C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\lib\cudadrv\libcuda.jl:27
[2] isdone
@ C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\lib\cudadrv\stream.jl:111 [inlined]
[3] spinning_synchronization(f::typeof(CUDA.isdone), obj::CuStream)
@ CUDA C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\lib\cudadrv\synchronization.jl:79
[4] device_synchronize(; blocking::Bool, spin::Bool)
@ CUDA C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\lib\cudadrv\synchronization.jl:171
[5] device_synchronize()
@ CUDA C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\lib\cudadrv\synchronization.jl:169
[6] top-level scope
@ C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\src\initialization.jl:210
caused by: CUDA error: an illegal memory access was encountered (code 700, ERROR_ILLEGAL_ADDRESS)
Stacktrace:
[1] throw_api_error(res::CUDA.cudaError_enum)
@ CUDA C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\lib\cudadrv\libcuda.jl:27
[2] nonblocking_synchronize(val::CuContext)
@ CUDA C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\lib\cudadrv\synchronization.jl:163
[3] device_synchronize(; blocking::Bool, spin::Bool)
@ CUDA C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\lib\cudadrv\synchronization.jl:174
[4] device_synchronize
@ C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\lib\cudadrv\synchronization.jl:169 [inlined]
[5] CuModule(data::Vector{UInt8}, options::Dict{CUDA.CUjit_option_enum, Any})
@ CUDA C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\lib\cudadrv\module.jl:40
[6] CuModule
@ C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\lib\cudadrv\module.jl:23 [inlined]
[7] link(job::GPUCompiler.CompilerJob, compiled::NamedTuple{(:image, :entry, :external_gvars), Tuple{Vector{UInt8}, String, Vector{String}}})
@ CUDA C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\src\compiler\compilation.jl:365
[8] actual_compilation(cache::Dict{Any, CuFunction}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::typeof(CUDA.compile), linker::typeof(CUDA.link))
@ GPUCompiler C:\Users\gerharddorn\.julia\packages\GPUCompiler\U36Ed\src\execution.jl:132
[9] cached_compilation(cache::Dict{Any, CuFunction}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::Function, linker::Function)
@ GPUCompiler C:\Users\gerharddorn\.julia\packages\GPUCompiler\U36Ed\src\execution.jl:103
[10] macro expansion
@ C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\src\compiler\execution.jl:382 [inlined]
[11] macro expansion
@ .\lock.jl:267 [inlined]
[12] cufunction(f::GPUArrays.var"#broadcast_kernel#38", tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceVector{Float32, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, typeof(-), Tuple{Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64}}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ CUDA C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\src\compiler\execution.jl:377
[13] cufunction
@ C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\src\compiler\execution.jl:374 [inlined]
[14] macro expansion
@ C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\src\compiler\execution.jl:104 [inlined]
[15] #launch_heuristic#1120
@ C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\src\gpuarrays.jl:17 [inlined]
[16] launch_heuristic
@ C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\src\gpuarrays.jl:15 [inlined]
[17] _copyto!
@ C:\Users\gerharddorn\.julia\packages\GPUArrays\dAUOE\src\host\broadcast.jl:70 [inlined]
[18] copyto!
@ C:\Users\gerharddorn\.julia\packages\GPUArrays\dAUOE\src\host\broadcast.jl:51 [inlined]
[19] copy
@ C:\Users\gerharddorn\.julia\packages\GPUArrays\dAUOE\src\host\broadcast.jl:42 [inlined]
[20] materialize
@ .\broadcast.jl:873 [inlined]
[21] broadcast_preserving_zero_d
@ .\broadcast.jl:862 [inlined]
[22] -(A::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer})
@ Base .\abstractarraymath.jl:218
[23] _minus
@ C:\Users\gerharddorn\.julia\packages\Zygote\YYT6v\src\lib\broadcast.jl:89 [inlined]
[24] #1185
@ C:\Users\gerharddorn\.julia\packages\Zygote\YYT6v\src\lib\broadcast.jl:86 [inlined]
[25] #3770#back
@ C:\Users\gerharddorn\.julia\packages\ZygoteRules\4nXuu\src\adjoint.jl:71 [inlined]
[26] Pullback
@ .\REPL[6]:1 [inlined]
[27] (::Zygote.Pullback{Tuple{var"#3#4", CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Tuple{Zygote.var"#1990#back#194"{Zygote.var"#190#193"{Zygote.Context{false}, GlobalRef, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}, Zygote.Pullback{Tuple{typeof(Base.Broadcast.materialize), CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Tuple{}}, Zygote.var"#3770#back#1189"{Zygote.var"#1185#1188"{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}, Zygote.ZBack{ChainRules.var"#mean_pullback#1821"{Int64, ChainRules.var"#sum_pullback#1633"{Colon, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, ChainRulesCore.ProjectTo{AbstractArray, NamedTuple{(:element, :axes), Tuple{ChainRulesCore.ProjectTo{Float32, NamedTuple{(), Tuple{}}}, Tuple{Base.OneTo{Int64}}}}}}}}}})(Δ::Float32)
@ Zygote C:\Users\gerharddorn\.julia\packages\Zygote\YYT6v\src\compiler\interface2.jl:0
[28] (::Zygote.var"#75#76"{Zygote.Pullback{Tuple{var"#3#4", CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Tuple{Zygote.var"#1990#back#194"{Zygote.var"#190#193"{Zygote.Context{false}, GlobalRef, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}, Zygote.Pullback{Tuple{typeof(Base.Broadcast.materialize), CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Tuple{}}, Zygote.var"#3770#back#1189"{Zygote.var"#1185#1188"{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}, Zygote.ZBack{ChainRules.var"#mean_pullback#1821"{Int64, ChainRules.var"#sum_pullback#1633"{Colon, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, ChainRulesCore.ProjectTo{AbstractArray, NamedTuple{(:element, :axes), Tuple{ChainRulesCore.ProjectTo{Float32, NamedTuple{(), Tuple{}}}, Tuple{Base.OneTo{Int64}}}}}}}}}}})(Δ::Float32)
@ Zygote C:\Users\gerharddorn\.julia\packages\Zygote\YYT6v\src\compiler\interface.jl:45
[29] top-level scope
@ REPL[7]:1
[30] top-level scope
@ C:\Users\gerharddorn\.julia\packages\CUDA\YIj5X\src\initialization.jl:208
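A side note on reading these logs: the finalizer warnings print the raw status `0x000002bc`, while the final error prints code 700. Those are the same number, so the warnings are most likely fallout from the same illegal memory access rather than a separate failure. A minimal check, using only the literal copied from the log (the variable names are just for illustration):

```julia
# Status value exactly as printed in the "Error while freeing DeviceBuffer" warnings.
raw = 0x000002bc

# An 8-digit hex literal is a UInt32 in Julia; convert it to a signed integer to read it in decimal.
code = Int(raw)

println(code)  # prints 700, the code reported as ERROR_ILLEGAL_ADDRESS in the final error
```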
This is tricky because I'm not able to reproduce the error on my end. Can either of you post the output of
Sure ;)
julia> -x
100-element CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}:
1.1735989
⋮
-0.33174348
julia> x .- y
100-element CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}:
-0.7638358
⋮
2.1407743
julia> sum(x .- y)
-4.3421297f0
julia> mean(x)
0.03391623f0
julia> versioninfo()
Julia Version 1.9.4
Commit 8e5136fa29 (2023-11-14 08:46 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: 8 × 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-14.0.6 (ORCJIT, tigerlake)
Threads: 10 on 8 virtual cores
Environment:
JULIA_NUM_THREADS = auto
julia> CUDA.versioninfo()
CUDA runtime 12.3, artifact installation
CUDA driver 12.0
NVIDIA driver 528.49.0
CUDA libraries:
- CUBLAS: 12.3.4
- CURAND: 10.3.4
- CUFFT: 11.0.12
- CUSOLVER: 11.5.4
- CUSPARSE: 12.2.0
- CUPTI: 21.0.0
- NVML: 12.0.0+528.49
Julia packages:
- CUDA: 5.1.1
- CUDA_Driver_jll: 0.7.0+0
- CUDA_Runtime_jll: 0.10.1+0
Toolchain:
- Julia: 1.9.4
- LLVM: 14.0.6
1 device:
0: NVIDIA T500 (sm_75, 3.777 GiB / 4.000 GiB available)
Hmm, nothing looks off to me. @maleadt would this be enough for you to work with MWE-wise? Or would you like me to try to make one without Zygote? That may take a couple rounds of back-and-forth since I'd need others to run further MWEs to see if they throw the same error.
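One possible starting point for a Zygote-free MWE, offered as a sketch rather than a confirmed reproducer: the failing trace ends in unary minus on a `CuArray` (frame [22]), and the `@device_code_llvm` output posted further down names a broadcast of `last ∘ tuple` over the array and a `Ref`-wrapped `Float32`. The helper below simply replays those two broadcasts in that order. The name `replay_pullback_steps` is hypothetical, and whether this triggers the error on an affected machine is an assumption that someone who can reproduce the issue would need to check.

```julia
# Hypothetical helper: replays the two broadcasts seen in the logs, without Zygote.
function replay_pullback_steps(x::AbstractVector{Float32}, dy::Float32)
    # Broadcast of `last ∘ tuple` over the array and a Ref-wrapped scalar,
    # which produces an array shaped like x with every element equal to dy.
    filled = broadcast(last ∘ tuple, x, Ref(dy))
    # Unary minus on the array, the operation at frame [22] of the failing trace.
    return -filled
end

# CPU run, so the sketch can be checked on any machine.
cpu_result = replay_pullback_steps(rand(Float32, 100), 1f0)
println(all(==(-1f0), cpu_result))  # prints true

# On a machine with a working CUDA.jl installation (assumption), the GPU variant would be:
# using CUDA
# replay_pullback_steps(CUDA.rand(Float32, 100), 1f0)
```

If the GPU variant fails the same way, that would narrow the problem to CUDA.jl and GPUArrays; if it runs cleanly, the Zygote path is still needed to reproduce.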
More details from my side: All basic operations work fine.
I can't reproduce this either. A couple of things you could try:
I tried the macro and got the following output:
julia> @device_code_llvm dump_module=true grads = back(one(loss))
; PTX CompilerJob of MethodInstance for (::GPUArrays.var"#broadcast_kernel#38")(::CUDA.CuKernelContext, ::CuDeviceVector{Float32, 1}, ::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, ComposedFunction{typeof(last), typeof(tuple)}, Tuple{Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}, CUDA.CuRefPointer{Float32}}}, ::Int64) for sm_61
; ModuleID = 'start'
source_filename = "start"
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64"
target triple = "nvptx64-nvidia-cuda"
; Function Attrs: nounwind readnone speculatable
declare i32 @llvm.nvvm.read.ptx.sreg.tid.x() #0
; Function Attrs: nounwind readnone speculatable
declare i32 @llvm.nvvm.read.ptx.sreg.ctaid.x() #0
; Function Attrs: nounwind readnone speculatable
declare i32 @llvm.nvvm.read.ptx.sreg.ntid.x() #0
; Function Attrs: nounwind readnone speculatable
declare i32 @llvm.nvvm.read.ptx.sreg.nctaid.x() #0
; @ C:\Users\Gerhard\.julia\packages\GPUArrays\dAUOE\src\host\broadcast.jl:59 within `broadcast_kernel`
; Function Attrs: uwtable
define ptx_kernel void @_Z16broadcast_kernel15CuKernelContext13CuDeviceArrayI7Float32Li1ELi1EE11BroadcastedI12CuArrayStyleILi1EE5TupleI5OneToI5Int64EE16ComposedFunctionI4last5tupleES4_I8ExtrudedIS0_IS1_Li1ELi1EES4_I4BoolES4_IS6_EE12CuRefPointerIS1_EEES6_({ i64, i32 } %state, { i8 addrspace(1)*, i64, [1 x i64], i64 } %0, { { { { i8 addrspace(1)*, i64, [1 x i64], i64 }, [1 x i8], [1 x i64] }, [1 x i64] }, [1 x [1 x i64]] } %1, i64 signext %2) local_unnamed_addr #1 {
conversion:
%.fca.3.extract = extractvalue { i8 addrspace(1)*, i64, [1 x i64], i64 } %0, 3
; @ C:\Users\Gerhard\.julia\packages\GPUArrays\dAUOE\src\host\broadcast.jl:63 within `broadcast_kernel`
; ┌ @ C:\Users\Gerhard\.julia\packages\GPUArrays\dAUOE\src\device\indexing.jl:66 within `macro expansion`
; │┌ @ C:\Users\Gerhard\.julia\packages\GPUArrays\dAUOE\src\device\indexing.jl:44 within `linear_index`
; ││┌ @ C:\Users\Gerhard\.julia\packages\GPUArrays\dAUOE\src\device\indexing.jl:20 within `global_index`
; │││┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\gpuarrays.jl:40 within `threadidx`
; ││││┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\device\intrinsics\indexing.jl:92 within `#threadIdx`
; │││││┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\device\intrinsics\indexing.jl:46 within `threadIdx_x`
; ││││││┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\device\intrinsics\indexing.jl:7 within `_index`
; │││││││┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\device\intrinsics\indexing.jl:7 within `macro expansion` @ C:\Users\Gerhard\.julia\packages\LLVM\RpBog\src\interop\base.jl:38
%3 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x()
; ││││││└└
; ││││││┌ @ int.jl:87 within `+`
%4 = add nuw nsw i32 %3, 1
; │││└└└└
; │││┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\gpuarrays.jl:38 within `blockidx`
; ││││┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\device\intrinsics\indexing.jl:78 within `#blockIdx`
; │││││┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\device\intrinsics\indexing.jl:56 within `blockIdx_x`
; ││││││┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\device\intrinsics\indexing.jl:7 within `_index`
; │││││││┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\device\intrinsics\indexing.jl:7 within `macro expansion` @ C:\Users\Gerhard\.julia\packages\LLVM\RpBog\src\interop\base.jl:38
%5 = call i32 @llvm.nvvm.read.ptx.sreg.ctaid.x()
; │││└└└└└
; │││┌ @ int.jl:1042 within `-` @ int.jl:86
%6 = zext i32 %5 to i64
; │││└
; │││┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\gpuarrays.jl:39 within `blockdim`
; ││││┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\device\intrinsics\indexing.jl:85 within `#blockDim`
; │││││┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\device\intrinsics\indexing.jl:51 within `blockDim_x`
; ││││││┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\device\intrinsics\indexing.jl:7 within `_index`
; │││││││┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\device\intrinsics\indexing.jl:7 within `macro expansion` @ C:\Users\Gerhard\.julia\packages\LLVM\RpBog\src\interop\base.jl:38
%7 = call i32 @llvm.nvvm.read.ptx.sreg.ntid.x()
; │││└└└└└
; │││┌ @ int.jl:1040 within `*`
; ││││┌ @ int.jl:523 within `rem`
%8 = zext i32 %7 to i64
; ││││└
; ││││ @ int.jl:1042 within `*` @ int.jl:88
%9 = mul nuw nsw i64 %6, %8
; │││└
; │││┌ @ int.jl:1040 within `+`
; ││││┌ @ int.jl:523 within `rem`
%10 = zext i32 %4 to i64
; ││││└
; ││││ @ int.jl:1042 within `+` @ int.jl:87
%11 = add nuw nsw i64 %9, %10
; ││└└
; ││┌ @ C:\Users\Gerhard\.julia\packages\GPUArrays\dAUOE\src\device\indexing.jl:29 within `global_size`
; │││┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\gpuarrays.jl:41 within `griddim`
; ││││┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\device\intrinsics\indexing.jl:71 within `#gridDim`
; │││││┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\device\intrinsics\indexing.jl:61 within `gridDim_x`
; ││││││┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\device\intrinsics\indexing.jl:7 within `_index`
; │││││││┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\device\intrinsics\indexing.jl:7 within `macro expansion` @ C:\Users\Gerhard\.julia\packages\LLVM\RpBog\src\interop\base.jl:38
%12 = call i32 @llvm.nvvm.read.ptx.sreg.nctaid.x()
; │││└└└└└
; │││┌ @ int.jl:88 within `*`
%13 = mul i32 %12, %7
; ││└└
; ││┌ @ int.jl:1040 within `*`
; │││┌ @ int.jl:523 within `rem`
%14 = sext i32 %13 to i64
; └└└└
; @ C:\Users\Gerhard\.julia\packages\GPUArrays\dAUOE\src\host\broadcast.jl:61 within `broadcast_kernel`
; ┌ @ int.jl:83 within `<`
%.not11 = icmp sgt i64 %2, 0
; └
br i1 %.not11, label %L5.lr.ph, label %common.ret
L5.lr.ph: ; preds = %conversion
%.fca.0.1.0.extract = extractvalue { { { { i8 addrspace(1)*, i64, [1 x i64], i64 }, [1 x i8], [1 x i64] }, [1 x i64] }, [1 x [1 x i64]] } %1, 0, 1, 0
%.fca.0.extract = extractvalue { i8 addrspace(1)*, i64, [1 x i64], i64 } %0, 0
%15 = inttoptr i64 %.fca.0.1.0.extract to float*
%16 = bitcast i8 addrspace(1)* %.fca.0.extract to float addrspace(1)*
br label %L5
L5: ; preds = %L53, %L5.lr.ph
%value_phi12 = phi i64 [ 0, %L5.lr.ph ], [ %19, %L53 ]
; @ C:\Users\Gerhard\.julia\packages\GPUArrays\dAUOE\src\host\broadcast.jl:63 within `broadcast_kernel`
; ┌ @ C:\Users\Gerhard\.julia\packages\GPUArrays\dAUOE\src\device\indexing.jl:66 within `macro expansion`
; │┌ @ C:\Users\Gerhard\.julia\packages\GPUArrays\dAUOE\src\device\indexing.jl:44 within `linear_index`
; ││┌ @ int.jl:1042 within `*` @ int.jl:88
%17 = mul i64 %value_phi12, %14
; ││└
; ││┌ @ int.jl:87 within `+`
%18 = add i64 %11, %17
; │└└
; │ @ C:\Users\Gerhard\.julia\packages\GPUArrays\dAUOE\src\device\indexing.jl:67 within `macro expansion`
; │┌ @ operators.jl:369 within `>`
; ││┌ @ int.jl:83 within `<`
%.not10 = icmp slt i64 %.fca.3.extract, %18
; │└└
br i1 %.not10, label %common.ret, label %L53
common.ret: ; preds = %L53, %L5, %conversion
; └
; @ C:\Users\Gerhard\.julia\packages\GPUArrays\dAUOE\src\host\broadcast.jl within `broadcast_kernel`
ret void
L53: ; preds = %L5
; @ C:\Users\Gerhard\.julia\packages\GPUArrays\dAUOE\src\host\broadcast.jl:62 within `broadcast_kernel`
; ┌ @ int.jl:87 within `+`
%19 = add nuw nsw i64 %value_phi12, 1
; └
; @ C:\Users\Gerhard\.julia\packages\GPUArrays\dAUOE\src\host\broadcast.jl:64 within `broadcast_kernel`
; ┌ @ broadcast.jl:610 within `getindex`
; │┌ @ broadcast.jl:655 within `_broadcast_getindex`
; ││┌ @ broadcast.jl:679 within `_getindex` @ broadcast.jl:680
; │││┌ @ broadcast.jl:630 within `_broadcast_getindex`
; ││││┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\compiler\execution.jl:169 within `getindex`
; │││││┌ @ pointer.jl:111 within `unsafe_load` @ pointer.jl:111
%20 = load float, float* %15, align 1
; └└└└└└
; ┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\device\array.jl:179 within `setindex!` @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\device\array.jl:166
; │┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\device\array.jl:127 within `#arrayset`
; ││┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\device\array.jl:134 within `arrayset_bits`
; │││┌ @ C:\Users\Gerhard\.julia\packages\LLVM\RpBog\src\interop\pointer.jl:88 within `unsafe_store!`
; ││││┌ @ none within `pointerset`
; │││││┌ @ none within `macro expansion` @ C:\Users\Gerhard\.julia\packages\LLVM\RpBog\src\interop\base.jl:38
; ││││││┌ @ int.jl:86 within `-`
%21 = add i64 %18, -1
; ││││││└
%22 = getelementptr inbounds float, float addrspace(1)* %16, i64 %21
store float %20, float addrspace(1)* %22, align 4
; └└└└└└
; @ C:\Users\Gerhard\.julia\packages\GPUArrays\dAUOE\src\host\broadcast.jl:61 within `broadcast_kernel`
; ┌ @ int.jl:83 within `<`
%exitcond.not = icmp eq i64 %19, %2
; └
br i1 %exitcond.not, label %common.ret, label %L5
}
attributes #0 = { nounwind readnone speculatable }
attributes #1 = { uwtable "frame-pointer"="all" }
!llvm.module.flags = !{!0, !1}
!julia.kernel = !{!2}
!nvvm.annotations = !{!3}
!0 = !{i32 2, !"Dwarf Version", i32 2}
!1 = !{i32 2, !"Debug Info Version", i32 3}
!2 = !{void ({ i64, i32 }, { i8 addrspace(1)*, i64, [1 x i64], i64 }, { { { { i8 addrspace(1)*, i64, [1 x i64], i64 }, [1 x i8], [1 x i64] }, [1 x i64] }, [1 x [1 x i64]] }, i64)* @_Z16broadcast_kernel15CuKernelContext13CuDeviceArrayI7Float32Li1ELi1EE11BroadcastedI12CuArrayStyleILi1EE5TupleI5OneToI5Int64EE16ComposedFunctionI4last5tupleES4_I8ExtrudedIS0_IS1_Li1ELi1EES4_I4BoolES4_IS6_EE12CuRefPointerIS1_EEES6_}
!3 = !{void ({ i64, i32 }, { i8 addrspace(1)*, i64, [1 x i64], i64 }, { { { { i8 addrspace(1)*, i64, [1 x i64], i64 }, [1 x i8], [1 x i64] }, [1 x i64] }, [1 x [1 x i64]] }, i64)* @_Z16broadcast_kernel15CuKernelContext13CuDeviceArrayI7Float32Li1ELi1EE11BroadcastedI12CuArrayStyleILi1EE5TupleI5OneToI5Int64EE16ComposedFunctionI4last5tupleES4_I8ExtrudedIS0_IS1_Li1ELi1EES4_I4BoolES4_IS6_EE12CuRefPointerIS1_EEES6_, !"kernel", i32 1}
; PTX CompilerJob of MethodInstance for (::GPUArrays.var"#broadcast_kernel#38")(::CUDA.CuKernelContext, ::CuDeviceVector{Float32, 1}, ::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, typeof(-), Tuple{Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}}}, ::Int64) for sm_61
; ModuleID = 'start'
source_filename = "start"
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64"
target triple = "nvptx64-nvidia-cuda"
; Function Attrs: nounwind readnone speculatable
declare i32 @llvm.nvvm.read.ptx.sreg.tid.x() #0
; Function Attrs: nounwind readnone speculatable
declare i32 @llvm.nvvm.read.ptx.sreg.ctaid.x() #0
; Function Attrs: nounwind readnone speculatable
declare i32 @llvm.nvvm.read.ptx.sreg.ntid.x() #0
; Function Attrs: nounwind readnone speculatable
declare i32 @llvm.nvvm.read.ptx.sreg.nctaid.x() #0
; @ C:\Users\Gerhard\.julia\packages\GPUArrays\dAUOE\src\host\broadcast.jl:59 within `broadcast_kernel`
; Function Attrs: uwtable
define ptx_kernel void @_Z16broadcast_kernel15CuKernelContext13CuDeviceArrayI7Float32Li1ELi1EE11BroadcastedI12CuArrayStyleILi1EE5TupleI5OneToI5Int64EE1_S4_I8ExtrudedIS0_IS1_Li1ELi1EES4_I4BoolES4_IS6_EEEES6_({ i64, i32 } %state, { i8 addrspace(1)*, i64, [1 x i64], i64 } %0, { [1 x { { i8 addrspace(1)*, i64, [1 x i64], i64 }, [1 x i8], [1 x i64] }], [1 x [1 x i64]] } %1, i64 signext %2) local_unnamed_addr #1 {
conversion:
%.fca.3.extract = extractvalue { i8 addrspace(1)*, i64, [1 x i64], i64 } %0, 3
%.fca.0.0.2.0.extract = extractvalue { [1 x { { i8 addrspace(1)*, i64, [1 x i64], i64 }, [1 x i8], [1 x i64] }], [1 x [1 x i64]] } %1, 0, 0, 2, 0
; @ C:\Users\Gerhard\.julia\packages\GPUArrays\dAUOE\src\host\broadcast.jl:63 within `broadcast_kernel`
; ┌ @ C:\Users\Gerhard\.julia\packages\GPUArrays\dAUOE\src\device\indexing.jl:66 within `macro expansion`
; │┌ @ C:\Users\Gerhard\.julia\packages\GPUArrays\dAUOE\src\device\indexing.jl:44 within `linear_index`
; ││┌ @ C:\Users\Gerhard\.julia\packages\GPUArrays\dAUOE\src\device\indexing.jl:20 within `global_index`
; │││┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\gpuarrays.jl:40 within `threadidx`
; ││││┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\device\intrinsics\indexing.jl:92 within `#threadIdx`
; │││││┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\device\intrinsics\indexing.jl:46 within `threadIdx_x`
; ││││││┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\device\intrinsics\indexing.jl:7 within `_index`
; │││││││┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\device\intrinsics\indexing.jl:7 within `macro expansion` @ C:\Users\Gerhard\.julia\packages\LLVM\RpBog\src\interop\base.jl:38
%3 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x()
; ││││││└└
; ││││││┌ @ int.jl:87 within `+`
%4 = add nuw nsw i32 %3, 1
; │││└└└└
; │││┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\gpuarrays.jl:38 within `blockidx`
; ││││┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\device\intrinsics\indexing.jl:78 within `#blockIdx`
; │││││┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\device\intrinsics\indexing.jl:56 within `blockIdx_x`
; ││││││┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\device\intrinsics\indexing.jl:7 within `_index`
; │││││││┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\device\intrinsics\indexing.jl:7 within `macro expansion` @ C:\Users\Gerhard\.julia\packages\LLVM\RpBog\src\interop\base.jl:38
%5 = call i32 @llvm.nvvm.read.ptx.sreg.ctaid.x()
; │││└└└└└
; │││┌ @ int.jl:1042 within `-` @ int.jl:86
%6 = zext i32 %5 to i64
; │││└
; │││┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\gpuarrays.jl:39 within `blockdim`
; ││││┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\device\intrinsics\indexing.jl:85 within `#blockDim`
; │││││┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\device\intrinsics\indexing.jl:51 within `blockDim_x`
; ││││││┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\device\intrinsics\indexing.jl:7 within `_index`
; │││││││┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\device\intrinsics\indexing.jl:7 within `macro expansion` @ C:\Users\Gerhard\.julia\packages\LLVM\RpBog\src\interop\base.jl:38
%7 = call i32 @llvm.nvvm.read.ptx.sreg.ntid.x()
; │││└└└└└
; │││┌ @ int.jl:1040 within `*`
; ││││┌ @ int.jl:523 within `rem`
%8 = zext i32 %7 to i64
; ││││└
; ││││ @ int.jl:1042 within `*` @ int.jl:88
%9 = mul nuw nsw i64 %6, %8
; │││└
; │││┌ @ int.jl:1040 within `+`
; ││││┌ @ int.jl:523 within `rem`
%10 = zext i32 %4 to i64
; ││││└
; ││││ @ int.jl:1042 within `+` @ int.jl:87
%11 = add nuw nsw i64 %9, %10
; ││└└
; ││┌ @ C:\Users\Gerhard\.julia\packages\GPUArrays\dAUOE\src\device\indexing.jl:29 within `global_size`
; │││┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\gpuarrays.jl:41 within `griddim`
; ││││┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\device\intrinsics\indexing.jl:71 within `#gridDim`
; │││││┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\device\intrinsics\indexing.jl:61 within `gridDim_x`
; ││││││┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\device\intrinsics\indexing.jl:7 within `_index`
; │││││││┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\device\intrinsics\indexing.jl:7 within `macro expansion` @ C:\Users\Gerhard\.julia\packages\LLVM\RpBog\src\interop\base.jl:38
%12 = call i32 @llvm.nvvm.read.ptx.sreg.nctaid.x()
; │││└└└└└
; │││┌ @ int.jl:88 within `*`
%13 = mul i32 %12, %7
; ││└└
; ││┌ @ int.jl:1040 within `*`
; │││┌ @ int.jl:523 within `rem`
%14 = sext i32 %13 to i64
; └└└└
; @ C:\Users\Gerhard\.julia\packages\GPUArrays\dAUOE\src\host\broadcast.jl:61 within `broadcast_kernel`
; ┌ @ int.jl:83 within `<`
%.not11 = icmp sgt i64 %2, 0
; └
br i1 %.not11, label %L5.lr.ph, label %common.ret
L5.lr.ph: ; preds = %conversion
%.fca.0.0.1.0.extract = extractvalue { [1 x { { i8 addrspace(1)*, i64, [1 x i64], i64 }, [1 x i8], [1 x i64] }], [1 x [1 x i64]] } %1, 0, 0, 1, 0
%.fca.0.0.0.0.extract = extractvalue { [1 x { { i8 addrspace(1)*, i64, [1 x i64], i64 }, [1 x i8], [1 x i64] }], [1 x [1 x i64]] } %1, 0, 0, 0, 0
%.fca.0.extract = extractvalue { i8 addrspace(1)*, i64, [1 x i64], i64 } %0, 0
%15 = and i8 %.fca.0.0.1.0.extract, 1
%.not10 = icmp eq i8 %15, 0
%16 = bitcast i8 addrspace(1)* %.fca.0.0.0.0.extract to float addrspace(1)*
%17 = bitcast i8 addrspace(1)* %.fca.0.extract to float addrspace(1)*
br i1 %.not10, label %L5.lr.ph.split.us, label %L5
L5.lr.ph.split.us: ; preds = %L5.lr.ph
%18 = add i64 %.fca.0.0.2.0.extract, -1
%19 = getelementptr inbounds float, float addrspace(1)* %16, i64 %18
br label %L5.us
L5.us: ; preds = %L53.us, %L5.lr.ph.split.us
%value_phi12.us = phi i64 [ 0, %L5.lr.ph.split.us ], [ %22, %L53.us ]
; @ C:\Users\Gerhard\.julia\packages\GPUArrays\dAUOE\src\host\broadcast.jl:63 within `broadcast_kernel`
; ┌ @ C:\Users\Gerhard\.julia\packages\GPUArrays\dAUOE\src\device\indexing.jl:66 within `macro expansion`
; │┌ @ C:\Users\Gerhard\.julia\packages\GPUArrays\dAUOE\src\device\indexing.jl:44 within `linear_index`
; ││┌ @ int.jl:1042 within `*` @ int.jl:88
%20 = mul i64 %value_phi12.us, %14
; ││└
; ││┌ @ int.jl:87 within `+`
%21 = add i64 %11, %20
; │└└
; │ @ C:\Users\Gerhard\.julia\packages\GPUArrays\dAUOE\src\device\indexing.jl:67 within `macro expansion`
; │┌ @ operators.jl:369 within `>`
; ││┌ @ int.jl:83 within `<`
%.not9.us = icmp slt i64 %.fca.3.extract, %21
; │└└
br i1 %.not9.us, label %common.ret, label %L53.us
L53.us: ; preds = %L5.us
; └
; @ C:\Users\Gerhard\.julia\packages\GPUArrays\dAUOE\src\host\broadcast.jl:62 within `broadcast_kernel`
; ┌ @ int.jl:87 within `+`
%22 = add nuw nsw i64 %value_phi12.us, 1
; └
; @ C:\Users\Gerhard\.julia\packages\GPUArrays\dAUOE\src\host\broadcast.jl:64 within `broadcast_kernel`
; ┌ @ broadcast.jl:610 within `getindex`
; │┌ @ broadcast.jl:655 within `_broadcast_getindex`
; ││┌ @ broadcast.jl:680 within `_getindex`
; │││┌ @ broadcast.jl:649 within `_broadcast_getindex`
; ││││┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\device\array.jl:176 within `getindex` @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\device\array.jl:164
; │││││┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\device\array.jl:85 within `#arrayref`
; ││││││┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\device\array.jl:91 within `arrayref_bits`
; │││││││┌ @ C:\Users\Gerhard\.julia\packages\LLVM\RpBog\src\interop\pointer.jl:85 within `unsafe_load`
; ││││││││┌ @ none within `pointerref`
; │││││││││┌ @ none within `macro expansion` @ C:\Users\Gerhard\.julia\packages\LLVM\RpBog\src\interop\base.jl:38
%23 = load float, float addrspace(1)* %19, align 4
; ││└└└└└└└└
; ││ @ broadcast.jl:656 within `_broadcast_getindex`
; ││┌ @ broadcast.jl:683 within `_broadcast_getindex_evalf`
; │││┌ @ float.jl:406 within `-`
%24 = fneg float %23
; └└└└
; ┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\device\array.jl:179 within `setindex!` @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\device\array.jl:166
; │┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\device\array.jl:127 within `#arrayset`
; ││┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\device\array.jl:134 within `arrayset_bits`
; │││┌ @ C:\Users\Gerhard\.julia\packages\LLVM\RpBog\src\interop\pointer.jl:88 within `unsafe_store!`
; ││││┌ @ none within `pointerset`
; │││││┌ @ none within `macro expansion` @ C:\Users\Gerhard\.julia\packages\LLVM\RpBog\src\interop\base.jl:38
; ││││││┌ @ int.jl:86 within `-`
%25 = add i64 %21, -1
; ││││││└
%26 = getelementptr inbounds float, float addrspace(1)* %17, i64 %25
store float %24, float addrspace(1)* %26, align 4
; └└└└└└
; @ C:\Users\Gerhard\.julia\packages\GPUArrays\dAUOE\src\host\broadcast.jl:61 within `broadcast_kernel`
; ┌ @ int.jl:83 within `<`
%exitcond13.not = icmp eq i64 %22, %2
; └
br i1 %exitcond13.not, label %common.ret, label %L5.us
L5: ; preds = %L53, %L5.lr.ph
%value_phi12 = phi i64 [ %29, %L53 ], [ 0, %L5.lr.ph ]
; @ C:\Users\Gerhard\.julia\packages\GPUArrays\dAUOE\src\host\broadcast.jl:63 within `broadcast_kernel`
; ┌ @ C:\Users\Gerhard\.julia\packages\GPUArrays\dAUOE\src\device\indexing.jl:66 within `macro expansion`
; │┌ @ C:\Users\Gerhard\.julia\packages\GPUArrays\dAUOE\src\device\indexing.jl:44 within `linear_index`
; ││┌ @ int.jl:1042 within `*` @ int.jl:88
%27 = mul i64 %value_phi12, %14
; ││└
; ││┌ @ int.jl:87 within `+`
%28 = add i64 %11, %27
; │└└
; │ @ C:\Users\Gerhard\.julia\packages\GPUArrays\dAUOE\src\device\indexing.jl:67 within `macro expansion`
; │┌ @ operators.jl:369 within `>`
; ││┌ @ int.jl:83 within `<`
%.not9 = icmp slt i64 %.fca.3.extract, %28
; │└└
br i1 %.not9, label %common.ret, label %L53
common.ret: ; preds = %L53, %L5, %L53.us, %L5.us, %conversion
; └
; @ C:\Users\Gerhard\.julia\packages\GPUArrays\dAUOE\src\host\broadcast.jl within `broadcast_kernel`
ret void
L53: ; preds = %L5
; @ C:\Users\Gerhard\.julia\packages\GPUArrays\dAUOE\src\host\broadcast.jl:62 within `broadcast_kernel`
; ┌ @ int.jl:87 within `+`
%29 = add nuw nsw i64 %value_phi12, 1
; └
; @ C:\Users\Gerhard\.julia\packages\GPUArrays\dAUOE\src\host\broadcast.jl:64 within `broadcast_kernel`
; ┌ @ broadcast.jl:610 within `getindex`
; │┌ @ broadcast.jl:655 within `_broadcast_getindex`
; ││┌ @ broadcast.jl:680 within `_getindex`
; │││┌ @ broadcast.jl:649 within `_broadcast_getindex`
; ││││┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\device\array.jl:176 within `getindex` @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\device\array.jl:164
; │││││┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\device\array.jl:85 within `#arrayref`
; ││││││┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\device\array.jl:91 within `arrayref_bits`
; │││││││┌ @ C:\Users\Gerhard\.julia\packages\LLVM\RpBog\src\interop\pointer.jl:85 within `unsafe_load`
; ││││││││┌ @ none within `pointerref`
; │││││││││┌ @ none within `macro expansion` @ C:\Users\Gerhard\.julia\packages\LLVM\RpBog\src\interop\base.jl:38
; ││││││││││┌ @ int.jl:86 within `-`
%30 = add i64 %28, -1
; ││││││││││└
%31 = getelementptr inbounds float, float addrspace(1)* %16, i64 %30
%32 = load float, float addrspace(1)* %31, align 4
; ││└└└└└└└└
; ││ @ broadcast.jl:656 within `_broadcast_getindex`
; ││┌ @ broadcast.jl:683 within `_broadcast_getindex_evalf`
; │││┌ @ float.jl:406 within `-`
%33 = fneg float %32
; └└└└
; ┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\device\array.jl:179 within `setindex!` @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\device\array.jl:166
; │┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\device\array.jl:127 within `#arrayset`
; ││┌ @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\device\array.jl:134 within `arrayset_bits`
; │││┌ @ C:\Users\Gerhard\.julia\packages\LLVM\RpBog\src\interop\pointer.jl:88 within `unsafe_store!`
; ││││┌ @ none within `pointerset`
; │││││┌ @ none within `macro expansion` @ C:\Users\Gerhard\.julia\packages\LLVM\RpBog\src\interop\base.jl:38
%34 = getelementptr inbounds float, float addrspace(1)* %17, i64 %30
store float %33, float addrspace(1)* %34, align 4
; └└└└└└
; @ C:\Users\Gerhard\.julia\packages\GPUArrays\dAUOE\src\host\broadcast.jl:61 within `broadcast_kernel`
; ┌ @ int.jl:83 within `<`
%exitcond.not = icmp eq i64 %29, %2
; └
br i1 %exitcond.not, label %common.ret, label %L5
}
attributes #0 = { nounwind readnone speculatable }
attributes #1 = { uwtable "frame-pointer"="all" }
!llvm.module.flags = !{!0, !1}
!julia.kernel = !{!2}
!nvvm.annotations = !{!3}
!0 = !{i32 2, !"Dwarf Version", i32 2}
!1 = !{i32 2, !"Debug Info Version", i32 3}
!2 = !{void ({ i64, i32 }, { i8 addrspace(1)*, i64, [1 x i64], i64 }, { [1 x { { i8 addrspace(1)*, i64, [1 x i64], i64 }, [1 x i8], [1 x i64] }], [1 x [1 x i64]] }, i64)* @_Z16broadcast_kernel15CuKernelContext13CuDeviceArrayI7Float32Li1ELi1EE11BroadcastedI12CuArrayStyleILi1EE5TupleI5OneToI5Int64EE1_S4_I8ExtrudedIS0_IS1_Li1ELi1EES4_I4BoolES4_IS6_EEEES6_}
!3 = !{void ({ i64, i32 }, { i8 addrspace(1)*, i64, [1 x i64], i64 }, { [1 x { { i8 addrspace(1)*, i64, [1 x i64], i64 }, [1 x i8], [1 x i64] }], [1 x [1 x i64]] }, i64)* @_Z16broadcast_kernel15CuKernelContext13CuDeviceArrayI7Float32Li1ELi1EE11BroadcastedI12CuArrayStyleILi1EE5TupleI5OneToI5Int64EE1_S4_I8ExtrudedIS0_IS1_Li1ELi1EES4_I4BoolES4_IS6_EEEES6_, !"kernel", i32 1}
ERROR: CUDA error: an illegal memory access was encountered (code 700, ERROR_ILLEGAL_ADDRESS)
Stacktrace:
[1] throw_api_error(res::CUDA.cudaError_enum)
@ CUDA C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\lib\cudadrv\libcuda.jl:27
[2] isdone
@ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\lib\cudadrv\stream.jl:111 [inlined]
[3] spinning_synchronization(f::typeof(CUDA.isdone), obj::CuStream)
@ CUDA C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\lib\cudadrv\synchronization.jl:79
[4] device_synchronize(; blocking::Bool, spin::Bool)
@ CUDA C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\lib\cudadrv\synchronization.jl:171
[5] device_synchronize()
@ CUDA C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\lib\cudadrv\synchronization.jl:169
[6] top-level scope
@ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\initialization.jl:210
caused by: CUDA error: an illegal memory access was encountered (code 700, ERROR_ILLEGAL_ADDRESS)
Stacktrace:
[1] throw_api_error(res::CUDA.cudaError_enum)
@ CUDA C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\lib\cudadrv\libcuda.jl:27
[2] nonblocking_synchronize(val::CuContext)
@ CUDA C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\lib\cudadrv\synchronization.jl:163
[3] device_synchronize(; blocking::Bool, spin::Bool)
@ CUDA C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\lib\cudadrv\synchronization.jl:174
[4] device_synchronize
@ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\lib\cudadrv\synchronization.jl:169 [inlined]
[5] CuModule(data::Vector{UInt8}, options::Dict{CUDA.CUjit_option_enum, Any})
@ CUDA C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\lib\cudadrv\module.jl:40
[6] CuModule
@ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\lib\cudadrv\module.jl:23 [inlined]
[7] link(job::GPUCompiler.CompilerJob, compiled::NamedTuple{(:image, :entry, :external_gvars), Tuple{Vector{UInt8}, String, Vector{String}}})
@ CUDA C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\compiler\compilation.jl:365
[8] actual_compilation(cache::Dict{Any, CuFunction}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::typeof(CUDA.compile), linker::typeof(CUDA.link))
@ GPUCompiler C:\Users\Gerhard\.julia\packages\GPUCompiler\U36Ed\src\execution.jl:132
[9] cached_compilation(cache::Dict{Any, CuFunction}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::Function, linker::Function)
@ GPUCompiler C:\Users\Gerhard\.julia\packages\GPUCompiler\U36Ed\src\execution.jl:103
[10] macro expansion
@ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\compiler\execution.jl:382 [inlined]
[11] macro expansion
@ .\lock.jl:267 [inlined]
[12] cufunction(f::GPUArrays.var"#broadcast_kernel#38", tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceVector{Float32, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, typeof(-), Tuple{Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64}}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ CUDA C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\compiler\execution.jl:377
[13] cufunction
@ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\compiler\execution.jl:374 [inlined]
[14] macro expansion
@ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\compiler\execution.jl:104 [inlined]
[15] #launch_heuristic#1120
@ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\gpuarrays.jl:17 [inlined]
[16] launch_heuristic
@ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\gpuarrays.jl:15 [inlined]
[17] _copyto!
@ C:\Users\Gerhard\.julia\packages\GPUArrays\dAUOE\src\host\broadcast.jl:70 [inlined]
[18] copyto!
@ C:\Users\Gerhard\.julia\packages\GPUArrays\dAUOE\src\host\broadcast.jl:51 [inlined]
[19] copy
@ C:\Users\Gerhard\.julia\packages\GPUArrays\dAUOE\src\host\broadcast.jl:42 [inlined]
[20] materialize
@ .\broadcast.jl:873 [inlined]
[21] broadcast_preserving_zero_d
@ .\broadcast.jl:862 [inlined]
[22] -(A::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer})
@ Base .\abstractarraymath.jl:218
[23] _minus
@ C:\Users\Gerhard\.julia\packages\Zygote\YYT6v\src\lib\broadcast.jl:89 [inlined]
[24] #1185
@ C:\Users\Gerhard\.julia\packages\Zygote\YYT6v\src\lib\broadcast.jl:86 [inlined]
[25] #3770#back
@ C:\Users\Gerhard\.julia\packages\ZygoteRules\4nXuu\src\adjoint.jl:71 [inlined]
[26] Pullback
@ .\REPL[6]:1 [inlined]
[27] (::Zygote.Pullback{Tuple{var"#5#6", CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Tuple{Zygote.ZBack{ChainRules.var"#mean_pullback#1821"{Int64, ChainRules.var"#sum_pullback#1633"{Colon, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, ChainRulesCore.ProjectTo{AbstractArray, NamedTuple{(:element, :axes), Tuple{ChainRulesCore.ProjectTo{Float32, NamedTuple{(), Tuple{}}}, Tuple{Base.OneTo{Int64}}}}}}}}, Zygote.var"#3770#back#1189"{Zygote.var"#1185#1188"{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}, Zygote.var"#1990#back#194"{Zygote.var"#190#193"{Zygote.Context{false}, GlobalRef, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}, Zygote.Pullback{Tuple{typeof(Base.Broadcast.materialize), CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Tuple{}}}})(Δ::Float32)
@ Zygote C:\Users\Gerhard\.julia\packages\Zygote\YYT6v\src\compiler\interface2.jl:0
[28] (::Zygote.var"#75#76"{Zygote.Pullback{Tuple{var"#5#6", CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Tuple{Zygote.ZBack{ChainRules.var"#mean_pullback#1821"{Int64, ChainRules.var"#sum_pullback#1633"{Colon, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, ChainRulesCore.ProjectTo{AbstractArray, NamedTuple{(:element, :axes), Tuple{ChainRulesCore.ProjectTo{Float32, NamedTuple{(), Tuple{}}}, Tuple{Base.OneTo{Int64}}}}}}}}, Zygote.var"#3770#back#1189"{Zygote.var"#1185#1188"{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}, Zygote.var"#1990#back#194"{Zygote.var"#190#193"{Zygote.Context{false}, GlobalRef, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}, Zygote.Pullback{Tuple{typeof(Base.Broadcast.materialize), CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Tuple{}}}}})(Δ::Float32)
@ Zygote C:\Users\Gerhard\.julia\packages\Zygote\YYT6v\src\compiler\interface.jl:45
[29] top-level scope
@ C:\Users\Gerhard\.julia\packages\GPUCompiler\U36Ed\src\reflection.jl:206
[30] top-level scope
@ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\initialization.jl:208 |
I also switched to
julia> using CUDA
julia> CUDA.version
version versioninfo
julia> CUDA.versioninfo()
CUDA runtime 12.2, artifact installation
CUDA driver 12.0
NVIDIA driver 528.79.0
CUDA libraries:
- CUBLAS: 12.2.5
- CURAND: 10.3.3
- CUFFT: 11.0.8
- CUSOLVER: 11.5.2
- CUSPARSE: 12.1.2
- CUPTI: 20.0.0
- NVML: 12.0.0+528.79
Julia packages:
- CUDA: 5.1.1
- CUDA_Driver_jll: 0.7.0+0
- CUDA_Runtime_jll: 0.10.1+0
Toolchain:
- Julia: 1.9.4
- LLVM: 14.0.6
Preferences:
- CUDA_Runtime_jll.version: 12.2
1 device:
0: NVIDIA GeForce GTX 1050 (sm_61, 3.926 GiB / 4.000 GiB available)
julia> CUDA.run_compute_sanitizer()
Re-starting your active Julia session...
========= COMPUTE-SANITIZER
_
_ _ _(_)_ | Documentation: https://docs.julialang.org
(_) | (_) (_) |
_ _ _| |_ __ _ | Type "?" for help, "]?" for Pkg help.
| | | | | | |/ _` | |
| | |_| | | | (_| | | Version 1.9.4 (2023-11-14)
_/ |\__'_|_|_|\__'_| | Official https://julialang.org/ release
|__/ |
julia> using CUDA
Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x6be45c40 -- nvtxGlobals_v3 at C:\Users\Gerhard\.julia\artifacts\b4eeaf094ffb6aacf1b20ee5d2ac9aa1818fc732\bin\libnvToolsExt.dll (unknown line)
in expression starting at REPL[1]:1
nvtxGlobals_v3 at C:\Users\Gerhard\.julia\artifacts\b4eeaf094ffb6aacf1b20ee5d2ac9aa1818fc732\bin\libnvToolsExt.dll (unknown line)
Allocations: 703421 (Pool: 702539; Big: 882); GC: 1
========= Error: Target application terminated before first instrumented API call
ERROR: failed process: Process(setenv(`'C:\Users\Gerhard\.julia\artifacts\0cdffaf70d865a7149744c4c5670ea6b2145e80d\bin\compute-sanitizer.exe' --tool memcheck --launch-timeout=0 --target-processes=all --report-api-errors=no 'C:\Users\Gerhard\.julia\juliaup\julia-1.9.4+0.x64.w64.mingw32\bin\julia.exe' -Cnative '-JC:\Users\Gerhard\.julia\juliaup\julia-1.9.4+0.x64.w64.mingw32\lib\julia\sys.dll' -g1 '--project=C:\Users\Gerhard\.julia\environments\v1.9\Project.toml'`,["WINDIR=C:\\WINDOWS", "PATH=C:\\Users\\Gerhard\\.julia\\artifacts\\0cdffaf70d865a7149744c4c5670ea6b2145e80d\\bin;C:\\Users\\Gerhard\\.julia\\juliaup\\julia-1.9.4+0.x64.w64.mingw32\\bin\\..\\lib\\julia;C:\\Users\\Gerhard\\.julia\\juliaup\\julia-1.9.4+0.x64.w64.mingw32\\bin\\..\\lib;C:\\Users\\Gerhard\\.julia\\juliaup\\julia-1.9.4+0.x64.w64.mingw32\\bin;E:\\Programs\\VM Ware\\bin\\;C:\\Program Files\\Common Files\\Oracle\\Java\\javapath;C:\\Program Files (x86)\\Razer Chroma SDK\\bin;C:\\Program Files\\Razer Chroma SDK\\bin;C:\\Program Files (x86)\\Razer\\ChromaBroadcast\\bin;C:\\Program Files\\Razer\\ChromaBroadcast\\bin;C:\\Program Files\\ImageMagick-6.9.10-Q16;C:\\Program Files (x86)\\Common Files\\Oracle\\Java\\javapath;C:\\ProgramData\\Oracle\\Java\\javapath;C:\\Windows\\system32;C:\\Windows;C:\\Windows\\System32\\Wbem;C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\;C:\\Program Files\\MiKTeX 2.9\\miktex\\bin\\x64\\;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\Android;C:\\Program Files\\MATLAB\\R2019a\\bin;C:\\Program Files\\MATLAB\\R2018b\\bin;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Program Files\\gs\\gs9.20\\bin;C:\\ad;C:\\Program Files (x86)\\Intel\\iCLS Client\\;C:\\Program Files\\Intel\\iCLS Client\\;C:\\Program Files (x86)\\GNU\\GnuPG\\pub;C:\\Program Files (x86)\\Intel\\Intel(R) Management Engine Components\\IPT;C:\\Program Files\\Int;C:\\WINDOWS\\system32\\config\\systemprofile\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Program 
Files\\PuTTY\\;C:\\Program Files (x86)\\Gpg4win\\..\\GnuPG\\bin;C:\\platform-tools\\;C:\\Program Files\\Git\\cmd;C:\\Users\\Gerhard\\AppData\\Roaming\\Python\\Python39\\Scripts;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Program Files (x86)\\PDFtk\\bin\\;C:\\Program Files (x86)\\PDFtk Server\\bin\\;C:\\Program Files (x86)\\GitExtensions\\;C:\\Program Files\\dotnet\\;C:\\Program Files\\ArangoDB3 3.9.3\\usr\\bin;C:\\Program Files (x86)\\Intel\\Intel(R) Management Engine Components\\DAL;C:\\Program Files\\Intel\\Intel(R) Management Engine Components\\DAL;C:\\Program Files (x86)\\NVIDIA Corporation\\PhysX\\Common;C:\\Program Files\\Git LFS;C:\\Program Files\\Java\\jdk-20\\bin;C:\\Program Files\\nodejs\\;C:\\Users\\Gerhard\\AppData\\Local\\Microsoft\\WindowsApps;C;C:\\Users\\Gerhard\\scoop\\shims;C:\\Users\\Gerhard\\AppData\\Local\\Programs\\Python\\Python39\\Scripts\\;C:\\Users\\Gerhard\\AppData\\Local\\Programs\\Python\\Python39\\;C:\\Program Files\\MATLAB\\R2018a\\bin;C:\\Program Files (x86)\\Intel\\Intel(R) Management E;C:\\Users\\Gerhard\\AppData\\Local\\Programs\\Microsoft VS Code\\bin", "USERDOMAIN_ROAMINGPROFILE=DESKTOP-BTMG2IL", "ZES_ENABLE_SYSMAN=1", "LOCALAPPDATA=C:\\Users\\Gerhard\\AppData\\Local", "HOMEPATH=\\Users\\Gerhard", "PROCESSOR_IDENTIFIER=Intel64 Family 6 Model 158 Stepping 9, GenuineIntel", "NUMBER_OF_PROCESSORS=8", "PATHEXT=.COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC", "CYGWIN=nodosfilewarning" … "USERPROFILE=C:\\Users\\Gerhard", "DRIVERDATA=C:\\Windows\\System32\\Drivers\\DriverData", "ANDROID_SDK_HOME=C:\\Android", "PROCESSOR_LEVEL=6", "SYSTEMDRIVE=C:", "PROGRAMW6432=C:\\Program Files", "TEMP=C:\\Users\\Gerhard\\AppData\\Local\\Temp", "HOMEDRIVE=C:", "OPENBLAS_MAIN_FREE=1", "PROCESSOR_ARCHITECTURE=AMD64"]), ProcessExited(4294967295)) [4294967295]
Stacktrace:
[1] pipeline_error
@ .\process.jl:565 [inlined]
[2] run(::Cmd; wait::Bool)
@ Base .\process.jl:480
[3] run
@ .\process.jl:477 [inlined]
[4] run_compute_sanitizer(julia_args::Cmd; tool::String, sanitizer_args::Cmd)
@ CUDA C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\utilities.jl:200
[5] run_compute_sanitizer
@ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\utilities.jl:196 [inlined]
[6] run_compute_sanitizer()
@ CUDA C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\utilities.jl:196
[7] top-level scope
@ REPL[3]:1
[8] top-level scope
@ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\initialization.jl:208 |
Does just using 12.2 work, i.e., not running under compute-sanitizer? |
no, just using 12.2 also throws an error |
Hello, I have the same issue: Tracker works fine but Zygote throws. I'm running a fresh installation of Julia (juliaup +release) with the following CUDA:
CUDA libraries:
Julia packages:
Toolchain:
1 device:
and the following versions for packages:
Also, here is the failure when I run @device_code_llvm dump_module=true grads = Zygote.gradient(f,x). I also have a lot of LLVM code; if you think it's needed I will post it.
fail: ; preds = %L5.lr.ph.L5.lr.ph.split_crit_edge
Also, the weirdest thing: this works:
x = rand(1,1000) |> cu
StatsBase.Statistics._mean(identity,x)
gradient(x-> StatsBase.Statistics._mean(identity,x),x)
mymean(x::AbstractArray;dims=:) = StatsBase.Statistics._mean(identity,x,dims)
gradient(mymean,x)
mymean is essentially the same as StatsBase.Statistics.mean, yet it does not throw during the gradient calculation.
Finally, it's working with
⌅ [052768ef] CUDA v4.4.1
If you need anything else, don't hesitate. Thank you all. |
Great, I think that does narrow this down quite a bit! Can someone test the following Zygote-less example? You'll need to install ChainRulesCore and ChainRules into your test environment.
using CUDA, ChainRulesCore, ChainRules, Statistics
x = CUDA.rand(Float32,2,1000)
_, back = rrule(mean, x)
_, dx = back(1.0f0)
unthunk(dx) |
That works without error and gives me
|
I missed that the pullback returned a thunk. Can you try the edited version above? |
If I copy-paste that into a newly-opened REPL, the code works and returns a result. But as soon as I do a CUDA operation afterwards it crashes with (eventually) the 700 error:
|
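The "crashes on the next CUDA operation" behaviour described above could be checked with something like the sketch below. It reuses the Zygote-less example from this thread; the follow-up broadcast `x .+ 1f0` and the explicit `CUDA.synchronize()` are assumptions picked for illustration, not the commands that were actually run.

```julia
using CUDA, ChainRulesCore, ChainRules, Statistics

x = CUDA.rand(Float32, 2, 1000)

# Same steps as the Zygote-less example in this thread.
_, back = rrule(mean, x)
_, dx = back(1.0f0)
g = unthunk(dx)

# Hypothetical follow-up GPU work: any later kernel launch should do.
z = x .+ 1f0

# Kernel launches are asynchronous, so synchronize explicitly to make a
# pending error (such as code 700) surface at a known point.
CUDA.synchronize()
```

On an unaffected setup this should simply finish without output.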
Same for me with it's throwing at
working fine with
I've also tried with different versions of ChainRules and ChainRulesCore, it's not working as long as I keep CUDA v5.1.0 or 5.1.1. |
Ah interesting, seeing the pin/register in there makes me suspect this is due to unified memory behaving differently on Windows. It's likely caused by JuliaGPU/CUDA.jl#2109; I'll investigate further. |
Looks like broadcast uses |
Working on master, thanks a lot
|
Yep, master works for me too now - awesome! |
Wonderful. @dorn-gerhard can you confirm this works for you too? Then I'll close out the issue. |
@ToucheSir yes the problem seems to be fixed with current CUDA#master branch. |
I'm still getting the exact same error in Flux 0.14.8 but on Ubuntu 20.04, not Windows. I think the lines of interest in the stacktrace might be
And the line ~/mydir/MyProject/src/mycomponent/mycode.jl:39 is pointed out below:
I can't post the full code but I might try to get an MWE later. |
Are you using CUDA.jl#master? |
I'm sorry, I was assuming this had already been merged on CUDA.jl since then. Let me try it out and report back. |
It's been merged on CUDA.jl, but not tagged yet. I hope to release a new version this week. |
This is probably not the right place to post this, but I can't manage to install the version on the master branch in CUDA.jl because it has incompatible requirements with Flux:
|
All right. For now, I will try to roll back to 5.0.0 on CUDA.jl and when both these changes are released I'll test them again. |
I just tagged CUDA.jl v5.1.2, which should include the fix without requiring the latest Adapt.jl. |
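For anyone wanting to pick up the tagged release explicitly, a minimal sketch using the standard Pkg API is shown below. The version numbers come from this thread; whether the resolver accepts them alongside Flux in a given environment is not guaranteed.

```julia
using Pkg

# Install the tagged release that is reported to contain the fix.
Pkg.add(name="CUDA", version="5.1.2")

# Alternative mentioned earlier in the thread: roll back to 5.0.0 instead.
# Pkg.add(name="CUDA", version="5.0.0")

# Confirm which CUDA.jl version the active environment resolved to.
Pkg.status("CUDA")
```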
Thanks Tim! |
Just reporting back. Now on julia v1.10.1 with CUDA.jl v5.2.0 and cuDNN.jl v1.3.0 our model is training. Thanks! |
Minimal working example:
The same error happens when using Flux.mse(x, y), instead of mean(x .- y), as the loss function.
The error goes away by replacing mean() by sum(), or by running on the CPU.
Running latest Julia and packages:
Julia 1.9.4
Statistics v1.9.0
CUDA v5.1.1
Zygote v0.6.67
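A reproducer consistent with the description above might look like the following. This is a sketch, not the original snippet: the loss `mean(x .- y)` is taken from the text and the 1-D `Float32` `CuArray` inputs from the stack trace, while the array length and the names `x`, `y` and `loss` are assumptions.

```julia
using CUDA, Zygote, Statistics

# Assumed sizes and names; only the form of the loss comes from the report.
x = CUDA.rand(Float32, 100)
y = CUDA.rand(Float32, 100)

loss(x) = mean(x .- y)

# On the affected versions (CUDA.jl v5.1.0 / v5.1.1) the thread reports an
# illegal memory access (code 700) around this call.
grads = Zygote.gradient(loss, x)

# Synchronize so that any asynchronous GPU error is raised here.
CUDA.synchronize()
```

Swapping `mean` for `sum` in `loss`, or dropping the GPU arrays in favour of plain `Array`s, gives the variants that are reported above not to fail.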
The first ~20% of the stack trace: