CUDA support broken since Enzyme v0.8 and GPUCompiler v13 #230
Can you tell me what...
So you got... How the heck did that happen? (Line 19 in bd08274)
@vchuravy Don't know, just...
Releasing a fixed 0.8.5 and will then adjust the registry post-hoc.
@vchuravy Alright, thanks, I'll rerun it once it's released.
@vchuravy Didn't solve the problem.
Note that that is a vastly different error. (Also fixed the general issue in JuliaRegistries/General#54523)
Can you try...
@vchuravy Sure, sorry, I referred to the differentiation itself. The original error reappears, I guess because...
The error message:
Ah okay, that's why I opened JuliaRegistries/General#54523, to prevent that from happening. Sorry for the mess. I am setting up GPU CI right now to stop this from regressing again. If I had to guess, the version you need is something like...
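(For anyone landing here later, a rough sketch of pinning the packages from the Julia package manager is below. The exact version numbers are an assumption based on the 0.8.5 release mentioned above, not something stated in this thread.)

```julia
using Pkg

# Hypothetical version constraint for illustration only; adjust to what the
# registry actually resolves for your setup.
Pkg.add(name = "Enzyme", version = "0.8.5")
Pkg.add("GPUCompiler")
Pkg.add("CUDA")

# Confirm which versions ended up in the environment.
Pkg.status()
```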
@vchuravy I'm really keen on using Enzyme on parts of my differentiable solver, so thanks for the help here. Unfortunately, the following issue was raised this time:
It happens that Enzyme couldn't guess the activity of the return value in this case. After adding it to the `autodiff_deferred` call, the error is gone.

For people who may be facing this issue, I'm replacing `Enzyme.autodiff_deferred(f!, ds, da, db)` by `Enzyme.autodiff_deferred(f!, Const, ds, da, db)`.

Am using...

Here's an implementation (as a module):

```julia
module cukernel
using Test
using Enzyme
using CUDA

if has_cuda()
    @info "CUDA is on"
    CUDA.allowscalar(false)
end

export add

## CPU kernel summing two vectors `a` and `b`
## and storing results in vector `s`
function f_cpu!(s, a, b)
    s .= a .+ b
    return nothing
end # f_cpu!

## Wrapper for Enzyme call
## to differentiate `f_cpu!` on the CPU
function df_cpu!(ds, da, db)
    Enzyme.autodiff(f_cpu!, Const, ds, da, db)
    return nothing
end # df_cpu!

## CUDA kernel summing two vectors `a` and `b`
## and storing results in vector `s`
function f!(s, a, b)
    i = threadIdx().x
    s[i] = a[i] + b[i]
    return nothing
end # f!

## Wrapper for Enzyme call
## to differentiate `f!` on the GPU
function df!(ds, da, db)
    Enzyme.autodiff_deferred(f!, Const, ds, da, db)
    return nothing
end # df!

## Perform sum of two vectors
## and compute gradients of the operation
## on the GPU
function add()
    ## Instantiate vectors `a` and `b` to be summed
    ## and vector `s` where result is stored
    nthreads = 4
    a_cpu = rand(nthreads)
    b_cpu = rand(nthreads)
    s_cpu = zero(a_cpu)
    a = cu(a_cpu)
    b = cu(b_cpu)
    s = cu(s_cpu)

    ## Call CUDA kernel `f!`
    @cuda threads=nthreads f!(s, a, b)
    @info "vector `a`"
    a |> display
    @info "vector `b`"
    b |> display
    "" |> println
    @info "vector `s := a + b`"
    s |> display

    ## Call `df!` to compute gradients
    ## on the GPU via Enzyme
    dz_ds_cpu = rand(nthreads) # Some gradient passed to us from other functions
    dz_da_cpu = zero(a_cpu)
    dz_db_cpu = zero(b_cpu)
    dz_ds = cu(dz_ds_cpu)
    dz_da = cu(dz_da_cpu)
    dz_db = cu(dz_db_cpu)
    ds = Duplicated(s, dz_ds)
    da = Duplicated(a, dz_da)
    db = Duplicated(b, dz_db)
    @cuda threads=nthreads df!(ds, da, db)

    ## Check results against CPU
    ds_cpu = Duplicated(s_cpu, dz_ds_cpu)
    da_cpu = Duplicated(a_cpu, dz_da_cpu)
    db_cpu = Duplicated(b_cpu, dz_db_cpu)
    df_cpu!(ds_cpu, da_cpu, db_cpu)
    @test dz_da ≈ cu(dz_da_cpu)
    @test dz_db ≈ cu(dz_db_cpu)
end # add
end # module
```

I'm closing this issue. Thanks @vchuravy!
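For completeness, a minimal sketch of how one might run this module from the REPL, assuming it is saved to a file named `cukernel.jl` (the file name is an assumption):

```julia
# Assumes the module above is saved as `cukernel.jl` (hypothetical file name).
include("cukernel.jl")
using .cukernel   # brings `add` into scope via the module's `export add`

add()   # launches the CUDA kernel, differentiates it with Enzyme, and checks against the CPU
```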
@vchuravy I presume at this point we can close this?
Hi, thanks again for the work on the CUDA support, the above tests ran fine! There's just a small issue when using...

```julia
using CUDA
using Enzyme

if has_cuda()
    @info "CUDA is on"
    CUDA.allowscalar(false)
end

function kernel!(u, ::Val{n}) where {n}
    return nothing
end # kernel!

function dkernel!(du, ::Val{n}) where {n}
    Enzyme.autodiff_deferred(kernel!, Const, du, Val(n))
    return nothing
end # dkernel!

function call_dkernel()
    n = 10
    u = rand(n) |> cu
    dzdu = rand(n) |> cu
    du = Duplicated(u, dzdu)
    @cuda threads=4 dkernel!(du, Val(n))
end # call_dkernel
call_dkernel()
```

The output:

```
[ Info: CUDA is on
ERROR: LoadError: InvalidIRError: compiling kernel #dkernel!(Duplicated{CuDeviceVector{Float32, 1}}, Val{10}) resulted in invalid LLVM IR
Reason: unsupported call to an unknown function (call to jl_f_getfield)
Stacktrace:
[1] getindex
@ ./tuple.jl:29
[2] iterate
@ ./tuple.jl:69
[3] same_or_one
@ /scratch/drozda/.julia/packages/Enzyme/7MHm8/src/Enzyme.jl:203
[4] autodiff_deferred
@ /scratch/drozda/.julia/packages/Enzyme/7MHm8/src/Enzyme.jl:429
[5] dkernel!
@ /scratch/drozda/test.jl:16
Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code
Stacktrace:
[1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{typeof(dkernel!), Tuple{Duplicated{CuDeviceVector{Float32, 1}}, Val{10}}}}, args::LLVM.Module)
@ GPUCompiler /scratch/drozda/.julia/packages/GPUCompiler/XyxTy/src/validation.jl:139
[2] macro expansion
@ /scratch/drozda/.julia/packages/GPUCompiler/XyxTy/src/driver.jl:409 [inlined]
[3] macro expansion
@ /scratch/drozda/.julia/packages/TimerOutputs/LDL7n/src/TimerOutput.jl:252 [inlined]
[4] macro expansion
@ /scratch/drozda/.julia/packages/GPUCompiler/XyxTy/src/driver.jl:407 [inlined]
[5] emit_asm(job::GPUCompiler.CompilerJob, ir::LLVM.Module; strip::Bool, validate::Bool, format::LLVM.API.LLVMCodeGenFileType)
@ GPUCompiler /scratch/drozda/.julia/packages/GPUCompiler/XyxTy/src/utils.jl:64
[6] cufunction_compile(job::GPUCompiler.CompilerJob, ctx::LLVM.Context)
@ CUDA /scratch/drozda/.julia/packages/CUDA/GGwVa/src/compiler/execution.jl:354
[7] #224
@ /scratch/drozda/.julia/packages/CUDA/GGwVa/src/compiler/execution.jl:347 [inlined]
[8] JuliaContext(f::CUDA.var"#224#225"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{typeof(dkernel!), Tuple{Duplicated{CuDeviceVector{Float32, 1}}, Val{10}}}}})
@ GPUCompiler /scratch/drozda/.julia/packages/GPUCompiler/XyxTy/src/driver.jl:74
[9] cufunction_compile(job::GPUCompiler.CompilerJob)
@ CUDA /scratch/drozda/.julia/packages/CUDA/GGwVa/src/compiler/execution.jl:346
[10] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
@ GPUCompiler /scratch/drozda/.julia/packages/GPUCompiler/XyxTy/src/cache.jl:90
[11] cufunction(f::typeof(dkernel!), tt::Type{Tuple{Duplicated{CuDeviceVector{Float32, 1}}, Val{10}}}; name::Nothing, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ CUDA /scratch/drozda/.julia/packages/CUDA/GGwVa/src/compiler/execution.jl:299
[12] cufunction(f::typeof(dkernel!), tt::Type{Tuple{Duplicated{CuDeviceVector{Float32, 1}}, Val{10}}})
@ CUDA /scratch/drozda/.julia/packages/CUDA/GGwVa/src/compiler/execution.jl:293
[13] macro expansion
@ /scratch/drozda/.julia/packages/CUDA/GGwVa/src/compiler/execution.jl:102 [inlined]
[14] call_dkernel()
@ Main /scratch/drozda/test.jl:27
[15] top-level scope
@ /scratch/drozda/test.jl:31
[16] include(fname::String)
@ Base.MainInclude ./client.jl:451
[17] top-level scope
@ REPL[2]:1
[18] top-level scope
@ /scratch/drozda/.julia/packages/CUDA/GGwVa/src/initialization.jl:52
in expression starting at /scratch/drozda/test.jl:31
```
Can you open a new issue?
Done here, thanks!