Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ptx assembly aborted due to errors #58491

Closed
carlosgalvezp opened this issue Oct 20, 2022 · 32 comments · Fixed by #113216
Closed

Ptx assembly aborted due to errors #58491

carlosgalvezp opened this issue Oct 20, 2022 · 32 comments · Fixed by #113216

Comments

@carlosgalvezp
Copy link
Contributor

Hi!

We are bumping Clang to commit 1ae33bf, and we find that it crashes building CUDA code with this error trace:

ptxas /tmp/patch-4eaef1/patch-sm_61.s, line 3885; fatal   : Parsing error near '.': syntax error
ptxas fatal   : Ptx assembly aborted due to errors
clang: �[0;1;31merror: ptxas command failed with exit code 255 (use -v to see invocation)
clang version 16.0.0 (https://github.com/llvm/llvm-project.git 1ae33bf42680b156fe0f5cd6163bf24ef45d8cd3)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: external/llvm/usr/bin

Is this a known problem?

@carlosgalvezp
Copy link
Contributor Author

carlosgalvezp commented Oct 20, 2022

It has to do with the __func__ function, and only when compiling in debug mode -g. Minimal repro with CUDA 11.7 Clang trunk:

#include <cstdio>

__global__ void foo_kernel()
{
    printf("%s", __func__);
}

void foo()
{
    foo_kernel<<<10, 1>>>();
}
clang --cuda-path=/path/to/cuda-11.7 -c  -g --cuda-gpu-arch=sm_75 -o foo.cu.o foo.cu 

@carlosgalvezp
Copy link
Contributor Author

carlosgalvezp commented Oct 20, 2022

Bisecting brings me here: 7aa1fa0
FYI @hctim @dwblaikie @rnk @adrian-prantl

@Artem-B
Copy link
Member

Artem-B commented Oct 20, 2022

https://godbolt.org/z/8bMYcf1z7

The debug info directive that ptxas does not like is on line 655:

.b64 __func__._Z10foo_kernelv

It should've been __func___$__Z10foo_kernelv. Apparently NVPTX's name normalizer didn't get applied to the symbol name in debug info.

A work-around would be to disable GPU-side debug info with -Xarch_device -g0

@Artem-B Artem-B self-assigned this Oct 21, 2022
@carlosgalvezp
Copy link
Contributor Author

Thanks for the quick help! Will try the workaround :)

@vangohao
Copy link

vangohao commented Jan 3, 2024

https://godbolt.org/z/8bMYcf1z7

The debug info directive that ptxas does not like is on line 655:

.b64 __func__._Z10foo_kernelv

It should've been __func___$__Z10foo_kernelv. Apparently NVPTX's name normalizer didn't get applied to the symbol name in debug info.

A work-around would be to disable GPU-side debug info with -Xarch_device -g0

Is this issue solved? I am encountering this issue with clang and llvm 17.0.6

@tuero
Copy link

tuero commented Mar 6, 2024

I'm encountering a similar issue with clang version 18.0.0git (https://github.com/llvm/llvm-project.git b7376c319630a6b8395f3df5a46ba73e8fe29ea9), where debug builds fail when using __PRETTY_FUNCTION__

@tambry
Copy link
Contributor

tambry commented Mar 15, 2024

Minimal repro:

echo '__attribute__((device)) void foo(){__PRETTY_FUNCTION__;}' | clang -cc1 -triple nvptx64-nvidia-cuda -S -fcuda-is-device -debug-info-kind=constructor -fno-dwarf-directory-asm -Wno-everything -x cuda | ptxas -

Removing -debug-info-kind=constructor works around this.

@Artem-B
Copy link
Member

Artem-B commented Mar 15, 2024

It looks like another case of LLVM generating symbol names with a dot in it and sneaking through our attempts to normalize such names:

 .global .align 1 .b8 __PRETTY_FUNCTION___$__Z3foov[11] = {118, 111, 105, 100, 32, 102, 111, 111, 40, 41};
...

.b64 __PRETTY_FUNCTION__._Z3foov

The variable itself does have . mangled, but the reference from debug info does not.

Switching to line-only debug info would work around the issue, too.

@tambry
Copy link
Contributor

tambry commented Mar 19, 2024

Looked into this quite a bit. It seems the name gets embedded in a debug DIE during the annotation-remarks pass in getOrCreateGlobalVariableDIE()addLocationAttribute()addOpAddress(). Somehow there end up being 2 MCSymbols related to __PRETTY_FUNCTION__._Z3foov and nvptx-assign-valid-global-names renames the general one, but not the one that was embedded into the DIE...

After spending already too much time looking into this and not understanding enough about the guts of the LLVM debug information infrastructure I took the easy way out:

Generate pre-defined lvalue names without dots

`.` should be converted to `_$_` by the nvptx-assign-valid-global-names pass as `ptxas` doesn't support dots.
But during the ASMPrinter initialization the global variable name gets embedded in a debug DIE.
There somehow end up being two different `MCSymbol`s for the global variable with only the main one being renamed.

Bug: https://github.com/llvm/llvm-project/issues/58491
--- a/clang/lib/CodeGen/CGExpr.cpp
+++ b/clang/lib/CodeGen/CGExpr.cpp
@@ -3277,7 +3277,12 @@ LValue CodeGenFunction::EmitPredefinedLV
     FnName = FnName.substr(1);
   StringRef NameItems[] = {
       PredefinedExpr::getIdentKindName(E->getIdentKind()), FnName};
-  std::string GVName = llvm::join(NameItems, NameItems + 2, ".");
+  std::string GVName;
+  if (CGM.getLangOpts().CUDA && CGM.getLangOpts().CUDAIsDevice) {
+    GVName = llvm::join(NameItems, NameItems + 2, "_$_");
+  } else {
+    GVName = llvm::join(NameItems, NameItems + 2, ".");
+  }
   if (auto *BD = dyn_cast_or_null<BlockDecl>(CurCodeDecl)) {
     std::string Name = std::string(SL->getString());
     if (!Name.empty()) {

@Artem-B
Copy link
Member

Artem-B commented Mar 19, 2024

I think we've dealt with a similar issue in the dwarf debug info before. Let me see if I can find it.

@Artem-B
Copy link
Member

Artem-B commented Mar 20, 2024

I think I had 2e7e097 in mind, but it may not be helpful here as it was dealing with the concept of private prefixes. Here the symbol which causes the problem is a . used as a separator.

I believe we did discuss invalid symbol issues in the past, but I do not think it ever went anywhere.
E.g. the discussion on https://reviews.llvm.org/D40573 still seems to be somewhat relevant.
Especially this bit:

This is silly. This bug has been open for so long that nvidia could've just fixed their toolchain by now to accept dots in symbol names.

Back to figuring out how to fix this instance.

But during the ASMPrinter initialization the global variable name gets embedded in a debug DIE.
There somehow end up being two different MCSymbols for the global variable with only the main one being renamed.

Oh, well. Looks like we may need to do it the hard way and teach nvptx-assign-valid-global-names how to deal with the symbols in debug info. It would still be dealing with the consequences, but at least the mess would be contained in one place.

@alexey-bataev Would you happen to have any idea on what would be the best way to get DWARF's symbol references mangled the same way we mangle other symbols in NVPTX?

@alexey-bataev
Copy link
Member

I always thought that we need to handle it in the frontend. But it is only my thought, feel free to discard it.

@Artem-B
Copy link
Member

Artem-B commented Mar 20, 2024

Avoiding such symbols in the front-end is would avoid some of the issues (granted, including this one), but a symbol with a dot may materialize within LLVM itself. Granted, it may not happen often in practice. It's also possible that such symbol cloning would not be affected by this issue (e.g. if, unlike this case, debug info would point to the same MCSymbol for the cleaned up name).

Here are the options I see:

  • Get NVIDIA to change ptxas and allow a more sensible set of characters in identifiers. The problem is that it's not going to help us for a long time, as we need to deal with ptxas versions that are out there already.
  • Change LLVM to use something other than . when it needs to create identifiers. This has consequences for ABI. E.g. host/device symbols will get mangled differently. That would be a problem.
  • Because of the above, this name cleanup may need to be applied selectively on multiple targets (NVPTX + supported host architectures, currently x86 and ARM). E.g. we'll want to apply it to all symbols on the GPU, and to all symbols that need to have the same name across host/GPU boundary. E.g. kernels and other GPU-side symbols we may need to refer to from the host.
  • cleanup the names in the front-end. This is a very narrow workaround for a subset of these 'illegal character' issues. Should be enough to deal with this case, but I do not like it because it's not front-end's job to know about the quirks of something many abstraction levels below it. Front-end should be contrained by the contract between it and LLVM. If the symbol is valid for LLVM, how it gets lowered into target assembly is LLVM's responsibility.
  • Teach nvptx-assign-valid-global-names how to fix symbol names in associated debug info. I think that may be the least bad trade-off we may have at the moment. The caveat is that I have no idea how much effort it would take.

@dwblaikie If we rename a global symbol how hard is that to find and update references to the symbol from debug info. I suspect we already do that somewhere in LLVM. Can you point me in the right direction?

@dwblaikie
Copy link
Collaborator

Not sure if existing instances of this (as you say, abi would mostly make it impossible to change symbol names effectively)

But if you want to try it - the disubprogram attached to the function, if it has the mangled name (maybe it doesn't, maybe it just depends on the actual symbol name of the llvm::function in which case you wouldn't have to do anything for debuginfo) - that should be updated.

@tambry
Copy link
Contributor

tambry commented Mar 21, 2024

The DISubprogram name referred to the correct MCSymbol* and was correct AFAIK. The problematic name instead seemed to be attached to the !17 debug annotation on the ret.

source_filename = "-"
target datalayout = "e-i64:64-i128:128-v16:16-v32:32-n16:32:64"
target triple = "nvptx64-nvidia-cuda"

@"__PRETTY_FUNCTION___$__Z3foov" = private unnamed_addr constant [11 x i8] c"void foo()\00", align 1, !dbg !0

; Function Attrs: convergent mustprogress noinline nounwind optnone
define dso_local void @_Z3foov() #0 !dbg !14 {
entry:
  ret void, !dbg !17
}

attributes #0 = { convergent mustprogress noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+ptx32" }

!llvm.dbg.cu = !{!8}
!llvm.linker.options = !{}
!llvm.module.flags = !{!10, !11, !12}
!llvm.ident = !{!13}

!0 = !DIGlobalVariableExpression(var: !1, expr: !DIExpression())
!1 = distinct !DIGlobalVariable(scope: null, file: !2, line: 1, type: !3, isLocal: true, isDefinition: true)
!2 = !DIFile(filename: "<stdin>", directory: "/home/raul.tambre")
!3 = !DICompositeType(tag: DW_TAG_array_type, baseType: !4, size: 88, elements: !6)
!4 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !5)
!5 = !DIBasicType(name: "char", size: 8, encoding: DW_ATE_signed_char)
!6 = !{!7}
!7 = !DISubrange(count: 11)
!8 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus_14, file: !2, producer: "Clebian clang version 19.0.0", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, globals: !9, splitDebugInlining: false, nameTableKind: None)
!9 = !{!0}
!10 = !{i32 2, !"Debug Info Version", i32 3}
!11 = !{i32 1, !"wchar_size", i32 4}
!12 = !{i32 4, !"nvvm-reflect-ftz", i32 0}
!13 = !{!"Clebian clang version 19.0.0"}
!14 = distinct !DISubprogram(name: "foo", linkageName: "_Z3foov", scope: !2, file: !2, line: 1, type: !15, scopeLine: 1, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !8)
!15 = !DISubroutineType(types: !16)
!16 = !{null}
!17 = !DILocation(line: 1, column: 56, scope: !14)

I managed to write something to reach that instruction, but not how to reach the DIE that had the wrong name embedded already. Seems to be an abstraction layer away and inaccessible in such a pass. It seemed having nvptx-assign-valid-global-names run as one the first passes before the DIE is created might work.

@dwblaikie
Copy link
Collaborator

Sorry, I'm not following that last comment - the DISubprogram is the same one from the Function and from the DILocation.

I take it this renaming isn't done at the IR level, OK - so it's not about updating the DISubprogram itself to match a change to the Function, but later than that.

Sure enough then - DwarfUnit::applySubprogramDefinitionAttributes calls addLinkageName - I guess it'd need some awkward mapping in DwarfDebug of DISubprogram back to llvm::Function... I don't feel good about that, maybe there's some other way to handle it, but you could at least prototype that.

@tambry
Copy link
Contributor

tambry commented Mar 21, 2024

Sorry, I'm not following that last comment - the DISubprogram is the same one from the Function and from the DILocation.

I guess was aiming at that you can't get the DILocation from the DISubprogram, but rather have to iterate the instructions to find the return instruction with the appropriate debug annotation. At least it seemed to me so, but chewing through the API and abstractions was difficult when I did try. 🙂

@dwblaikie
Copy link
Collaborator

Sorry, I'm not following that last comment - the DISubprogram is the same one from the Function and from the DILocation.

I guess was aiming at that you can't get the DILocation from the DISubprogram, but rather have to iterate the instructions to find the return instruction with the appropriate debug annotation. At least it seemed to me so, but chewing through the API and abstractions was difficult when I did try. 🙂

Ah, yes, DILocations aren't accesible top-down from the DISubprogram, only bottom-up from the DISubprogram's Function's instructions.

@aminiussi
Copy link

aminiussi commented Jul 10, 2024

Hello, I have a similar issue with llvm 18.1.8 and CUDA 12.5. Is that expected ?

The failing line is

.b64 __PRETTY_FUNCTION__._ZN7fargOCA6nsnextILm1EEEmmm

and the message:

alainm@jarvis:/scrach/alainm/repos/fargOCA/gpu/seq/llvm/dbg$ptxas --version
ptxas: NVIDIA (R) Ptx optimizing assembler
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Jun__6_02:14:54_PDT_2024
Cuda compilation tools, release 12.5, V12.5.82
Build cuda_12.5.r12.5/compiler.34385749_0
alainm@jarvis:/scrach/alainm/repos/fargOCA/gpu/seq/llvm/dbg$ptxas -v  ./disk-cuda-nvptx64-nvidia-cuda-sm_89.s
ptxas ./disk-cuda-nvptx64-nvidia-cuda-sm_89.s, line 148598; fatal   : Parsing error near '.': syntax error
ptxas fatal   : Ptx assembly aborted due to errors
alainm@jarvis:/scrach/alainm/repos/fargOCA/gpu/seq/llvm/dbg$

It is also failing with main branch:

alainm@jarvis:/scrach/alainm/repos/fargOCA/gpu/seq/llvm/dbg$/opt/llvm-main/bin/clang++ --version
clang version 19.0.0git (https://github.com/llvm/llvm-project.git 08ce14732d528ab70309f334446d39782f2f07c0)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/llvm-main/bin
alainm@jarvis:/scrach/alainm/repos/fargOCA/gpu/seq/llvm/dbg$$ptxas disk-cuda-nvptx64-nvidia-cuda-sm_89.s
ptxas disk-cuda-nvptx64-nvidia-cuda-sm_89.s, line 145450; fatal   : Parsing error near '.': syntax error
ptxas fatal   : Ptx assembly aborted due to errors
alainm@jarvis:/scrach/alainm/repos/fargOCA/gpu/seq/llvm/dbg$

@Artem-B
Copy link
Member

Artem-B commented Jul 10, 2024

Unfortunately, the issue is still there, and we still do not have a good fix. Disabling GPU-side dwarf debug info with -Xarch_device -g0 is the best workaround we have at the moment.

@tambry
Copy link
Contributor

tambry commented Jul 11, 2024

@Artem-B Would you object to applying this workaround to master until someone actually dives into the guts of the DI subsystem to find the bug?

@Artem-B
Copy link
Member

Artem-B commented Jul 11, 2024

Doing it on clang side would depend on the name mangling implementation details in NVPTX back-end. I think a better approach would be to try intercepting printouts of .bN constants and mangle the ones that look like a symbol. While it's not the best time/place to do that mangling (we do need to figure out how to fix it wherever the unmangled name lives, and fix it there), but it would be doing the right thing in principle, using the same mangling we're applying to the symbols produced by NVPTX.

@aminiussi
Copy link

@Artem-B Would you object to applying this workaround to master until someone actually dives into the guts of the DI subsystem to find the bug?

Fine by me. Thanks

@sidarth-narayanan-csi
Copy link

Hello,

We are facing the same issue, was the workaround fix already merged into the master ? If yes, what is the version number for reference ?

Thank you

@Artem-B
Copy link
Member

Artem-B commented Aug 6, 2024

This problem is not fixed yet. You may work around by disabling GPU-side debug info with -Xarch_device -g0

@sidarth-narayanan-csi
Copy link

@Artem-B Thank you for the response

Artem-B added a commit to Artem-B/llvm-project that referenced this issue Oct 21, 2024
Until now debug info was printing the symbols names as-is and that resulted
in invalid PTX when the symbols contained characters that are incalid for PTX.
E.g. `__PRETTY_FUNCTION.something`

Debug info is somewhat disconnected from the symbols themselves, so the regular
"NVPTXAssignValidGlobalNames" pass can't easily fix them.

As the "plan B" this patch catches printout of debug symbols and fixes them, as needed.
One gotcha is that the same code path is used to print the names of debug info sections.
Those section names do start with a '.debug'. The dot in those names is nominally illegal
in PTX, but the debug section names with a dot are accepted as a special case.
The downside of this change is that if someone ever has a `.debug*` symbol that needs to
be referred to from the debug info, that label will be passed through as-is, and will
still produce broken PTX output. If/when we run into a case where we need it to work, we
could consider only passing through specific debug section names, or add a mechanist allowing
us to tell section names apart from regular symbols.

Fixes llvm#58491
Artem-B added a commit to Artem-B/llvm-project that referenced this issue Oct 21, 2024
Until now debug info was printing the symbols names as-is and that resulted
in invalid PTX when the symbols contained characters that are incalid for PTX.
E.g. `__PRETTY_FUNCTION.something`

Debug info is somewhat disconnected from the symbols themselves, so the regular
"NVPTXAssignValidGlobalNames" pass can't easily fix them.

As the "plan B" this patch catches printout of debug symbols and fixes them, as needed.
One gotcha is that the same code path is used to print the names of debug info sections.
Those section names do start with a '.debug'. The dot in those names is nominally illegal
in PTX, but the debug section names with a dot are accepted as a special case.
The downside of this change is that if someone ever has a `.debug*` symbol that needs to
be referred to from the debug info, that label will be passed through as-is, and will
still produce broken PTX output. If/when we run into a case where we need it to work, we
could consider only passing through specific debug section names, or add a mechanist allowing
us to tell section names apart from regular symbols.

Fixes llvm#58491
Artem-B added a commit that referenced this issue Oct 22, 2024
…113216)

Until now debug info was printing the symbols names as-is and that
resulted in invalid PTX when the symbols contained characters that are
invalid for PTX. E.g. `__PRETTY_FUNCTION.something`

Debug info is somewhat disconnected from the symbols themselves, so the
regular "NVPTXAssignValidGlobalNames" pass can't easily fix them.

As the "plan B" this patch catches printout of debug symbols and fixes
them, as needed. One gotcha is that the same code path is used to print
the names of debug info sections. Those section names do start with a
'.debug'. The dot in those names is nominally illegal in PTX, but the
debug section names with a dot are accepted as a special case. The
downside of this change is that if someone ever has a `.debug*` symbol
that needs to be referred to from the debug info, that label will be
passed through as-is, and will still produce broken PTX output. If/when
we run into a case where we need it to work, we could consider only
passing through specific debug section names, or add a mechanism
allowing us to tell section names apart from regular symbols.

Fixes #58491
@lahwaacz
Copy link

I'm now getting this warning with clang 18.1.8:

clang++: warning: argument unused during compilation: '-Xarch_device -g0' [-Wunused-command-line-argument]

But the fix from #113216 came much later than this release 🤷

@Artem-B
Copy link
Member

Artem-B commented Dec 2, 2024

I'm now getting this warning with clang 18.1.8:

clang++: warning: argument unused during compilation: '-Xarch_device -g0' [-Wunused-command-line-argument]

It's hard to tell why you get this warning without the complete command line. Are you by any chance passing it to a plain C++ compilation? If so, the warning would be expected, as the -Xarch_device -g0 is applicable only to the GPU-side CUDA compilation.

@lahwaacz
Copy link

lahwaacz commented Dec 3, 2024

@Artem-B I'm passing it to CMAKE_CUDA_FLAGS_DEBUG in CMake:

if(CMAKE_CUDA_COMPILER_ID STREQUAL "Clang")
    set(CMAKE_CUDA_FLAGS_DEBUG "-g -Xarch_device -g0")
endif()

@Artem-B
Copy link
Member

Artem-B commented Dec 3, 2024

What is the complete compiler command line with all the options, that produces the warning?

The warning is likely benign, but it points that a GPU-specific option has been passed to a compiler invocation that does not do any GPU-side compilations. It's easy enough to silence, with the downside of potentially silencing other unexpectedly ignored options if/when you run into them.

The right way to handle it is to make sure that Xarch_device -g0 is passed only to the CUDA source code compilations, and does not get applied to compiler invocations that do C++ compilation or linking.

@lahwaacz
Copy link

lahwaacz commented Dec 4, 2024

What is the complete compiler command line with all the options, that produces the warning?

Should be this one:

/usr/sbin/clang++ -stdlib=libc++ -Wall -g -Xarch_device -g0 --cuda-path=/opt/cuda-11.8 -fuse-ld=lld -lc++ -lc++abi    -Xlinker --dependency-file -Xlinker src/Benchmarks/HeatEquation/CMakeFiles/tnl-benchmark-heat-equation.dir/link.d src/Benchmarks/HeatEquation/CMakeFiles/tnl-benchmark-heat-equation.dir/tnl-benchmark-heat-equation.cu.o CMakeFiles/tnl-benchmark-heat-equation.dir/cmake_device_link.o -o bin/tnl-benchmark-heat-equation-dbg  -lcudadevrt  -lcudart_static  -lrt  -lpthread  -ldl -L"/opt/cuda-11.8/lib64"

The right way to handle it is to make sure that Xarch_device -g0 is passed only to the CUDA source code compilations, and does not get applied to compiler invocations that do C++ compilation or linking.

Any idea how to do it in CMake?

@Artem-B
Copy link
Member

Artem-B commented Dec 4, 2024

Ignoring the warning, or adding -Wno-unused-command-line-argument to silence it would be the easiest option. It's expected that GPU options will be ignored during linking.

I'm not particularly familiar with cmake's CUDA-related plumbing, but you may look for the compilation-specific subset of flags. CMAKE_CUDA_FLAGS_DEBUG looks like it's supposed to apply to everything that invokes the compiler, and that includes linking phase.
Cursory look through cmake docs does not show anything useful. You could add linker-specific options via CMAKE_EXE_LINKER_FLAGS, but it seems that CMAKE_<LANG>_FLAGS you set do get propagated to the link phase.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.