
building with triton support? #166

Open
ngam opened this issue Apr 5, 2023 · 47 comments
Labels: help wanted

Comments

@ngam (Contributor) commented Apr 5, 2023

  • I am a bit confused about what depends on what. As far as I can see, pytorch depends on torchtriton, which in turn seems to depend on pytorch.
  • Is there any difference between torchtriton and triton?
  • We already package an old version of triton in conda-forge and have a PR open for version 2.0.0 (triton v2.0.0 triton-feedstock#2). Would this 2.0.0 version be suitable as a dependency here?

Originally posted by @Tobias-Fischer in #165 (comment)

--

More background: #151

ngam mentioned this issue Apr 5, 2023
@ngam (Contributor, Author) commented Apr 9, 2023

@Tobias-Fischer Are you aware of any quick example demonstrating the usage of triton? No worries if not, I will look upstream. I am now back at home base and can help debug this further.

@Tobias-Fischer (Contributor) commented:

The first code snippet is an easy example: https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html
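
For reference, the opening snippet of that tutorial is essentially the following (a minimal sketch; the tensor shapes are arbitrary, and on CPU only a working C++ compiler is needed):

import torch

def foo(x, y):
    a = torch.sin(x)
    b = torch.cos(y)
    return a + b

# torch.compile routes through TorchDynamo/TorchInductor; this is the call that
# ends up exercising the C++ compiler and, on GPU, triton.
opt_foo = torch.compile(foo)
print(opt_foo(torch.randn(10, 10), torch.randn(10, 10)))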

@ngam (Contributor, Author) commented Apr 9, 2023

Alright, it fails with InvalidCxxCompiler

InvalidCxxCompiler: No working C++ compiler found in torch._inductor.config.cpp.cxx: (None, 'g++')

The above exception was the direct cause of the following exception:

BackendCompilerFailed                     Traceback (most recent call last)

...
...
...

BackendCompilerFailed: debug_wrapper raised InvalidCxxCompiler: No working C++ compiler found in torch._inductor.config.cpp.cxx: (None, 'g++')

Set torch._dynamo.config.verbose=True for more information


You can suppress this exception and fall back to eager by setting:
    torch._dynamo.config.suppress_errors = True

@ngam (Contributor, Author) commented Apr 9, 2023

@h-vetinari and @hmaarrfk, just FYI. My current assessment is that we will likely need to wait for triton 2.x, then simply add it as a run dependency and see if things work out. I tried adding torchtriton (from the pytorch channel) and it didn't work, because it was searching for system C libraries that were not linked correctly. It seems that all the components are in place and we just need the triton component, but I may be wrong.

@Tobias-Fischer (Contributor) commented:

I just confirmed that this example works: https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html
when using conda-forge/triton-feedstock#3 and installing the cxx_compiler :)

@Tobias-Fischer (Contributor) commented:

Ah, something surprising (?): it also works without installing triton, just by having a cxx_compiler installed (e.g. mamba install compilers). Presumably the CPU inductor path only needs a C++ compiler, and triton only comes into play on the GPU path.

@Tobias-Fischer (Contributor) commented:

Made some progress - see conda-forge/triton-feedstock#6

The open question in my mind is still the circular dependency: torch depends on triton, and triton depends on torch. Not sure how best to deal with it... a run_constrained maybe?

@h-vetinari (Member) commented:

Not sure how best to deal with it... a run_constrained maybe?

Probably with a -base package that's built here, depended on by triton to build itself, and then we can include triton here as a dependency of the complete pytorch package.

h-vetinari mentioned this issue May 6, 2023
@benjaminrwilson (Contributor) commented:

I keep running into:

cannot find -lcuda: No such file or directory

when using torch.compile. I've installed both cudatoolkit and cudatoolkit-dev.

Any idea what might be going on?

@Tobias-Fischer (Contributor) commented:

We need conda-forge/triton-feedstock#6

@RaulPPelaez (Contributor) commented:

Bumping this since conda-forge/triton-feedstock#6 was merged. Thanks for the good work!

@ngam (Contributor, Author) commented Jun 16, 2023

Can we quickly test this? Just use the package we have and install triton. Do things magically work, or do we need more tweaks in this feedstock?

@RaulPPelaez (Contributor) commented:

Anecdotal experience, but pytorch and triton from conda-forge seem to pick each other up fine today (as in, no segfaults when calling torch.compile with the inductor backend). I believe special care should be taken with dependency versions; both triton and support for it in torch are really experimental and probably have quite narrow ranges of versions where they are supposed to work together.
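
For anyone who wants to repeat that check, something along these lines should do (a sketch; assumes a CUDA GPU and that both pytorch and triton are installed from conda-forge):

import torch
import triton

print("torch", torch.__version__, "| triton", triton.__version__)

def f(x):
    return torch.nn.functional.relu(x) * 2.0

compiled = torch.compile(f, backend="inductor")
x = torch.randn(1024, device="cuda")
# If triton is picked up correctly, this compiles and runs without crashing.
print(torch.allclose(compiled(x), f(x)))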
On a side note, what we would really like to see is pytorch 2.0.1 (available in the pytorch channel) in conda-forge.

@ngam (Contributor, Author) commented Jun 16, 2023

@RaulPPelaez good idea on 2.0.1. Would you be interested in submitting a PR? I could start one now...

@ngam (Contributor, Author) commented Jun 16, 2023

See #172; we could also include formal triton support in that one.

@cread commented Apr 29, 2024

Has anyone looked at this again recently?

@hmaarrfk added the help wanted label Sep 26, 2024
@danpetry commented:

Here's what we've done at Anaconda for v2.3.0: https://github.com/AnacondaRecipes/triton-feedstock

There are some notes in the meta.yaml about various choices we made. Let me know if you've got any questions.

@mgorny (Contributor) commented Nov 20, 2024

I'm going to try making a new pull request for 3.1.0, as that's the version required by PyTorch 2.5.1.

@danpetry, thanks. Curiously enough, I've just tried diffing the PyPI 3.1.0 package against the one provided by PyTorch, and, at least as far as .py files go, they seem the same. So I don't think we technically need a rename here.
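
A rough sketch of one way to do such a comparison (the wheel filenames below are just placeholders):

import zipfile, hashlib

def py_hashes(wheel_path):
    # Hash every .py member of a wheel so the two sources can be compared.
    with zipfile.ZipFile(wheel_path) as zf:
        return {n: hashlib.sha256(zf.read(n)).hexdigest()
                for n in zf.namelist() if n.endswith(".py")}

a = py_hashes("triton-3.1.0-from-pypi.whl")     # placeholder filename
b = py_hashes("triton-3.1.0-from-pytorch.whl")  # placeholder filename
for name in sorted(set(a) | set(b)):
    if a.get(name) != b.get(name):
        print("differs:", name)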

@danpetry commented Nov 20, 2024

I think pytorch call(ed) their conda package torchtriton too?
The problem is that triton vendors in (last time I checked) a random commit of llvm, which might make it not widely compatible with other packages, i.e. not usable outside pytorch. Hence the "torch" in the name.

@danpetry commented:

I probably need to re-check the logic of that statement, but that's the conclusion I came to when I worked on it. At the least, I wanted to be cautious and say: "this should not be used except with pytorch, with which it has been explicitly integration tested".

@danpetry commented:

It uses llvm at runtime, rather than build time

@isuruf (Member) commented Nov 20, 2024

The issue was that triton did not have wheels, and even when it did, it took time to get releases in. There was also an issue with rocm support not getting merged. All of these have been resolved, I think.

@danpetry commented Nov 20, 2024

OK, so we can now use the wheels rather than the git repo to build? IIRC this wasn't possible.

@isuruf (Member) commented Nov 20, 2024

In conda packaging? We don't want to use pre-compiled wheels in conda-build.

@mgorny (Contributor) commented Nov 20, 2024

Yeah, I've already noticed that it works with neither the LLVM 18 nor the LLVM 19 release. That said, I've assumed that this applies to all triton releases, whether coming from PyPI triton or from PyTorch's builds. I'm going to try looking at triton's main later, to see whether I can find a commit that actually works with an LLVM release, and then let you know how "far" it is from 2.1.0. Perhaps it wouldn't be that hard to make it compatible with LLVM 19, or we could try packaging a newer snapshot that would also be compatible with PyTorch.

@danpetry commented:

I think my key issue was that they weren't tagging commits in the github repo, there weren't any sdists, and I didn't want to release a github commit and call it a release version

@mgorny (Contributor) commented Nov 20, 2024

I think my key issue was that they weren't tagging commits in the github repo, there weren't any sdists, and I didn't want to release a github commit and call it a release version

Yes, that is still the case. However, they do release versioned wheels, and I've confirmed that the contents of .py files match the commit from PyTorch — and I think that's as good confirmation as we can get that it's the commit used to make the release upstream.

@isuruf (Member) commented Nov 20, 2024

@atalman, since you seem to be a maintainer of triton on PyPI, could you let us know which commit you use for the wheel releases?

@danpetry commented:

It's here AFAIU, but NB that triton still doesn't have this tagged.

@danpetry commented:

Whether that's good enough re traceability for conda-forge, I don't know. For anaconda main, we decided it wasn't. Triton can force-push the branch it's on and remove the commit, and indeed they have in the past. (I suppose they can change tags too. Ideally there would be a GitHub release.)

@isuruf (Member) commented Nov 20, 2024

@danpetry, that's for torch-triton. AFAIK @atalman manages triton on the PyPI index in addition to torch-triton on pytorch's wheel index.

@danpetry commented:

I believe it's one and the same commit, co-ordinated by him, but he can confirm

@mgorny (Contributor) commented Nov 20, 2024

The commit fixing compatibility with LLVM 19 is triton-lang/triton@46550ab. I'm going to see how much we would actually need to patch to get it to work on top of 3.1.0.

@mgorny (Contributor) commented Nov 20, 2024

I've pushed my WIP to conda-forge/triton-feedstock#26.

@danpetry commented Nov 21, 2024

Worth bearing in mind that LLVM is only used by triton at runtime, to compile cuda kernels. And in the end, the binary format is, I guess, determined by the cuda compiler rather than by llvm. So keeping it vendored in isn't an issue as far as compatibility with the rest of the distro is concerned, I think.
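
For context, that runtime path is what any Triton kernel exercises: the kernel is JIT-compiled (via Triton's bundled LLVM, then ptxas) on first launch. A minimal sketch in the style of Triton's vector-add tutorial (assumes a CUDA device; shapes and block size are arbitrary):

import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
out = torch.empty_like(x)
# The first launch triggers the LLVM -> PTX -> cubin compilation at runtime.
add_kernel[(triton.cdiv(4096, 1024),)](x, y, out, 4096, BLOCK=1024)
assert torch.allclose(out, x + y)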

@danpetry commented:

I don't know if anyone else can confirm this?

@mgorny (Contributor) commented Nov 21, 2024

I think the bigger issue here is that triton either downloads a prebuilt LLVM version if it detects a supported platform, or uses system LLVM (expecting this specific commit) when it doesn't.

@rgommers commented:

Cc @amjames. Andrew, you had some useful insights into the PyTorch -> Triton -> LLVM coupling, so you may be interested in this topic and in conda-forge/triton-feedstock#26. Making Triton compatible with a proper LLVM release would be very useful for conda-forge (and probably other distros as well). conda-forge/triton-feedstock#26 (comment) summarizes how this was achieved for Triton 3.1.0 with the LLVM 19 release. Some manual testing seems to confirm success at the "build and seems to compile stuff with nvcc" level - perhaps you have some suggestions on what subset of the PyTorch test suite to run to confirm that PyTorch + Triton works as designed?

@danpetry commented:

There's a smoke test which tests torch.compile with cuda (if an environment variable is appropriately set)
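
The shape of that check is roughly as follows (a hedged sketch, not the actual smoke test script; the environment variable name here is a placeholder):

import os
import torch

def smoke_test_compile(device):
    def foo(x):
        return torch.sin(x) + torch.cos(x)
    x = torch.rand(3, 3, device=device)
    eager = foo(x)
    compiled = torch.compile(foo)(x)
    assert torch.allclose(eager, compiled)

# Only exercise the CUDA path when explicitly requested and a GPU is present.
if os.environ.get("CHECK_TORCH_COMPILE") == "1" and torch.cuda.is_available():
    smoke_test_compile("cuda")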

@mgorny (Contributor) commented Nov 25, 2024

Thanks. Looks like I was wrong and some patching is necessary for regular CC to be able to find CUDA headers:

Testing smoke_test_compile for cuda and torch.float16
/tmp/tmp_itxw3hv/main.c:1:10: fatal error: cuda.h: No such file or directory
    1 | #include "cuda.h"
      |          ^~~~~~~~
compilation terminated.
/tmp/tmpwwhx19wm/main.c:1:10: fatal error: cuda.h: No such file or directory
    1 | #include "cuda.h"
      |          ^~~~~~~~
compilation terminated.
Traceback (most recent call last):
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 1446, in _call_user_compiler
    compiled_fn = compiler_fn(gm, self.example_inputs())
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 129, in __call__
    compiled_gm = compiler_fn(gm, example_inputs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/torch/__init__.py", line 2234, in __call__
    return compile_fx(model_, inputs_, config_patches=self.config)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1521, in compile_fx
    return aot_autograd(
           ^^^^^^^^^^^^^
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 72, in __call__
    cg = aot_module_simplified(gm, example_inputs, **self.kwargs)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1071, in aot_module_simplified
    compiled_fn = dispatch_and_compile()
                  ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1056, in dispatch_and_compile
    compiled_fn, _ = create_aot_dispatcher_function(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 522, in create_aot_dispatcher_function
    return _create_aot_dispatcher_function(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 759, in _create_aot_dispatcher_function
    compiled_fn, fw_metadata = compiler_fn(
                               ^^^^^^^^^^^^
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 179, in aot_dispatch_base
    compiled_fw = compiler(fw_module, updated_flat_args)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1350, in fw_compiler_base
    return _fw_compiler_base(model, example_inputs, is_inference)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1421, in _fw_compiler_base
    return inner_compile(
           ^^^^^^^^^^^^^^
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 475, in compile_fx_inner
    return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/torch/_dynamo/repro/after_aot.py", line 85, in debug_wrapper
    inner_compiled_fn = compiler_fn(gm, example_inputs)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 661, in _compile_fx_inner
    compiled_graph = FxGraphCache.load(
                     ^^^^^^^^^^^^^^^^^^
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/torch/_inductor/codecache.py", line 1334, in load
    compiled_graph = compile_fx_fn(
                     ^^^^^^^^^^^^^^
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 570, in codegen_and_compile
    compiled_graph = fx_codegen_and_compile(gm, example_inputs, **fx_kwargs)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 878, in fx_codegen_and_compile
    compiled_fn = graph.compile_to_fn()
                  ^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/torch/_inductor/graph.py", line 1913, in compile_to_fn
    return self.compile_to_module().call
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/torch/_inductor/graph.py", line 1839, in compile_to_module
    return self._compile_to_module()
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/torch/_inductor/graph.py", line 1845, in _compile_to_module
    self.codegen_with_cpp_wrapper() if self.cpp_wrapper else self.codegen()
                                                             ^^^^^^^^^^^^^^
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/torch/_inductor/graph.py", line 1784, in codegen
    self.scheduler.codegen()
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/torch/_inductor/scheduler.py", line 3383, in codegen
    return self._codegen()
           ^^^^^^^^^^^^^^^
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/torch/_inductor/scheduler.py", line 3461, in _codegen
    self.get_backend(device).codegen_node(node)
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/torch/_inductor/codegen/cuda_combined_scheduling.py", line 80, in codegen_node
    return self._triton_scheduling.codegen_node(node)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/torch/_inductor/codegen/simd.py", line 1155, in codegen_node
    return self.codegen_node_schedule(node_schedule, buf_accesses, numel, rnumel)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/torch/_inductor/codegen/simd.py", line 1364, in codegen_node_schedule
    src_code = kernel.codegen_kernel()
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/torch/_inductor/codegen/triton.py", line 2661, in codegen_kernel
    **self.inductor_meta_common(),
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/torch/_inductor/codegen/triton.py", line 2532, in inductor_meta_common
    "backend_hash": torch.utils._triton.triton_hash_with_backend(),
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/torch/utils/_triton.py", line 53, in triton_hash_with_backend
    backend = triton_backend()
              ^^^^^^^^^^^^^^^^
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/torch/utils/_triton.py", line 45, in triton_backend
    target = driver.active.get_current_target()
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/triton/runtime/driver.py", line 23, in __getattr__
    self._initialize_obj()
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/triton/runtime/driver.py", line 20, in _initialize_obj
    self._obj = self._init_fn()
                ^^^^^^^^^^^^^^^
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/triton/runtime/driver.py", line 9, in _create_driver
    return actives[0]()
           ^^^^^^^^^^^^
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/triton/backends/nvidia/driver.py", line 371, in __init__
    self.utils = CudaUtils()  # TODO: make static
                 ^^^^^^^^^^^
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/triton/backends/nvidia/driver.py", line 80, in __init__
    mod = compile_module_from_src(Path(os.path.join(dirname, "driver.c")).read_text(), "cuda_utils")
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/triton/backends/nvidia/driver.py", line 57, in compile_module_from_src
    so = _build(name, src_path, tmpdir, library_dirs(), include_dir, libraries)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/triton/runtime/build.py", line 48, in _build
    ret = subprocess.check_call(cc_cmd)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/subprocess.py", line 413, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/home/mgorny/.conda/envs/pytorch/bin/x86_64-conda-linux-gnu-cc', '/tmp/tmpwwhx19wm/main.c', '-O3', '-shared', '-fPIC', '-o', '/tmp/tmpwwhx19wm/cuda_utils.cpython-312-x86_64-linux-gnu.so', '-lcuda', '-L/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/triton/backends/nvidia/lib', '-L/lib/x86_64-linux-gnu', '-L/lib32', '-I/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/triton/backends/nvidia/include', '-I/tmp/tmpwwhx19wm', '-I/home/mgorny/.conda/envs/pytorch/include/python3.12']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/mgorny/smoke_test.py", line 352, in <module>
    main()
  File "/home/mgorny/smoke_test.py", line 348, in main
    smoke_test_cuda(options.package, options.runtime_error_check, options.torch_compile_check)
  File "/home/mgorny/smoke_test.py", line 171, in smoke_test_cuda
    smoke_test_compile("cuda" if torch.cuda.is_available() else "cpu")
  File "/home/mgorny/smoke_test.py", line 261, in smoke_test_compile
    x_pt2 = torch.compile(foo)(x)
            ^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 465, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1269, in __call__
    return self._torchdynamo_orig_callable(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1064, in __call__
    result = self._inner_convert(
             ^^^^^^^^^^^^^^^^^^^^
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 526, in __call__
    return _compile(
           ^^^^^^^^^
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 924, in _compile
    guarded_code = compile_inner(code, one_graph, hooks, transform)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 666, in compile_inner
    return _compile_inner(code, one_graph, hooks, transform)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/torch/_utils_internal.py", line 87, in wrapper_function
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 699, in _compile_inner
    out_code = transform_code_object(code, transform)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/torch/_dynamo/bytecode_transformation.py", line 1322, in transform_code_object
    transformations(instructions, code_options)
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 219, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 634, in transform
    tracer.run()
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 2796, in run
    super().run()
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 983, in run
    while self.step():
          ^^^^^^^^^^^
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 895, in step
    self.dispatch_table[inst.opcode](self, inst)
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 2987, in RETURN_VALUE
    self._return(inst)
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 2972, in _return
    self.output.compile_subgraph(
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 1117, in compile_subgraph
    self.compile_and_call_fx_graph(tx, list(reversed(stack_values)), root)
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 1369, in compile_and_call_fx_graph
    compiled_fn = self.call_user_compiler(gm)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 1416, in call_user_compiler
    return self._call_user_compiler(gm)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 1465, in _call_user_compiler
    raise BackendCompilerFailed(self.compiler_fn, e) from e
torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
CalledProcessError: Command '['/home/mgorny/.conda/envs/pytorch/bin/x86_64-conda-linux-gnu-cc', '/tmp/tmpwwhx19wm/main.c', '-O3', '-shared', '-fPIC', '-o', '/tmp/tmpwwhx19wm/cuda_utils.cpython-312-x86_64-linux-gnu.so', '-lcuda', '-L/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/triton/backends/nvidia/lib', '-L/lib/x86_64-linux-gnu', '-L/lib32', '-I/home/mgorny/.conda/envs/pytorch/lib/python3.12/site-packages/triton/backends/nvidia/include', '-I/tmp/tmpwwhx19wm', '-I/home/mgorny/.conda/envs/pytorch/include/python3.12']' returned non-zero exit status 1.

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information


You can suppress this exception and fall back to eager by setting:
    import torch._dynamo
    torch._dynamo.config.suppress_errors = True
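
Before patching, a quick way to see where (or whether) cuda.h exists in the environment at all; purely a hypothetical diagnostic, and the paths below are just the usual conda locations rather than anything triton is guaranteed to search:

import os
from pathlib import Path

prefix = Path(os.environ.get("CONDA_PREFIX", "/usr"))
for candidate in (
    prefix / "include" / "cuda.h",
    prefix / "targets" / "x86_64-linux" / "include" / "cuda.h",
):
    # Report which of the common locations actually contains the header.
    print(candidate, "exists" if candidate.exists() else "missing")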

@amjames commented Nov 25, 2024

@rgommers Thanks!

Some manual testing seems to confirm success at the "build and seems to compile stuff with nvcc" level - perhaps you have some suggestions on what subset of the PyTorch test suite to run to confirm that PyTorch + Triton works as designed?

What kind of time limits are we working with? Full coverage is probably a non-starter, but the inductor tests are the best place to focus.

I would start with:

Some of these tests will require a specific GPU like an A100, but in general the tests should be annotated to skip when special requirements like that are not met.

@mgorny (Contributor) commented Nov 25, 2024

The plot thickens. After fixing the path errors, I'm getting:

RuntimeError: Internal Triton PTX codegen error: 
ptxas /tmp/tmp_at1jgs8.ptx, line 5; fatal   : Unsupported .version 8.6; current version is '8.5'
ptxas fatal   : Ptx assembly aborted due to errors

My guess would be that it doesn't support CUDA 12.6 for some reason. But 12.0 is too old, and I don't think we can cleanly do 12.5 just for Triton without changing PyTorch. Will try to figure out a good solution tomorrow, but I'd appreciate any hints.

@Tobias-Fischer (Contributor) commented:

Are you sure it’s not picking up your system CUDA (12.5?) and then competing against conda CUDA (12.6?)
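
One quick way to check which ptxas is being picked up and what it supports (just a diagnostic sketch, not part of the feedstock):

import shutil
import subprocess
import torch

print("torch built against CUDA:", torch.version.cuda)
ptxas = shutil.which("ptxas")
print("ptxas on PATH:", ptxas)
if ptxas:
    # A ptxas older than the toolkit that torch/triton target will reject newer ".version" directives.
    print(subprocess.run([ptxas, "--version"], capture_output=True, text=True).stdout)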

@danpetry commented:

It might be a problem that the feedstock uses cuda-nvcc in the run section directly, instead of compiler('cuda') with the activation scripts. https://github.com/conda-forge/triton-feedstock/blob/7f07922287846f050f2c65f4cfced35f1e1b311d/recipe/meta.yaml#L66

@mgorny (Contributor) commented Nov 26, 2024

Actually, it turns out we need one more upstream patch for CUDA 12.6 support. But I also need to fix the path search; I'll make a pull request later.

It might be a problem that the feedstock uses cuda-nvcc in the run section directly, instead of compiler('cuda') with the activation scripts. https://github.com/conda-forge/triton-feedstock/blob/7f07922287846f050f2c65f4cfced35f1e1b311d/recipe/meta.yaml#L66

Hmm, is that actually wrong? I didn't think compiler('cuda') actually implies cuda-nvcc too, but I can remove that if it's redundant.

@danpetry commented:

AFAIU compiler('cuda') resolves to cuda-nvcc_, partly because of this setting here

But cuda-nvcc pulls in cuda-nvcc_<platform> anyway, so the activation scripts should be running.

Maybe it's to do with the "conda-build" conditional in the link above, so the -I/-L flags aren't being added?

@mgorny (Contributor) commented Nov 26, 2024

Already solved via conda-forge/triton-feedstock#28. Just wondering if I should update the dependencies while I'm at it.
