trtllm-bench fail #2476

Open
Wowoho opened this issue Nov 21, 2024 · 1 comment
Labels
triaged Issue has been triaged by maintainers

Comments

@Wowoho

Wowoho commented Nov 21, 2024

trtllm-bench --model models/Llama-2-7b-hf throughput --dataset experiments/synthetic_128_128.txt --engine_dir models/Llama2-7b-trt-engine

[TensorRT-LLM] TensorRT-LLM version: 0.15.0.dev2024111200
[11/21/2024-10:26:10] [TRT-LLM] [I] Preparing to run throughput benchmark...
[11/21/2024-10:26:10] [TRT-LLM] [I] Setting up benchmarker and infrastructure.
[11/21/2024-10:26:10] [TRT-LLM] [I] Initializing Throughput Benchmark. [rate=-1 req/s]
[11/21/2024-10:26:10] [TRT-LLM] [I] Ready to start benchmark.
[11/21/2024-10:26:10] [TRT-LLM] [I] Initializing Executor.
[TensorRT-LLM][INFO] Engine version 0.15.0.dev2024111200 found in the config file, assuming engine(s) built by new builder API.

WARNING: A deprecated MPI_Info key was used.

Deprecated key: env
Corrected key: PMIX_ENVAR

We have updated this for you and will proceed. However, this will be treated
as an error in a future release. Please update your application.

[worker:15806] PRTE ERROR: Bad parameter in file base/odls_base_default_fns.c at line 962
[11/21/2024-10:26:11] [TRT-LLM] [I] Benchmark Shutdown called!
[11/21/2024-10:26:11] [TRT-LLM] [I] Executor shutdown.
Traceback (most recent call last):
  File "/home/lilin/anaconda3/envs/tensorrt_llm/bin/trtllm-bench", line 8, in <module>
    sys.exit(main())
  File "/home/lilin/anaconda3/envs/tensorrt_llm/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/lilin/anaconda3/envs/tensorrt_llm/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/lilin/anaconda3/envs/tensorrt_llm/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/lilin/anaconda3/envs/tensorrt_llm/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/lilin/anaconda3/envs/tensorrt_llm/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/lilin/anaconda3/envs/tensorrt_llm/lib/python3.10/site-packages/click/decorators.py", line 45, in new_func
    return f(get_current_context().obj, *args, **kwargs)
  File "/home/lilin/anaconda3/envs/tensorrt_llm/lib/python3.10/site-packages/tensorrt_llm/bench/benchmark/throughput.py", line 182, in throughput_command
    benchmark.start_benchmark()
  File "/home/lilin/anaconda3/envs/tensorrt_llm/lib/python3.10/site-packages/tensorrt_llm/bench/benchmark/throughput.py", line 350, in start_benchmark
    self.executor = ExecutorManager(self.runtime_config,
  File "/home/lilin/anaconda3/envs/tensorrt_llm/lib/python3.10/site-packages/tensorrt_llm/bench/benchmark/throughput.py", line 212, in __init__
    self.executor = trtllm.Executor(
RuntimeError: [TensorRT-LLM][ERROR] Assertion failed: mComm != MPI_COMM_NULL (/home/jenkins/agent/workspace/LLM/main/L0_Test-x86_64/tensorrt_llm/cpp/tensorrt_llm/common/mpiUtils.cpp:450)
1 0x7fc2a43e3387 tensorrt_llm::common::throwRuntimeError(char const*, int, std::string const&) + 82
2 0x7fc2a43e3dfb /home/lilin/anaconda3/envs/tensorrt_llm/lib/python3.10/site-packages/tensorrt_llm/libs/libtensorrt_llm.so(+0x786dfb) [0x7fc2a43e3dfb]
3 0x7fc2a66c794f tensorrt_llm::executor::Executor::Impl::initializeOrchestrator(int, int, tensorrt_llm::executor::ExecutorConfig const&, tensorrt_llm::executor::ParallelConfig, tensorrt_llm::executor::ModelType, std::filesystem::path const&) + 463
4 0x7fc2a66c8d7c tensorrt_llm::executor::Executor::Impl::initializeCommAndWorkers(int, int, tensorrt_llm::executor::ExecutorConfig const&, std::optional<tensorrt_llm::executor::ModelType>, std::optional<std::filesystem::path> const&, std::optional<tensorrt_llm::runtime::WorldConfig> const&, std::optional<tensorrt_llm::runtime::GptJsonConfig> const&) + 1164
5 0x7fc2a66cad22 tensorrt_llm::executor::Executor::Impl::Impl(std::filesystem::path const&, std::optional<std::filesystem::path> const&, tensorrt_llm::executor::ModelType, tensorrt_llm::executor::ExecutorConfig const&) + 1698
6 0x7fc2a66b4540 tensorrt_llm::executor::Executor::Executor(std::filesystem::path const&, tensorrt_llm::executor::ModelType, tensorrt_llm::executor::ExecutorConfig const&) + 64
7 0x7fc3133f63cd /home/lilin/anaconda3/envs/tensorrt_llm/lib/python3.10/site-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x1153cd) [0x7fc3133f63cd]
8 0x7fc31336334d /home/lilin/anaconda3/envs/tensorrt_llm/lib/python3.10/site-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x8234d) [0x7fc31336334d]
9 0x55c5163d9c46 /home/lilin/anaconda3/envs/tensorrt_llm/bin/python3.10(+0x13bc46) [0x55c5163d9c46]
10 0x55c5163d2f73 _PyObject_MakeTpCall + 723
11 0x55c5163e57c6 /home/lilin/anaconda3/envs/tensorrt_llm/bin/python3.10(+0x1477c6) [0x55c5163e57c6]
12 0x55c5163e61f9 PyVectorcall_Call + 201
13 0x55c5163e3534 /home/lilin/anaconda3/envs/tensorrt_llm/bin/python3.10(+0x145534) [0x55c5163e3534]
14 0x55c5163d327b /home/lilin/anaconda3/envs/tensorrt_llm/bin/python3.10(+0x13527b) [0x55c5163d327b]
15 0x7fc313360f0b /home/lilin/anaconda3/envs/tensorrt_llm/lib/python3.10/site-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x7ff0b) [0x7fc313360f0b]
16 0x55c5163d2f73 _PyObject_MakeTpCall + 723
17 0x55c5163cf5c3 _PyEval_EvalFrameDefault + 22083
18 0x55c5163d2400 _PyObject_FastCallDictTstate + 208
19 0x55c5163e3009 /home/lilin/anaconda3/envs/tensorrt_llm/bin/python3.10(+0x145009) [0x55c5163e3009]
20 0x55c5163d2f8b _PyObject_MakeTpCall + 747
21 0x55c5163cebae _PyEval_EvalFrameDefault + 19502
22 0x55c5163da0cc _PyFunction_Vectorcall + 108
23 0x55c5163ca680 _PyEval_EvalFrameDefault + 1792
24 0x55c5163da0cc _PyFunction_Vectorcall + 108
25 0x55c5163e5e7c PyObject_Call + 188
26 0x55c5163ccce2 _PyEval_EvalFrameDefault + 11618
27 0x55c5163da0cc _PyFunction_Vectorcall + 108
28 0x55c5163e5e7c PyObject_Call + 188
29 0x55c5163ccce2 _PyEval_EvalFrameDefault + 11618
30 0x55c5163e54e2 /home/lilin/anaconda3/envs/tensorrt_llm/bin/python3.10(+0x1474e2) [0x55c5163e54e2]
31 0x55c5163e5e7c PyObject_Call + 188
32 0x55c5163ccce2 _PyEval_EvalFrameDefault + 11618
33 0x55c5163da0cc _PyFunction_Vectorcall + 108
34 0x55c5163ca680 _PyEval_EvalFrameDefault + 1792
35 0x55c5163da0cc _PyFunction_Vectorcall + 108
36 0x55c5163ca680 _PyEval_EvalFrameDefault + 1792
37 0x55c5163e5764 /home/lilin/anaconda3/envs/tensorrt_llm/bin/python3.10(+0x147764) [0x55c5163e5764]
38 0x55c5163ccce2 _PyEval_EvalFrameDefault + 11618
39 0x55c5163d2400 _PyObject_FastCallDictTstate + 208
40 0x55c5163e3ae9 _PyObject_Call_Prepend + 105
41 0x55c5164a3b89 /home/lilin/anaconda3/envs/tensorrt_llm/bin/python3.10(+0x205b89) [0x55c5164a3b89]
42 0x55c5163d2f73 _PyObject_MakeTpCall + 723
43 0x55c5163cebae _PyEval_EvalFrameDefault + 19502
44 0x55c51646a80c /home/lilin/anaconda3/envs/tensorrt_llm/bin/python3.10(+0x1cc80c) [0x55c51646a80c]
45 0x55c51646a757 PyEval_EvalCode + 135
46 0x55c51649ab1a /home/lilin/anaconda3/envs/tensorrt_llm/bin/python3.10(+0x1fcb1a) [0x55c51649ab1a]
47 0x55c516495fa3 /home/lilin/anaconda3/envs/tensorrt_llm/bin/python3.10(+0x1f7fa3) [0x55c516495fa3]
48 0x55c5163352c2 /home/lilin/anaconda3/envs/tensorrt_llm/bin/python3.10(+0x972c2) [0x55c5163352c2]
49 0x55c5164907dd _PyRun_SimpleFileObject + 445
50 0x55c516490374 _PyRun_AnyFileObject + 68
51 0x55c51648d6db Py_RunMain + 795
52 0x55c51645de97 Py_BytesMain + 55
53 0x7fc538fed555 __libc_start_main + 245
54 0x55c51645ddae /home/lilin/anaconda3/envs/tensorrt_llm/bin/python3.10(+0x1bfdae) [0x55c51645ddae]

@hello-11 hello-11 added the triaged Issue has been triaged by maintainers label Nov 21, 2024
@hello-11 hello-11 assigned kaiyux and unassigned kaiyux Nov 21, 2024
@hello-11
Collaborator

@Wowoho We could not reproduce this bug. Could you provide more information?
