4.trt_llm

Optimizing Inference on Large Language Models with NVIDIA TensorRT-LLM, Now Publicly Available | NVIDIA Technical Blog

References: Welcome to TensorRT-LLM’s documentation!