Originally published at: Release TensorRT-LLM 0.13.0 Release · NVIDIA/TensorRT-LLM · GitHub
Updates include tensor parallel support for Mamba2, sparse mixer normalization for MoE models, and more.
jwitsoe