Just Released: NVIDIA TensorRT-LLM 0.13.0

Originally published at: TensorRT-LLM 0.13.0 Release · NVIDIA/TensorRT-LLM · GitHub

Updates include tensor-parallel support for Mamba2, sparse mixer normalization for MoE models, and more.