NVIDIA TensorRT-LLM Now Accelerates Encoder-Decoder Models with In-Flight Batching

smoon · December 13, 2024, 6:46am

Originally published at: NVIDIA TensorRT-LLM Now Accelerates Encoder-Decoder Models with In-Flight Batching - NVIDIA Technical Blog

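NVIDIA recently announced that NVIDIA TensorRT-LLM now accelerates encoder-decoder model architectures. TensorRT-LLM is an open-source library that optimizes inference for a range of model architectures, including decoder-only models such as Llama 3.1; mixture-of-experts (MoE) models such as Mixtral; selective state-space models (SSM) such as Mamba; and multimodal models for vision-language and video-language applications. The addition of encoder-decoder model support further expands TensorRT-LLM's capabilities,…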

Related topics

Topic	Replies	Views	Activity
NVIDIA TensorRT-LLM Now Accelerates Encoder-Decoder Models with In-Flight Batching (Technical Blog; llama)	1	24	December 11, 2024
Deploying Accelerated Llama 3.2 from the Edge to the Cloud (Technical Blog - South Korea; llama)	1	29	September 30, 2024
NVIDIA TensorRT-LLM Now Supports Recurrent Drafting for Optimizing LLM Inference (Technical Blog)	1	16	December 18, 2024
Just Released: NVIDIA TensorRT-LLM 0.13.0 (Technical Blog)	1	39	October 7, 2024
NVIDIA TensorRT-LLM Supercharges Large Language Model Inference on NVIDIA H100 GPUs (Technical Blog - South Korea; korean)	0	615	September 22, 2023
Deploying Accelerated Llama 3.2 from the Edge to the Cloud (Technical Blog; llama)	1	71	September 25, 2024
Supercharging Llama 3.1 across NVIDIA Platforms (Technical Blog - South Korea; llama)	1	22	August 2, 2024
NVIDIA TensorRT-LLM Multiblock Attention Boosts Throughput by More Than 3x for Long Sequence Lengths on NVIDIA HGX H200 (Technical Blog; llama)	2	39	November 27, 2024
NVIDIA TensorRT Model Optimizer v0.15 Boosts Inference Performance and Expands Model Support (Technical Blog)	1	21	August 15, 2024
TensorRT LLM for NIM Models (nim)	3	234	January 7, 2025