Recommended compute for running TensorRT-LLM with Llama 2 13B & 70B models

nida.bijapure · November 15, 2023, 10:52am

Please recommend compute/virtual machine specifications for running TensorRT-LLM with the Llama 2 13B and 70B models.
We are referring to the repo: GitHub - NVIDIA/TensorRT-LLM: TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

AakankshaS · November 15, 2023, 1:02pm

Hi @nida.bijapure ,
The support matrix is shared at the same link. Is that what you are referring to?

Thanks

nida.bijapure · November 15, 2023, 2:05pm

Yes. The specification given in the support matrix is a bit confusing. I would also like to know the exact infrastructure specifications required to run either Llama 2 13B or Llama 2 70B on TensorRT-LLM, including vCPUs, RAM, storage, GPU, and any other relevant metrics.
As far as I know, Llama 2 13B requires about 10-12GB RAM and Llama 2 70B requires 32-40GB RAM, but I am unsure whether I need more RAM to run them on TensorRT-LLM.
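
As a rough sanity check on memory figures like these, the space needed just to hold the model weights can be estimated as parameter count times bytes per parameter. The Python sketch below is only a back-of-the-envelope calculation: the nominal parameter counts (13e9 and 70e9), the precision table, and the helper name `weight_memory_gb` are assumptions for illustration, not TensorRT-LLM requirements. It ignores the KV cache, activations, and runtime overhead, so actual usage will be higher.

```python
# Back-of-the-envelope estimate of memory needed for model weights only.
# Assumptions: nominal parameter counts, and one fixed byte width per precision.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes), weights only."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    models = [("Llama 2 13B", 13e9), ("Llama 2 70B", 70e9)]
    for name, params in models:
        for precision in BYTES_PER_PARAM:
            size = weight_memory_gb(params, precision)
            print(f"{name} @ {precision}: ~{size:.1f} GB for weights")
```

Under these assumptions, FP16 weights alone come to about 26 GB for 13B and 140 GB for 70B, while a 32-40GB figure for 70B is in the range of a 4-bit estimate (about 35 GB).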
