Recommended compute for running TensorRT-LLM with the Llama 2 13B & 70B models

Please recommend compute/virtual machine specifications to run TensorRT-LLM with the Llama 2 13B and 70B models.
We are referring to the repo: GitHub - NVIDIA/TensorRT-LLM: TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

Hi @nida.bijapure ,
The support matrix is shared at the same link. Is that what you are referring to?


Yes, but the specification given in the support matrix is a bit confusing. I would like to know the exact infrastructure specifications required to run either Llama 2 13B or Llama 2 70B on TensorRT-LLM, including vCPUs, RAM, storage, GPU, and any other relevant metrics.
What I know is that Llama 2 13B requires about 10-12 GB of RAM and Llama 2 70B requires 32-40 GB, but I am unsure whether more RAM is needed to run them on TensorRT-LLM.
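As a rough sanity check on numbers like these, inference memory is usually dominated by the model weights: parameter count times bytes per parameter, plus some headroom for the KV cache and activations. The sketch below estimates this for a few precisions; the 20% overhead factor and the dtype table are assumptions for illustration, not official TensorRT-LLM figures.

```python
# Back-of-envelope GPU memory estimate for LLM inference.
# weight memory ~= parameter_count * bytes_per_parameter, plus an assumed
# ~20% overhead for KV cache and activations (an illustrative guess, not a
# TensorRT-LLM-specific number).

BYTES_PER_PARAM = {"fp16": 2, "int8": 1, "int4": 0.5}

def estimate_gpu_memory_gb(num_params_b: float, dtype: str,
                           overhead: float = 0.2) -> float:
    """Estimate GPU memory (GB) to serve a model with `num_params_b`
    billion parameters stored at `dtype` precision."""
    weight_gb = num_params_b * BYTES_PER_PARAM[dtype]  # 1B params @ 1 B/param ~= 1 GB
    return round(weight_gb * (1 + overhead), 1)

for model, params in [("Llama 2 13B", 13), ("Llama 2 70B", 70)]:
    for dtype in ("fp16", "int8", "int4"):
        print(f"{model} @ {dtype}: ~{estimate_gpu_memory_gb(params, dtype)} GB")
```

By this estimate, 13B at fp16 needs on the order of 30 GB and 70B at fp16 well over 100 GB, so the 10-12 GB and 32-40 GB figures only line up with quantized (int8/int4) weights; at full fp16 precision, 70B typically requires multiple GPUs with tensor parallelism.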