I am looking to run inference with the LLaMA2-70B model locally on an L40S GPU, using NVIDIA NGC containers (NIMs). Is this possible, and how should I go about setting it up?
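For reference, once a NIM container is running, this is roughly how I expect to query it from Python. The port, endpoint path, and model name below are assumptions on my part, based on NIMs exposing an OpenAI-compatible API on port 8000 by default; the actual model name would need to match what the deployed container reports from its /v1/models endpoint.

```python
import requests

# Assumed local NIM endpoint (NIM containers typically expose an
# OpenAI-compatible API on port 8000).
url = "http://localhost:8000/v1/chat/completions"

payload = {
    # Placeholder model name -- must match the model served by the NIM.
    "model": "llama2-70b",
    "messages": [{"role": "user", "content": "Hello, can you hear me?"}],
    "max_tokens": 64,
}

response = requests.post(url, json=payload, timeout=120)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```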