I am looking to run inference with the LLaMA2-70B model locally on an L40S GPU, using NVIDIA NGC containers (NIMs). Is this possible, and how should I go about setting it up?
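For reference, once a NIM container is running, this is roughly how I expect to query it from Python. The port, endpoint path, and model name below are assumptions on my part, based on NIMs exposing an OpenAI-compatible API on port 8000 by default; the actual model name would need to match what the deployed container reports from its /v1/models endpoint.

```python
import requests

# Assumed local NIM endpoint (NIM containers typically expose an
# OpenAI-compatible API on port 8000).
url = "http://localhost:8000/v1/chat/completions"

payload = {
    # Placeholder model name -- must match the model served by the NIM.
    "model": "llama2-70b",
    "messages": [{"role": "user", "content": "Hello, can you hear me?"}],
    "max_tokens": 64,
}

response = requests.post(url, json=payload, timeout=120)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```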