NIM - Llama3-8b-Instruct - GPU resource usage is very high

Hi all,

I tried an on-premises deployment of llama3-8b-instruct, following the steps in Getting Started — NVIDIA NIM for Large Language Models (LLMs).

First of all, I observed that it takes a lot of GPU memory (~72GB), which is far more than the ~24GB mentioned in the docs (a rough sketch of how I'm checking the usage is below).

You can find that figure here: Supported Models — NVIDIA NIM for Large Language Models (LLMs)
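
For reference, this is roughly how I'm checking the memory on the host, a minimal sketch using pynvml (the PyPI package is nvidia-ml-py; device index 0 is assumed). It at least shows whether the NIM server process itself is holding the memory or whether something else on the node is contributing:

```python
# Quick check of total and per-process GPU memory usage via NVML.
# pip install nvidia-ml-py; assumes the NIM container runs on GPU index 0.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

# Overall device memory
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
print(f"total: {mem.total / 1024**3:.1f} GiB, used: {mem.used / 1024**3:.1f} GiB")

# Per-process breakdown: shows which process holds the memory
# (e.g. model weights plus any memory the server pre-allocates).
for proc in pynvml.nvmlDeviceGetComputeRunningProcesses(handle):
    used_gib = (proc.usedGpuMemory or 0) / 1024**3
    print(f"pid {proc.pid}: {used_gib:.1f} GiB")

pynvml.nvmlShutdown()
```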

Spec

  • GPU: H100 80GB PCIe
  • CPU cores: 128
  • RAM: 1TB

Does anyone have any idea what might be causing this?
Thank you so much.