NIM - Llama3-8b-Instruct - GPU resource usage is very high

Hi all,

I tried an on-premises deployment of llama3-8b-instruct, following the steps in Getting Started — NVIDIA NIM for Large Language Models (LLMs).

First of all, I observed that it takes a lot of GPU memory (~72GB), which is far more than the ~24GB mentioned in the docs (a rough sketch of how I'm checking the usage is below).

You can find that figure here: Supported Models — NVIDIA NIM for Large Language Models (LLMs)
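
For reference, this is roughly how I'm checking the memory on the host, a minimal sketch using pynvml (the PyPI package is nvidia-ml-py; device index 0 is assumed). It at least shows whether the NIM server process itself is holding the memory or whether something else on the node is contributing:

```python
# Quick check of total and per-process GPU memory usage via NVML.
# pip install nvidia-ml-py; assumes the NIM container runs on GPU index 0.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

# Overall device memory
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
print(f"total: {mem.total / 1024**3:.1f} GiB, used: {mem.used / 1024**3:.1f} GiB")

# Per-process breakdown: shows which process holds the memory
# (e.g. model weights plus any memory the server pre-allocates).
for proc in pynvml.nvmlDeviceGetComputeRunningProcesses(handle):
    used_gib = (proc.usedGpuMemory or 0) / 1024**3
    print(f"pid {proc.pid}: {used_gib:.1f} GiB")

pynvml.nvmlShutdown()
```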

Spec

  • GPU: H100 80GB PCIe
  • CPU cores: 128
  • RAM: 1TB

Does anyone have any idea what might be causing this?
Thank you so much.