Hi all,
I tried deploying the on-premise version of llama3-8b-instruct, following the steps in Getting Started — NVIDIA NIM for Large Language Models (LLMs).
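(For reference, a minimal request like the sketch below is enough to exercise the deployed endpoint; the port 8000 and the model name are my assumptions based on the defaults in the Getting Started guide, so adjust them if your setup differs.)

```python
import requests

# Minimal smoke test against the NIM's OpenAI-compatible endpoint.
# Port 8000 and the model name follow the Getting Started defaults;
# change them if your deployment uses different values.
resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "meta/llama3-8b-instruct",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 32,
    },
    timeout=60,
)
print(resp.json())
```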
First of all, I observed that it consumed a lot of GPU memory (~72GB), which doesn't match the ~24GB mentioned in the docs. You can find it here: Supported Models — NVIDIA NIM for Large Language Models (LLMs).
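Here is roughly how I'm reading the memory usage (a small pynvml sketch, equivalent to what nvidia-smi reports):

```python
import pynvml  # pip install nvidia-ml-py

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # the single H100 in this box
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)   # total/used/free in bytes
print(f"used {mem.used / 1024**3:.1f} GiB of {mem.total / 1024**3:.1f} GiB")
pynvml.nvmlShutdown()
```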
Spec
- GPU: H100 80GB PCIe
- CPU cores: 128
- RAM: 1TB
I am wondering if anyone has any idea what might be causing this?
Thank you so much.