I’m facing an issue while trying to run the NVIDIA Inference Microservice (NIM) for nim/meta/llama-3_1-8b-instruct on an H100 GPU.
I used the following command:
# Choose a container name for bookkeeping
export CONTAINER_NAME=Llama-3.1-8B-instruct
# The repository name from the previous ngc registry image list command
Repository=nim/meta/llama-3.1-8b-instruct
Latest_Tag=1.1.0
# Choose a LLM NIM Image from NGC
export IMG_NAME="nvcr.io/${Repository}:${Latest_Tag}"
# Choose a path on your system to cache the downloaded models
export LOCAL_NIM_CACHE=~/.cache/downloaded-nim
mkdir -p "$LOCAL_NIM_CACHE"
# Add write permissions to the NIM cache for downloading model assets
chmod -R a+w "$LOCAL_NIM_CACHE"
docker run -it --rm --name=$CONTAINER_NAME \
  -e LOG_LEVEL=$LOG_LEVEL \
  -e NGC_API_KEY=$NGC_API_KEY \
  --gpus all \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  -u $(id -u) \
  $IMG_NAME
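One thing worth ruling out before the docker run: if NGC_API_KEY or LOG_LEVEL is unset in the calling shell, the -e flags silently pass empty values into the container. A minimal pre-flight sketch — the check_env helper is hypothetical, not part of NIM, and assumes bash:

```shell
# Hypothetical helper: verify that variables referenced by the docker run
# command are actually set before launching the container.
check_env() {
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable
    # whose name is stored in $name.
    if [ -z "${!name}" ]; then
      echo "missing: $name"
      return 1
    fi
  done
  echo "ok"
}

# Example with placeholder values; in practice NGC_API_KEY comes from NGC.
export NGC_API_KEY=placeholder-key IMG_NAME=nvcr.io/example:1.0
check_env NGC_API_KEY IMG_NAME
```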
I get the following error:
===========================================
== NVIDIA Inference Microservice LLM NIM ==
===========================================
NVIDIA Inference Microservice LLM NIM Version 1.0.0
Model: nim/meta/llama-3_1-8b-instruct
Container image Copyright (c) 2016-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
The use of this model is governed by the NVIDIA AI Foundation Models Community License Agreement (found at https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-ai-foundation-models-community-license-agreement/#:~:text=This%20license%20agreement%20(%E2%80%9CAgreement%E2%80%9D,algorithms%2C%20parameters%2C%20configuration%20files%2C).
ADDITIONAL INFORMATION: Llama 3.1 Community License Agreement, Built with Llama.
INFO 11-12 06:34:46.602 ngc_profile.py:222] Running NIM without LoRA. Only looking for compatible profiles that do not support LoRA.
INFO 11-12 06:34:46.602 ngc_profile.py:224] Detected 0 compatible profile(s).
INFO 11-12 06:34:46.602 ngc_profile.py:226] Detected additional 3 compatible profile(s) that are currently not runnable due to low free GPU memory.
ERROR 11-12 06:34:46.602 utils.py:21] Could not find a profile that is currently runnable with the detected hardware. Please check the system information below and make sure you have enough free GPUs.
SYSTEM INFO
- Free GPUs: <None>
- Non-free GPUs:
- [2330:10de] (0) NVIDIA H100 80GB HBM3 (H100 80GB) [current utilization: 37%]
But I have sufficient resources available, according to nvidia-smi:
Tue Nov 12 06:37:25 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.120 Driver Version: 550.120 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA H100 80GB HBM3 Off | 00000000:00:05.0 Off | 0 |
| N/A 34C P0 122W / 700W | 30802MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 52900 C python 1626MiB |
| 0 N/A N/A 52985 C python 1868MiB |
| 0 N/A N/A 53120 C python 1626MiB |
| 0 N/A N/A 53254 C python 1626MiB |
| 0 N/A N/A 1496670 C python 6162MiB |
| 0 N/A N/A 1496757 C python 5506MiB |
| 0 N/A N/A 1496843 C python 6168MiB |
| 0 N/A N/A 1496928 C python 6170MiB |
+-----------------------------------------------------------------------------------------+
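Summing the per-process memory in the table above shows how much of the 80 GB is already claimed. That is presumably why NIM reports the GPU as non-free: the log lists it under "Non-free GPUs" with "current utilization: 37%", which suggests NIM excludes GPUs that already have active compute processes, not only GPUs that are out of memory. A quick sketch of that arithmetic, parsing the kind of CSV that `nvidia-smi --query-compute-apps=pid,used_memory --format=csv,noheader` emits (sample data copied from the table above, in MiB):

```python
# Per-process GPU memory, as reported by
# `nvidia-smi --query-compute-apps=pid,used_memory --format=csv,noheader`.
# The sample is copied from the nvidia-smi output above.
sample = """\
52900, 1626 MiB
52985, 1868 MiB
53120, 1626 MiB
53254, 1626 MiB
1496670, 6162 MiB
1496757, 5506 MiB
1496843, 6168 MiB
1496928, 6170 MiB
"""

def used_mib(csv_text: str) -> int:
    """Total GPU memory (MiB) held by compute processes."""
    total = 0
    for line in csv_text.strip().splitlines():
        _pid, mem = (field.strip() for field in line.split(","))
        total += int(mem.split()[0])  # "1626 MiB" -> 1626
    return total

# Free memory per the header line: 81559 MiB total minus 30802 MiB in use.
free = 81559 - 30802
print(f"process memory: {used_mib(sample)} MiB, reported free: {free} MiB")
# -> process memory: 30752 MiB, reported free: 50757 MiB
```

So roughly 50 GB is still free, which should fit an 8B instruct model, yet NIM counts zero free GPUs — the blocker appears to be the existing processes on the card rather than raw capacity.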