Unable to Run NIM on H100 GPU Due to Profile Compatibility Issue Despite Sufficient GPU Resources

I’m facing an issue while trying to run the NVIDIA Inference Microservice (NIM) for nim/meta/llama-3_1-8b-instruct on an H100 GPU.

I used the commands below:

# Choose a container name for bookkeeping
export CONTAINER_NAME=Llama-3.1-8B-instruct

# The repository and tag from the previous ngc registry image list command
Repository=nim/meta/llama-3.1-8b-instruct
Latest_Tag=1.1.0

# Choose a LLM NIM Image from NGC
export IMG_NAME="nvcr.io/${Repository}:${Latest_Tag}"
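
(Assuming the NGC CLI is installed and configured with the same API key, the repository and tag can be double-checked first; the exact pattern argument may vary by CLI version:)

# Verify the repository/tag exist on NGC before pulling
ngc registry image list "nim/meta/llama-3.1-8b-instruct"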

# Choose a path on your system to cache the downloaded models
export LOCAL_NIM_CACHE=~/.cache/downloaded-nim
mkdir -p "$LOCAL_NIM_CACHE"

# Add write permissions to the NIM cache for downloading model assets
chmod -R a+w "$LOCAL_NIM_CACHE"

docker run -it --rm --name=$CONTAINER_NAME \
    -e LOG_LEVEL=$LOG_LEVEL \
    -e NGC_API_KEY=$NGC_API_KEY \
    --gpus all \
    -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
    -u $(id -u) \
    $IMG_NAME
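
For reference, the same image should also be able to print which profiles it considers compatible without starting the server (this assumes the list-model-profiles utility documented for LLM NIM is present in this image/tag):

# Ask the container which profiles it detects for this hardware
docker run --rm --gpus all \
    -e NGC_API_KEY=$NGC_API_KEY \
    $IMG_NAME list-model-profiles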

I get the following error:

===========================================
== NVIDIA Inference Microservice LLM NIM ==
===========================================

NVIDIA Inference Microservice LLM NIM Version 1.0.0
Model: nim/meta/llama-3_1-8b-instruct

Container image Copyright (c) 2016-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

The use of this model is governed by the NVIDIA AI Foundation Models Community License Agreement (found at https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-ai-foundation-models-community-license-agreement/#:~:text=This%20license%20agreement%20(%E2%80%9CAgreement%E2%80%9D,algorithms%2C%20parameters%2C%20configuration%20files%2C).

ADDITIONAL INFORMATION: Llama 3.1 Community License Agreement, Built with Llama.

INFO 11-12 06:34:46.602 ngc_profile.py:222] Running NIM without LoRA. Only looking for compatible profiles that do not support LoRA.
INFO 11-12 06:34:46.602 ngc_profile.py:224] Detected 0 compatible profile(s).
INFO 11-12 06:34:46.602 ngc_profile.py:226] Detected additional 3 compatible profile(s) that are currently not runnable due to low free GPU memory.
ERROR 11-12 06:34:46.602 utils.py:21] Could not find a profile that is currently runnable with the detected hardware. Please check the system information below and make sure you have enough free GPUs.
SYSTEM INFO
- Free GPUs: <None>
- Non-free GPUs:
  -  [2330:10de] (0) NVIDIA H100 80GB HBM3 (H100 80GB) [current utilization: 37%]

But I have sufficient resources available:

Tue Nov 12 06:37:25 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.120                Driver Version: 550.120        CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA H100 80GB HBM3          Off |   00000000:00:05.0 Off |                    0 |
| N/A   34C    P0            122W /  700W |   30802MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A     52900      C   python                                       1626MiB |
|    0   N/A  N/A     52985      C   python                                       1868MiB |
|    0   N/A  N/A     53120      C   python                                       1626MiB |
|    0   N/A  N/A     53254      C   python                                       1626MiB |
|    0   N/A  N/A   1496670      C   python                                       6162MiB |
|    0   N/A  N/A   1496757      C   python                                       5506MiB |
|    0   N/A  N/A   1496843      C   python                                       6168MiB |
|    0   N/A  N/A   1496928      C   python                                       6170MiB |
+-----------------------------------------------------------------------------------------+
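
From the error, NIM appears to treat the GPU as "non-free" because other python processes already hold roughly 30 GB, and the log reports 37% utilization, even though about 50 GB is still free. The free memory NIM would see can be confirmed with a standard nvidia-smi query:

# Show free/used memory and utilization per GPU
nvidia-smi --query-gpu=index,name,memory.free,memory.used,utilization.gpu --format=csv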

I want to run the microservice on premises. I have also made a docker-compose file for convenience; please review it:

services:
  # nvclip:
  #   image: "nvcr.io/nim/nvidia/nvclip:1.0.0" # Make sure to use the correct image name and tag
  #   container_name: nvclip
  #   runtime: nvidia
  #   env_file:
  #     - .env
  #   ports:
  #     - "3003:8000" 
  #   volumes:
  #     - "~/.cache/nim:/opt/nim/.cache" 
  #   deploy:
  #     resources:
  #       reservations:
  #         devices:
  #           - driver: nvidia
  #             device_ids: ['0']
  #             capabilities: [gpu]
  #   user: "${UID}" 

  nvidia-llama3:
    image: "nvcr.io/nim/meta/llama-3.1-8b-instruct:1.1.0" 
    container_name: nvidia-llama3
    env_file:
      - .env
    ports:
      - "3004:8000" 
    volumes:
      - "~/.cache/nim:/opt/nim/.cache" # Mount the cache directory for models
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['0']
              capabilities: [gpu]
    user: "${UID}"