Unable to Run NIM on H100 GPU Due to Profile Compatibility Issue Despite Sufficient GPU Resources

I’m facing an issue while trying to run the NVIDIA Inference Microservice (NIM) for nim/meta/llama-3_1-8b-instruct on an H100 GPU.

I used the commands below:

# Choose a container name for bookkeeping
export CONTAINER_NAME=Llama-3.1-8B-instruct

# The repository and tag from the previous ngc registry image list command
Repository=nim/meta/llama-3.1-8b-instruct
Latest_Tag=1.1.0

# Choose a LLM NIM Image from NGC
export IMG_NAME="nvcr.io/${Repository}:${Latest_Tag}"
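
(Assuming the NGC CLI is installed and configured with the same API key, the repository and tag can be double-checked first; the exact pattern argument may vary by CLI version:)

# Verify the repository/tag exist on NGC before pulling
ngc registry image list "nim/meta/llama-3.1-8b-instruct"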

# Choose a path on your system to cache the downloaded models
export LOCAL_NIM_CACHE=~/.cache/downloaded-nim
mkdir -p "$LOCAL_NIM_CACHE"

# Add write permissions to the NIM cache for downloading model assets
chmod -R a+w "$LOCAL_NIM_CACHE"

docker run -it --rm --name=$CONTAINER_NAME \
    -e LOG_LEVEL=$LOG_LEVEL \
    -e NGC_API_KEY=$NGC_API_KEY \
    --gpus all \
    -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
    -u $(id -u) \
    $IMG_NAME
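
For reference, the same image should also be able to print which profiles it considers compatible without starting the server (this assumes the list-model-profiles utility documented for LLM NIM is present in this image/tag):

# Ask the container which profiles it detects for this hardware
docker run --rm --gpus all \
    -e NGC_API_KEY=$NGC_API_KEY \
    $IMG_NAME list-model-profiles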

I get the following error:

===========================================
== NVIDIA Inference Microservice LLM NIM ==
===========================================

NVIDIA Inference Microservice LLM NIM Version 1.0.0
Model: nim/meta/llama-3_1-8b-instruct

Container image Copyright (c) 2016-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

The use of this model is governed by the NVIDIA AI Foundation Models Community License Agreement (found at https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-ai-foundation-models-community-license-agreement/#:~:text=This%20license%20agreement%20(%E2%80%9CAgreement%E2%80%9D,algorithms%2C%20parameters%2C%20configuration%20files%2C).

ADDITIONAL INFORMATION: Llama 3.1 Community License Agreement, Built with Llama.

INFO 11-12 06:34:46.602 ngc_profile.py:222] Running NIM without LoRA. Only looking for compatible profiles that do not support LoRA.
INFO 11-12 06:34:46.602 ngc_profile.py:224] Detected 0 compatible profile(s).
INFO 11-12 06:34:46.602 ngc_profile.py:226] Detected additional 3 compatible profile(s) that are currently not runnable due to low free GPU memory.
ERROR 11-12 06:34:46.602 utils.py:21] Could not find a profile that is currently runnable with the detected hardware. Please check the system information below and make sure you have enough free GPUs.
SYSTEM INFO
- Free GPUs: <None>
- Non-free GPUs:
  -  [2330:10de] (0) NVIDIA H100 80GB HBM3 (H100 80GB) [current utilization: 37%]

But I have sufficient resources available:

Tue Nov 12 06:37:25 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.120                Driver Version: 550.120        CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA H100 80GB HBM3          Off |   00000000:00:05.0 Off |                    0 |
| N/A   34C    P0            122W /  700W |   30802MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A     52900      C   python                                       1626MiB |
|    0   N/A  N/A     52985      C   python                                       1868MiB |
|    0   N/A  N/A     53120      C   python                                       1626MiB |
|    0   N/A  N/A     53254      C   python                                       1626MiB |
|    0   N/A  N/A   1496670      C   python                                       6162MiB |
|    0   N/A  N/A   1496757      C   python                                       5506MiB |
|    0   N/A  N/A   1496843      C   python                                       6168MiB |
|    0   N/A  N/A   1496928      C   python                                       6170MiB |
+-----------------------------------------------------------------------------------------+
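
From the error, NIM appears to treat the GPU as "non-free" because other python processes already hold roughly 30 GB, and the log reports 37% utilization, even though about 50 GB is still free. The free memory NIM would see can be confirmed with a standard nvidia-smi query:

# Show free/used memory and utilization per GPU
nvidia-smi --query-gpu=index,name,memory.free,memory.used,utilization.gpu --format=csv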

I want to run the microservice on premises. I have also made a docker-compose file for convenience; please review it:

services:
  # nvclip:
  #   image: "nvcr.io/nim/nvidia/nvclip:1.0.0" # Make sure to use the correct image name and tag
  #   container_name: nvclip
  #   runtime: nvidia
  #   env_file:
  #     - .env
  #   ports:
  #     - "3003:8000" 
  #   volumes:
  #     - "~/.cache/nim:/opt/nim/.cache" 
  #   deploy:
  #     resources:
  #       reservations:
  #         devices:
  #           - driver: nvidia
  #             device_ids: ['0']
  #             capabilities: [gpu]
  #   user: "${UID}" 

  nvidia-llama3:
    image: "nvcr.io/nim/meta/llama-3.1-8b-instruct:1.1.0" 
    container_name: nvidia-llama3
    env_file:
      - .env
    ports:
      - "3004:8000" 
    volumes:
      - "~/.cache/nim:/opt/nim/.cache" # Mount the cache directory for models
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['0']
              capabilities: [gpu]
    user: "${UID}"