RTX 4090 shows as "non-free GPU" when running NIM model in docker

I have been working to run NIM directly on my system with a single RTX 4090. After some issues getting an API token to work, I can now authenticate and pull the models. However, the container detects 0 compatible profiles and reports my GPU as non-free. Has anyone successfully run NIM models natively on a PC with a single 4090? It has been a days-long challenge for me, and I'm still not quite there.

{USER REDACTED}:~$ export NGC_API_KEY={REDACTED}
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"
docker run -it --rm \
  --gpus device=0 \
  --shm-size=16GB \
  -e NGC_API_KEY \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  -u $(id -u) \
  -p 8000:8000 \
  nvcr.io/nim/meta/llama3-8b-instruct:1.0.0

===========================================
== NVIDIA Inference Microservice LLM NIM ==
===========================================

NVIDIA Inference Microservice LLM NIM Version 1.0.0
Model: nim/meta/llama3-8b-instruct

Container image Copyright (c) 2016-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This NIM container is governed by the NVIDIA AI Product Agreement here:
NVIDIA Agreements | Enterprise Software | Product Specific Terms for AI Product.
A copy of this license can be found under /opt/nim/LICENSE.

The use of this model is governed by the AI Foundation Models Community License
here: https://docs.nvidia.com/ai-foundation-models-community-license.pdf.

ADDITIONAL INFORMATION: Meta Llama 3 Community License, Built with Meta Llama 3.
A copy of the Llama 3 license can be found under /opt/nim/MODEL_LICENSE.

2024-06-10 08:00:16,668 [INFO] PyTorch version 2.2.2 available.
2024-06-10 08:00:17,046 [WARNING] [TRT-LLM] [W] Logger level already set from environment. Discard new verbosity: error
2024-06-10 08:00:17,046 [INFO] [TRT-LLM] [I] Starting TensorRT-LLM init.
2024-06-10 08:00:17,117 [INFO] [TRT-LLM] [I] TensorRT-LLM inited.
[TensorRT-LLM] TensorRT-LLM version: 0.10.1.dev2024053000
INFO 06-10 08:00:17.664 api_server.py:489] NIM LLM API version 1.0.0
INFO 06-10 08:00:17.665 ngc_profile.py:217] Running NIM without LoRA. Only looking for compatible profiles that do not support LoRA.
INFO 06-10 08:00:17.665 ngc_profile.py:219] Detected 0 compatible profile(s).
INFO 06-10 08:00:17.665 ngc_profile.py:221] Detected additional 1 compatible profile(s) that are currently not runnable due to low free GPU memory.
ERROR 06-10 08:00:17.665 utils.py:21] Could not find a profile that is currently runnable with the detected hardware. Please check the system information below and make sure you have enough free GPUs.
SYSTEM INFO
- Free GPUs:
- Non-free GPUs:
  - [2684:10de] (0) NVIDIA GeForce RTX 4090 [current utilization: 10%]
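In case it helps anyone reproduce this: the NIM docs describe a list-model-profiles utility shipped in the container that prints which profiles it considers compatible without starting the server. Assuming the 1.0.0 image includes it, this runs the same hardware check in isolation:

# Run only the profile/hardware detection step (utility name per the NIM docs)
docker run --rm --gpus device=0 \
  -e NGC_API_KEY \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  nvcr.io/nim/meta/llama3-8b-instruct:1.0.0 \
  list-model-profiles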


Hi Daniel, I have the exact same problem on a VM (VMware) running Ubuntu 22.04 with access to a GRID A100D-80C GPU. Did you solve it?

I'm seeing the same issue. I am trying to run the llama3-8b-instruct NVIDIA NIM on Windows 11 with WSL, and I'm getting the same error messages about non-free GPUs. I tried restarting my PC, but that did not resolve the issue. Were you able to find a solution for this?

@daniel.brosnan @egidio.desalve I was able to get this to work after going into the UEFI/BIOS and enabling CPU Graphics (integrated graphics), so the display no longer runs on the discrete GPU.
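If the discrete GPU still shows nonzero utilization or allocated memory after the change, NIM will presumably keep reporting it as non-free. The query below uses standard nvidia-smi fields to show roughly what the container's hardware check sees:

# Show utilization and memory per GPU -- the fields the "non-free" check appears to consult
nvidia-smi --query-gpu=index,name,utilization.gpu,memory.used,memory.free --format=csv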

Hi Egidio, I have the exact same problem, also on a VM (VMware) running Ubuntu 22.04 but with an H100-40C. Have you managed to solve the problem in the meantime? I can't find any other resources related to this problem anywhere else. Thank you and best wishes, Simon

Hi Shess, as you can see from this issue I opened:

the problem is that some NVML APIs are disabled on MIG-enabled GPUs for safety reasons. Forcing the NIM profile to use vLLM also produces errors, since both TensorRT-LLM and vLLM use pynvml to verify GPU resources.

So far I'm using passthrough GPUs.
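If you want to confirm you are hitting the same MIG restriction, nvidia-smi can report the MIG mode directly (assuming a driver recent enough to expose the field):

# "Enabled" here means the NVML queries that pynvml relies on may be restricted
nvidia-smi --query-gpu=index,name,mig.mode.current --format=csv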

Same issue here with an RTX 3060.

===========================================
== NVIDIA Inference Microservice LLM NIM ==
===========================================

NVIDIA Inference Microservice LLM NIM Version 1.0.0
Model: nim/meta/llama-3_1-8b-instruct

Container image Copyright (c) 2016-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

The use of this model is governed by the NVIDIA AI Foundation Models Community License Agreement (found at https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-ai-foundation-models-community-license-agreement/#:~:text=This%20license%20agreement%20(%E2%80%9CAgreement%E2%80%9D,algorithms%2C%20parameters%2C%20configuration%20files%2C).

ADDITIONAL INFORMATION: Llama 3.1 Community License Agreement, Built with Llama.

ERROR 07-30 03:59:14.306 utils.py:21] Could not find a profile that is currently runnable with the detected hardware. Please check the system information below and make sure you have enough free GPUs.
SYSTEM INFO
- Free GPUs: <None>
- Non-free GPUs:
  -  [2504:10de] (0) NVIDIA GeForce RTX 3060 [current utilization: 13%]

And the nvidia-smi output:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 551.86                 Driver Version: 551.86         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060      WDDM  |   00000000:01:00.0  On |                  N/A |
|  0%   50C    P0             41W /  170W |    1642MiB /  12288MiB |      1%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
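In WDDM mode the Windows desktop keeps about 1.6 GiB allocated on the card, which is presumably why NIM counts it as non-free. If I read the NIM configuration docs correctly, you can bypass the automatic selection by pinning a profile yourself; the variable names below are my reading of those docs, so verify them for your NIM version:

# Pin a profile ID taken from list-model-profiles instead of relying on
# auto-detection. NIM_MANIFEST_ALLOW_UNSAFE / NIM_MODEL_PROFILE are my
# reading of the NIM configuration docs -- double-check them for your version.
docker run -it --rm --gpus device=0 --shm-size=16GB \
  -e NGC_API_KEY \
  -e NIM_MANIFEST_ALLOW_UNSAFE=1 \
  -e NIM_MODEL_PROFILE=<profile-id> \
  -p 8000:8000 \
  <nim-image>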

I am able to run some NIMs on a Titan RTX on an Ubuntu system. No VMware; Linux is the primary OS.

My learning experience: Manually validating compatibility and running NVIDIA (NIM) container images