RTX 4090 shows as "non-free GPU" when running NIM model in docker

I have been trying to run NIM directly on my system with a single RTX 4090. After some issues getting an API token to work, I can now authenticate and pull the models. However, NIM detects 0 compatible profiles and lists my GPU as non-free. Has anyone successfully run NIM models natively on their PC with a single 4090? This has been a days-long challenge for me, and I am still not quite there.


export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"
docker run -it --rm \
  --gpus device=0 \
  --shm-size=16GB \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  -u $(id -u) \
  -p 8000:8000 \



== NVIDIA Inference Microservice LLM NIM ==


NVIDIA Inference Microservice LLM NIM Version 1.0.0

Model: nim/meta/llama3-8b-instruct

Container image Copyright (c) 2016-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This NIM container is governed by the NVIDIA AI Product Agreement here:

NVIDIA AI Enterprise Software License Agreement | NVIDIA.

A copy of this license can be found under /opt/nim/LICENSE.

The use of this model is governed by the AI Foundation Models Community License

here: https://docs.nvidia.com/ai-foundation-models-community-license.pdf.

ADDITIONAL INFORMATION: Meta Llama 3 Community License, Built with Meta Llama 3.

A copy of the Llama 3 license can be found under /opt/nim/MODEL_LICENSE.

2024-06-10 08:00:16,668 [INFO] PyTorch version 2.2.2 available.

2024-06-10 08:00:17,046 [WARNING] [TRT-LLM] [W] Logger level already set from environment. Discard new verbosity: error

2024-06-10 08:00:17,046 [INFO] [TRT-LLM] [I] Starting TensorRT-LLM init.

2024-06-10 08:00:17,117 [INFO] [TRT-LLM] [I] TensorRT-LLM inited.

[TensorRT-LLM] TensorRT-LLM version: 0.10.1.dev2024053000

INFO 06-10 08:00:17.664 api_server.py:489] NIM LLM API version 1.0.0

INFO 06-10 08:00:17.665 ngc_profile.py:217] Running NIM without LoRA. Only looking for compatible profiles that do not support LoRA.

INFO 06-10 08:00:17.665 ngc_profile.py:219] Detected 0 compatible profile(s).

INFO 06-10 08:00:17.665 ngc_profile.py:221] Detected additional 1 compatible profile(s) that are currently not runnable due to low free GPU memory.

ERROR 06-10 08:00:17.665 utils.py:21] Could not find a profile that is currently runnable with the detected hardware. Please check the system information below and make sure you have enough free GPUs.


- Free GPUs:

- Non-free GPUs:

- [2684:10de] (0) NVIDIA GeForce RTX 4090 [current utilization: 10%]
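
Since the log blames "low free GPU memory", one thing worth checking before starting the container is how much memory on the card is already in use (for example by the desktop session or another process). Below is a minimal sketch that queries `nvidia-smi` and reports used and free memory per GPU. The helper names `parse_gpu_memory` and `is_mostly_free` and the 5% threshold are my own assumptions for illustration; I do not know the exact rule NIM applies when it classifies a GPU as non-free.

```python
import subprocess

# Standard nvidia-smi query: one CSV row per GPU, memory values in MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse nvidia-smi CSV rows into dicts with memory figures in MiB."""
    gpus = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        # Split the index off the front and the two numbers off the back,
        # so a GPU name containing a comma would not break the parse.
        index, rest = line.split(",", 1)
        name, used, total = rest.rsplit(",", 2)
        used_mib, total_mib = int(used), int(total)
        gpus.append({
            "index": int(index),
            "name": name.strip(),
            "used_mib": used_mib,
            "total_mib": total_mib,
            "free_mib": total_mib - used_mib,
        })
    return gpus


def is_mostly_free(gpu, max_used_fraction=0.05):
    """Hypothetical check: treat a GPU as free if at most 5% of memory is used.

    The 5% figure is an assumption for illustration, not NIM's documented rule.
    """
    return gpu["used_mib"] <= max_used_fraction * gpu["total_mib"]


if __name__ == "__main__":
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    for gpu in parse_gpu_memory(out):
        state = "mostly free" if is_mostly_free(gpu) else "in use"
        print(f"GPU {gpu['index']} {gpu['name']}: "
              f"{gpu['used_mib']} / {gpu['total_mib']} MiB used ({state})")
```

If this shows a noticeable amount of memory in use before the container starts, running plain `nvidia-smi` lists the processes holding it, which may help narrow down what is keeping the 4090 from being counted as free.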