I have been working to run NIM directly on my system with a single RTX 4090. After some initial issues getting an API token to work, I can now authenticate and pull the models. However, the container detects 0 compatible profiles and lists my GPU as non-free. Has anyone successfully run NIM models natively on a PC with a single 4090? It has been a days-long challenge and I'm still not quite there.
{USER REDACTED}:~$ export NGC_API_KEY={REDACTED}
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"
docker run -it --rm \
--gpus device=0 \
--shm-size=16GB \
-e NGC_API_KEY \
-v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
-u $(id -u) \
-p 8000:8000 \
nvcr.io/nim/meta/llama3-8b-instruct:1.0.0
===========================================
== NVIDIA Inference Microservice LLM NIM ==
===========================================
NVIDIA Inference Microservice LLM NIM Version 1.0.0
Model: nim/meta/llama3-8b-instruct
Container image Copyright (c) 2016-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This NIM container is governed by the NVIDIA AI Product Agreement here:
NVIDIA Agreements | Enterprise Software | Product Specific Terms for AI Product.
A copy of this license can be found under /opt/nim/LICENSE.
The use of this model is governed by the AI Foundation Models Community License
here: https://docs.nvidia.com/ai-foundation-models-community-license.pdf.
ADDITIONAL INFORMATION: Meta Llama 3 Community License, Built with Meta Llama 3.
A copy of the Llama 3 license can be found under /opt/nim/MODEL_LICENSE.
2024-06-10 08:00:16,668 [INFO] PyTorch version 2.2.2 available.
2024-06-10 08:00:17,046 [WARNING] [TRT-LLM] [W] Logger level already set from environment. Discard new verbosity: error
2024-06-10 08:00:17,046 [INFO] [TRT-LLM] [I] Starting TensorRT-LLM init.
2024-06-10 08:00:17,117 [INFO] [TRT-LLM] [I] TensorRT-LLM inited.
[TensorRT-LLM] TensorRT-LLM version: 0.10.1.dev2024053000
INFO 06-10 08:00:17.664 api_server.py:489] NIM LLM API version 1.0.0
INFO 06-10 08:00:17.665 ngc_profile.py:217] Running NIM without LoRA. Only looking for compatible profiles that do not support LoRA.
INFO 06-10 08:00:17.665 ngc_profile.py:219] Detected 0 compatible profile(s).
INFO 06-10 08:00:17.665 ngc_profile.py:221] Detected additional 1 compatible profile(s) that are currently not runnable due to low free GPU memory.
ERROR 06-10 08:00:17.665 utils.py:21] Could not find a profile that is currently runnable with the detected hardware. Please check the system information below and make sure you have enough free GPUs.
SYSTEM INFO
- Free GPUs:
- Non-free GPUs:
- [2684:10de] (0) NVIDIA GeForce RTX 4090 [current utilization: 10%]
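The log lists the 4090 under "Non-free GPUs" at 10% utilization, which suggests NIM is rejecting the card because something else is already using it (a desktop session like Xorg/Wayland is a common culprit on a single-GPU machine). A quick way to check, assuming `nvidia-smi` is installed on the host:

```shell
# Show all processes (graphics and compute) currently attached to the GPU;
# NIM only selects profiles for GPUs it considers free of other workloads.
nvidia-smi

# VRAM summary: total / used / free, as a quick sanity check that enough
# memory is actually available for the selected model profile.
nvidia-smi --query-gpu=name,memory.total,memory.used,memory.free --format=csv
```

If a display server or another application shows up in the process list, freeing the GPU (e.g. running headless or moving the display to integrated graphics) may let NIM detect a runnable profile. This is a diagnostic sketch, not a confirmed fix for this specific error.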