Hello,
I am trying to launch NVIDIA-AI-Blueprints/rag (tag v2.0.0).
Environment: VMware ESXi v8.0 with the ESXi_8.0.0_Driver package on the host, and an Ubuntu 24.04 guest VM with driver version 570.124.06 and CUDA version 12.8.
nvidia-smi output:
# nvidia-smi
Mon Apr 21 10:23:38 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.124.06 Driver Version: 570.124.06 CUDA Version: 12.8 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 GRID A100D-2-20C On | 00000000:02:00.0 Off | On |
| N/A N/A P0 N/A / N/A | 1MiB / 20480MiB | N/A Default |
| | | Enabled |
+-----------------------------------------+------------------------+----------------------+
| 1 GRID A100D-2-20C On | 00000000:02:02.0 Off | On |
| N/A N/A P0 N/A / N/A | 1MiB / 20480MiB | N/A Default |
| | | Enabled |
+-----------------------------------------+------------------------+----------------------+
| 2 GRID A100D-2-20C On | 00000000:02:03.0 Off | On |
| N/A N/A P0 N/A / N/A | 1MiB / 20480MiB | N/A Default |
| | | Enabled |
+-----------------------------------------+------------------------+----------------------+
| 3 GRID A100D-2-20C On | 00000000:02:04.0 Off | On |
| N/A N/A P0 N/A / N/A | 1MiB / 20480MiB | N/A Default |
| | | Enabled |
+-----------------------------------------+------------------------+----------------------+
| 4 GRID A100D-2-20C On | 00000000:02:05.0 Off | On |
| N/A N/A P0 N/A / N/A | 1MiB / 20480MiB | N/A Default |
| | | Enabled |
+-----------------------------------------+------------------------+----------------------+
| 5 GRID A100D-2-20C On | 00000000:02:06.0 Off | On |
| N/A N/A P0 N/A / N/A | 1MiB / 20480MiB | N/A Default |
| | | Enabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| MIG devices: |
+------------------+----------------------------------+-----------+-----------------------+
| GPU GI CI MIG | Memory-Usage | Vol| Shared |
| ID ID Dev | BAR1-Usage | SM Unc| CE ENC DEC OFA JPG |
| | | ECC| |
|==================+==================================+===========+=======================|
| 0 0 0 0 | 1MiB / 18412MiB | 28 0 | 2 0 1 0 0 |
| | 0MiB / 4096MiB | | |
+------------------+----------------------------------+-----------+-----------------------+
| 1 0 0 0 | 1MiB / 18412MiB | 28 0 | 2 0 1 0 0 |
| | 0MiB / 4096MiB | | |
+------------------+----------------------------------+-----------+-----------------------+
| 2 0 0 0 | 1MiB / 18412MiB | 28 0 | 2 0 1 0 0 |
| | 0MiB / 4096MiB | | |
+------------------+----------------------------------+-----------+-----------------------+
| 3 0 0 0 | 1MiB / 18412MiB | 28 0 | 2 0 1 0 0 |
| | 0MiB / 4096MiB | | |
+------------------+----------------------------------+-----------+-----------------------+
| 4 0 0 0 | 1MiB / 18412MiB | 28 0 | 2 0 1 0 0 |
| | 0MiB / 4096MiB | | |
+------------------+----------------------------------+-----------+-----------------------+
| 5 0 0 0 | 1MiB / 18412MiB | 28 0 | 2 0 1 0 0 |
| | 0MiB / 4096MiB | | |
+------------------+----------------------------------+-----------+-----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Fri_Feb_21_20:23:50_PST_2025
Cuda compilation tools, release 12.8, V12.8.93
Build cuda_12.8.r12.8/compiler.35583870_0
# nvidia-container-toolkit --version
NVIDIA Container Runtime Hook version 1.17.5
commit: f785e908a7f72149f8912617058644fd84e38cde
Relevant part of deploy/compose/nims.yaml:
services:
  nim-llm:
    container_name: nim-llm-ms
    image: nvcr.io/nim/meta/llama-3.1-70b-instruct-pb24h2:1.3.4
    volumes:
      - ${MODEL_DIRECTORY:-./}:/opt/nim/.cache
    user: "${USERID}"
    ports:
      - "8999:8000"
    expose:
      - "8000"
    security_opt:
      - label=disable
    environment:
      NGC_API_KEY: ${NVIDIA_API_KEY}
      CUDA_VERSION: "12.8.0"
      NVIDIA_DRIVER_CAPABILITIES: "compute,utility"
      NVIDIA_VISIBLE_DEVICES: "all"
      NV_CUDA_CUDART_VERSION: "12.8.57-1"
    runtime: nvidia
    shm_size: 20gb
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: ${INFERENCE_GPU_COUNT:-all}
              # device_ids: ['${LLM_MS_GPU_ID:-2,3}']
              capabilities: [gpu]
    healthcheck:
      test: ["CMD", "python3", "-c", "import requests; requests.get('http://localhost:8000/v1/health/ready')"]
      interval: 10s
      timeout: 20s
      retries: 100
    profiles: ["", "rag"]
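As a sanity check, GPU visibility with the same runtime and environment settings can be verified outside the NIM container (a minimal sketch; the CUDA base image tag is an assumption, and any image would do since nvidia-smi is injected by the container toolkit when the utility capability is enabled):
# docker run --rm --runtime=nvidia \
    -e NVIDIA_VISIBLE_DEVICES=all \
    -e NVIDIA_DRIVER_CAPABILITIES=compute,utility \
    nvidia/cuda:12.8.0-base-ubuntu24.04 nvidia-smi -L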
Environment variables:
MODEL_DIRECTORY=/data/nvidia/.cache/model-cache
NVIDIA_API_KEY=nvapi-lxTkb......OILb
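Both are set before invoking docker compose so that the ${MODEL_DIRECTORY} and ${NVIDIA_API_KEY} substitutions in nims.yaml resolve, e.g. exported in the shell:
# export MODEL_DIRECTORY=/data/nvidia/.cache/model-cache
# export NVIDIA_API_KEY=nvapi-lxTkb......OILb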
NVIDIA Container Toolkit configuration:
# cat /etc/nvidia-container-runtime/config.toml
#accept-nvidia-visible-devices-as-volume-mounts = false
#accept-nvidia-visible-devices-envvar-when-unprivileged = true
disable-require = true
supported-driver-capabilities = "all"
#swarm-resource = "DOCKER_RESOURCE_GPU"
[nvidia-container-cli]
debug = "/var/log/nvidia-container-toolkit.log"
environment = []
#ldcache = "/etc/ld.so.cache"
ldconfig = "@/sbin/ldconfig.real"
load-kmods = true
no-cgroups = true
#path = "/usr/bin/nvidia-container-cli"
#root = "/run/nvidia/driver"
#user = "root:root"
[nvidia-container-runtime]
debug = "/var/log/nvidia-container-runtime.log"
log-level = "debug"
mode = "auto"
runtimes = ["docker-runc", "runc", "crun"]
[nvidia-container-runtime.modes]
[nvidia-container-runtime.modes.cdi]
annotation-prefixes = ["cdi.k8s.io/"]
default-kind = "nvidia.com/gpu"
spec-dirs = ["/etc/cdi", "/var/run/cdi"]
[nvidia-container-runtime.modes.csv]
mount-spec-path = "/etc/nvidia-container-runtime/host-files-for-container.d"
[nvidia-container-runtime-hook]
path = "nvidia-container-runtime-hook"
skip-mode-detection = true
[nvidia-ctk]
path = "nvidia-ctk"
Run and output:
# USERID=$(id -u) docker compose -f deploy/compose/nims.yaml up -d nim-llm
# docker logs -f nim-llm-ms
===========================================
== NVIDIA Inference Microservice LLM NIM ==
===========================================
NVIDIA Inference Microservice LLM NIM Version 1.3.3
Model: meta/llama-3.1-70b-instruct
Container image Copyright (c) 2016-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
The NIM container is governed by the NVIDIA Software License Agreement (found at https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-software-license-agreement) and the Product Specific Terms for AI Products (found at https://www.nvidia.com/en-us/agreements/enterprise-software/product-specific-terms-for-ai-products).
A copy of this license can be found under /opt/nim/LICENSE.
The use of this model is governed by the NVIDIA AI Foundation Models Community License Agreement (https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-ai-foundation-models-community-license-agreement).
ADDITIONAL INFORMATION: Llama 3.1 Community License Agreement, Built with Llama.
You are using a deprecated `pynvml` package. Please install `nvidia-ml-py` instead. See https://pypi.org/project/pynvml for more information.
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/opt/nim/llm/nim_llm_sdk/entrypoints/launch.py", line 99, in <module>
    main()
  File "/opt/nim/llm/nim_llm_sdk/entrypoints/launch.py", line 42, in main
    inference_env = prepare_environment()
  File "/opt/nim/llm/nim_llm_sdk/entrypoints/args.py", line 204, in prepare_environment
    engine_args, extracted_name = inject_ngc_hub(engine_args)
  File "/opt/nim/llm/nim_llm_sdk/hub/ngc_injector.py", line 239, in inject_ngc_hub
    system = get_hardware_spec()
  File "/opt/nim/llm/nim_llm_sdk/hub/hardware_inspect.py", line 358, in get_hardware_spec
    device_mem_total, device_mem_free, device_mem_used, device_mem_reserved = gpus.device_mem(device_id)
  File "/opt/nim/llm/nim_llm_sdk/hub/hardware_inspect.py", line 198, in device_mem
    mem_data = pynvml.nvmlDeviceGetMemoryInfo(handle, version=pynvml.nvmlMemory_v2)
  File "/opt/nim/llm/.venv/lib/python3.10/site-packages/pynvml/nvml.py", line 2440, in nvmlDeviceGetMemoryInfo
    _nvmlCheckReturn(ret)
  File "/opt/nim/llm/.venv/lib/python3.10/site-packages/pynvml/nvml.py", line 833, in _nvmlCheckReturn
    raise NVMLError(ret)
pynvml.nvml.NVMLError_NoPermission: Insufficient Permissions
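The failing NVML call can be reproduced in isolation inside the same image, which separates the permission problem from the rest of the NIM startup (a sketch; the venv python path is taken from the traceback above, --user mirrors the user: "${USERID}" setting from the compose file, and device index 0 is assumed to map to one of the MIG-backed vGPUs):
# docker run --rm --runtime=nvidia --user "$(id -u)" \
    -e NVIDIA_VISIBLE_DEVICES=all \
    -e NVIDIA_DRIVER_CAPABILITIES=compute,utility \
    --entrypoint /opt/nim/llm/.venv/bin/python3 \
    nvcr.io/nim/meta/llama-3.1-70b-instruct-pb24h2:1.3.4 \
    -c "import pynvml; pynvml.nvmlInit(); h = pynvml.nvmlDeviceGetHandleByIndex(0); print(pynvml.nvmlDeviceGetMemoryInfo(h, version=pynvml.nvmlMemory_v2))"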
With the image nvcr.io/nim/meta/llama-3.1-70b-instruct:1.8 instead, the container starts, but it is OOM-killed after some time.