Nemollm-inference-microservice failed to deploy

I’m trying to deploy nvcr.io/nim/meta/llama3-8b-instruct:1.0.3 on an OpenShift cluster with an H100 GPU, but I’m getting the following error:

Defaulted container "nemollm-inference-microservice" out of: nemollm-inference-microservice, init-service (init)

===========================================
== NVIDIA Inference Microservice LLM NIM ==

NVIDIA Inference Microservice LLM NIM Version 1.0.3
Model: meta/llama3-8b-instruct

Container image Copyright (c) 2016-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This NIM container is governed by the NVIDIA AI Product Agreement here:
NVIDIA Agreements | Enterprise Software | Product Specific Terms for AI Product.
A copy of this license can be found under /opt/nim/LICENSE.

The use of this model is governed by the AI Foundation Models Community License
here: https://docs.nvidia.com/ai-foundation-models-community-license.pdf.

ADDITIONAL INFORMATION: Meta Llama 3 Community License, Built with Meta Llama 3.
A copy of the Llama 3 license can be found under /opt/nim/MODEL_LICENSE.

2024-10-21 10:55:29,087 [INFO] PyTorch version 2.2.2 available.
hwloc/linux: Ignoring PCI device with non-16bit domain.
Pass --enable-32bits-pci-domain to configure to support such devices
(warning: it would break the library ABI, don't enable unless really needed).
hwloc/linux: Ignoring PCI device with non-16bit domain.
Pass --enable-32bits-pci-domain to configure to support such devices
(warning: it would break the library ABI, don't enable unless really needed).
2024-10-21 10:55:29,995 [WARNING] [TRT-LLM] [W] Logger level already set from environment. Discard new verbosity: error
2024-10-21 10:55:29,996 [INFO] [TRT-LLM] [I] Starting TensorRT-LLM init.
[TensorRT-LLM] TensorRT-LLM version: 0.10.1.dev2024053000
2024-10-21 10:55:30,087 [INFO] [TRT-LLM] [I] TensorRT-LLM inited.
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/dist-packages/vllm_nvext/entrypoints/openai/api_server.py", line 60, in <module>
    from vllm_nvext.engine.async_trtllm_engine import AsyncLLMEngineFactory
  File "/usr/local/lib/python3.10/dist-packages/vllm_nvext/engine/async_trtllm_engine.py", line 46, in <module>
    from vllm_nvext.hub.ngc_injector import is_trt_llm_model
  File "/usr/local/lib/python3.10/dist-packages/vllm_nvext/hub/ngc_injector.py", line 25, in <module>
    from vllm_nvext.hub.ngc_profile import filter_manifest_configs, get_profile_description
  File "/usr/local/lib/python3.10/dist-packages/vllm_nvext/hub/ngc_profile.py", line 291, in <module>
    def profiles_summary(manifest_path: Path, system: HwSystem = get_hardware_spec()) -> str:
  File "/usr/local/lib/python3.10/dist-packages/vllm_nvext/hub/hardware_inspect.py", line 217, in get_hardware_spec
    device_mem_total, device_mem_free = gpus.device_mem(device_id)
  File "/usr/local/lib/python3.10/dist-packages/vllm_nvext/hub/hardware_inspect.py", line 103, in device_mem
    mem_data = pynvml.nvmlDeviceGetMemoryInfo(handle)
  File "/usr/local/lib/python3.10/dist-packages/pynvml/nvml.py", line 2440, in nvmlDeviceGetMemoryInfo
    _nvmlCheckReturn(ret)
  File "/usr/local/lib/python3.10/dist-packages/pynvml/nvml.py", line 833, in _nvmlCheckReturn
    raise NVMLError(ret)
pynvml.nvml.NVMLError_NoPermission: Insufficient Permissions

Is this an issue with the GPU configuration?
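
For reference, the failing call can be reproduced directly inside the pod (e.g. via `oc exec` into the container) with a short script. This is just a diagnostic sketch on my side, using only the pynvml calls that already appear in the traceback:

```python
# Diagnostic sketch, not part of NIM: exercise the same NVML calls that fail above.
import pynvml

pynvml.nvmlInit()
try:
    count = pynvml.nvmlDeviceGetCount()
    print(f"NVML sees {count} device(s)")
    for i in range(count):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        # nvmlDeviceGetMemoryInfo is the call that raises NVMLError_NoPermission in the log.
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        print(f"GPU {i}: {pynvml.nvmlDeviceGetName(handle)} "
              f"total={mem.total} free={mem.free}")
except pynvml.NVMLError as err:
    print(f"NVML error: {err}")
finally:
    pynvml.nvmlShutdown()
```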

Hi @mahantesh.meti – could it be related to the issue described here: Troubleshooting — NVIDIA Container Toolkit 1.16.2 documentation?
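
If it is that issue, the GPU device nodes would typically be visible inside the container but not accessible to the non-root user the container runs as. A quick check from inside the pod (again just a sketch; the paths below are the standard NVIDIA driver device nodes, nothing NIM-specific):

```python
# Diagnostic sketch: check visibility and permissions of the NVIDIA device nodes.
import glob
import os

nodes = sorted(glob.glob("/dev/nvidia*"))
if not nodes:
    print("No /dev/nvidia* device nodes are visible in this container")
for path in nodes:
    st = os.stat(path)
    print(f"{path}: mode={oct(st.st_mode & 0o777)} uid={st.st_uid} gid={st.st_gid} "
          f"rw={os.access(path, os.R_OK | os.W_OK)}")

# Driver version reported by the kernel module, if /proc is visible.
try:
    with open("/proc/driver/nvidia/version") as f:
        print(f.read().strip())
except OSError as err:
    print(f"Could not read /proc/driver/nvidia/version: {err}")
```

If the nodes are listed but `rw` comes back `False`, that would point at the host/runtime GPU configuration rather than at the NIM image itself.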