TRT LLM for Inference - Import Error libcuda

Hi,

When I run the attached Docker Compose file for TRT LLM, I get the error below. Any suggestions as to why? As far as I can tell, the compose file matches the one in the guide at TRT LLM for Inference | DGX Spark.

/magnus

docker-compose.txt (1.2 KB)

Fetching 18 files: 100%|██████████| 18/18 [09:41<00:00, 32.33s/it]
/root/.cache/huggingface/hub/models--openai--gpt-oss-20b/snapshots/6cee5e81ee83917806bbde320786a8fb61efebee
/usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
  import pynvml  # type: ignore[import]
Traceback (most recent call last):
  File "/usr/local/bin/trtllm-serve", line 3, in <module>
    from tensorrt_llm.commands.serve import main
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/__init__.py", line 70, in <module>
    import tensorrt_llm._torch.models as torch_models
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/__init__.py", line 1, in <module>
    from .llm import LLM
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/llm.py", line 1, in <module>
    from tensorrt_llm.llmapi.llm import _TorchLLM
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/__init__.py", line 1, in <module>
    from .._torch.async_llm import AsyncLLM
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/async_llm.py", line 3, in <module>
    from ..llmapi.llm import LLM
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/llm.py", line 17, in <module>
    from tensorrt_llm._utils import mpi_disabled
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_utils.py", line 45, in <module>
    from tensorrt_llm.bindings import DataType, GptJsonConfig, LayerType
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory

First, check that the host sees the GPU:

nvidia-smi

Then check that a known-good CUDA image sees the GPU inside a container:

docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi

If this fails, fix the NVIDIA Container Toolkit / Docker runtime configuration first.
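If both checks pass, you can confirm the same thing from inside the TRT LLM container itself. Here is a minimal sketch that asks the dynamic loader for libcuda.so.1, the library named in the ImportError. The helper name can_load is my own, not part of any library, and the printed hints are only the most likely interpretation:

```python
import ctypes


def can_load(lib_name: str = "libcuda.so.1") -> bool:
    """Return True if the dynamic loader can open lib_name, else False."""
    try:
        # CDLL raises OSError when the shared object cannot be found or opened.
        ctypes.CDLL(lib_name)
        return True
    except OSError:
        return False


if __name__ == "__main__":
    name = "libcuda.so.1"
    if can_load(name):
        print(f"{name}: loadable, the driver library is visible here")
    else:
        # Assumption: in a container, the usual cause is that no GPU was requested.
        print(f"{name}: NOT loadable, the container probably has no GPU access")
```

Save it to a file and run it with python3 inside the container. If it reports the library as not loadable while the nvidia/cuda test above works, the problem is most likely in how the service requests the GPU rather than in the host setup.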

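libcuda.so.1 belongs to the host NVIDIA driver, not to the image; the NVIDIA Container Toolkit only mounts it into a container that requests GPU access. Docker Compose v2 supports GPUs via gpus: / device requests. Docker’s docs show the supported patterns.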

Here’s the minimal change (keep the rest of your service as-is):

services:
  trtllm_llm_server:
    image: ${DOCKER_IMAGE}
    network_mode: host
    ipc: host
    restart: unless-stopped

    # this is what you’re missing
    gpus: all

    environment:
      HF_TOKEN: ${HF_TOKEN}
      MODEL_HANDLE: ${MODEL_HANDLE}
      TIKTOKEN_ENCODINGS_BASE: ${TIKTOKEN_ENCODINGS_BASE}
      NVIDIA_DRIVER_CAPABILITIES: compute,utility   # optional but common
      NVIDIA_VISIBLE_DEVICES: all                   # optional but explicit

    volumes:
      - ${HOME}/.cache/huggingface:/root/.cache/huggingface

    command: >
      bash -lc '... your existing command ...'

Hi,

The missing gpus: all was the problem.

However, I noticed that running the above with openai/gpt-oss-20b uses more than 90 GB of memory.
Is this really right? I’m new to trt-llm and have not worked out all the settings yet, but this seems far more than expected.