Error parsing GPU utilization: Failed to initialize NVML: Unknown Error
[2025-09-24 07:37:02] ⚠️ nvidia-smi failed with exit code: 65280

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) :- GPU
• DeepStream Version :- 7.1

 nvidia-smi
Wed Sep 24 07:56:00 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.124.06             Driver Version: 570.124.06     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX A6000               Off |   00000000:06:00.0 Off |                    0 |
| 48%   74C    P2            197W /  300W |    1697MiB /  46068MiB |     46%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA RTX A6000               Off |   00000000:07:00.0 Off |                    0 |
| 30%   31C    P8             17W /  300W |       4MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         


I am starting DeepStream pipelines with DeepStream dockers, but sometimes I get this error:

```
Error parsing GPU utilization: Failed to initialize NVML: Unknown Error
[2025-09-24 07:37:02] ⚠️ nvidia-smi failed with exit code: 65280
```

Sometimes I also get a “Cannot get CUDA device count” error.

The driver version is not correct for DS 7.1. Please refer to the compatibility table.
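You can verify the installed driver against the table with a quick query (a sketch; double-check the table yourself, which for DS 7.1 dGPU lists the R535 TRD driver, 535.183.06):

# Print only the driver version, one line per GPU
nvidia-smi --query-gpu=driver_version --format=csv,noheader
# Expected for DS 7.1 per the compatibility table: 535.183.06 (R535 TRD);
# a 570.x driver like the one shown above does not match.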

Alright, I installed the correct driver, but I am still getting this:

Error parsing GPU utilization: Failed to initialize NVML: Unknown Error
[2025-09-26 16:44:48] ⚠️ nvidia-smi failed with exit code: 65280

Did you run “nvidia-smi” on the host? If so, it seems the driver is not installed correctly. Please uninstall the old driver first, then install the new one, then reboot. Here is the guide to using the DeepStream docker.
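A clean reinstall on Ubuntu usually looks something like this (a sketch; the exact steps depend on whether the old driver came from apt or from a .run file):

# Remove apt-installed driver packages, then autoremove leftovers
sudo apt-get purge '*nvidia*'
sudo apt-get autoremove
# Note: a broad purge may also remove nvidia-container-toolkit; reinstall it if so.
# If the old driver was installed from a .run file, use its uninstaller instead:
# sudo nvidia-uninstall
# Install the R535 driver expected by DS 7.1, then reboot
sudo apt-get install nvidia-driver-535
sudo reboot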

nvidia-smi works correctly on the host, but this issue still comes up, and it is random: sometimes it appears when I spawn the docker container, sometimes hours after the container has been running.
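To try to catch the moment NVML access is lost, I can log nvidia-smi's exit code inside the container on a timer (a sketch; the interval and log path are arbitrary):

# Run inside the container; appends a timestamped status line every 60 s
while true; do
  nvidia-smi > /dev/null 2>&1
  echo "$(date -Is) nvidia-smi exit: $?" >> /app/logs/nvml-watch.log
  sleep 60
done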

What is your docker start command? In the docker container, can deepstream-test1 run well? Which DeepStream pipeline causes the error “Error parsing GPU utilization: Failed to initialize NVML”?
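For reference, the start command in the guide is roughly the following (a sketch; adjust the image tag to the variant you actually use):

docker run --gpus all -it --rm --net=host --privileged \
  -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix \
  nvcr.io/nvidia/deepstream:7.1-triton-multiarch

If nvidia-smi and deepstream-test1 are stable in that container but not in yours, the difference lies in your custom start method.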

services:
  ha-edge:
    image: "${ACR_REGISTRY}/ha-edge:${IMAGE_TAG:-latest}"
    container_name: ha-edge-${INSTANCE_ID:-default}
    restart: unless-stopped
    runtime: nvidia
    volumes:
      - type: bind
        source: ${SECRETS_PATH}
        target: /app/ha-secrets.yml
        read_only: true
      - type: bind
        source: ${RUNTIME_PATH}
        target: /app/runtime
        read_only: false
      - type: bind
        source: ${LOGS_PATH}
        target: /app/logs
        read_only: false
      - /var/run/docker.sock:/var/run/docker.sock
      - /usr/bin/docker:/usr/bin/docker
      - /tmp/.X11-unix/:/tmp/.X11-unix
    devices:
      - /dev/nvidia-modeset:/dev/nvidia-modeset
      - /dev/nvidia-caps:/dev/nvidia-caps
    security_opt:
      - seccomp:unconfined
    environment:
      DISPLAY: ${DISPLAY:-:1}
      LOG_LEVEL: ${LOG_LEVEL:-info}
      HA_HEADLESS: ${HA_HEADLESS:-true}
      REDIS_HOST: ha-redis-${INSTANCE_ID:-default}
      REDIS_PORT: 6379
      GST_DEBUG: "1"
      GST_PLUGIN_PATH: "/usr/lib/x86_64-linux-gnu/gstreamer-1.0"
      NVIDIA_VISIBLE_DEVICES: "all"
      NVIDIA_DRIVER_CAPABILITIES: "compute,utility,video,graphics"
    networks:
      - ha-network
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu,compute,utility,video,graphics]
        limits:
          memory: ${MEMORY_LIMIT:-10G}
    labels:
      - "ha.device_id=${HA_DEVICE_ID:-unknown}"
      - "ha.service=edge"
      - "ha.instance_id=${INSTANCE_ID:-default}"
    depends_on:
      redis:
        condition: service_healthy

networks:
  ha-network:
    driver: bridge

Above is my Docker Compose file; below is my Dockerfile.

ARG VERSION

FROM ha-edge-base:$VERSION AS dgpu-base

WORKDIR /app

# To get video driver libraries at runtime (libnvidia-encode.so/libnvcuvid.so)
ENV NVIDIA_DRIVER_CAPABILITIES=$NVIDIA_DRIVER_CAPABILITIES,video,compute,graphics,utility

ENV CUDA_HOME=/usr/local/cuda
ENV CFLAGS="-I$CUDA_HOME/include $CFLAGS"
ENV RUNNING_IN_DOCKER=true
ENV LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/lib

# Fix for Ultralytics YOLO config directory warning
ENV YOLO_CONFIG_DIR=/tmp

# Set alias for faster builds
RUN echo 'alias build-app="meson setup runtime/build --reconfigure && ninja -C runtime/build install -j12"' >> ~/.bashrc

  1. To narrow down this issue: if you start the DeepStream 7.1 docker with the method in the guide, do “nvidia-smi” and deepstream-test1 run well in the docker container?
  2. About “Failed to initialize NVML”, please refer to this topic.

[1] Yes, they do. I can see the output of nvidia-smi and can run deepstream-test1.
[2] Okay.

The issue is that it works sometimes and does not work other times. At the moment I do not know under what conditions it fails.
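When it fails again, I will check whether the GPU device nodes are still visible inside the container (a quick sketch via docker exec; the container name follows my compose file, adjusted to the running instance):

docker exec ha-edge-default sh -c 'ls -l /dev/nvidia*'          # device nodes still present?
docker exec ha-edge-default sh -c 'nvidia-smi; echo "exit: $?"' # NVML reachable?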

When it fails, we get the error near this default banner printed by the DeepStream docker:

hawk-edge-09      | ===============================
hawk-edge-09      |    DeepStreamSDK 
hawk-edge-09      | ===============================
hawk-edge-09      | 
hawk-edge-09      | *** LICENSE AGREEMENT ***
hawk-edge-09      | By using this software you agree to fully comply with the terms and conditions
hawk-edge-09      | of the License Agreement. The License Agreement is located at
hawk-edge-09      | /opt/nvidia/deepstream/deepstream/LicenseAgreement.pdf. If you do not agree
hawk-edge-09      | to the terms and conditions of the License Agreement do not use the software.
hawk-edge-09      | 
hawk-edge-09      | 
hawk-edge-09      | =============================
hawk-edge-09      | == Triton Inference Server ==
hawk-edge-09      | =============================
hawk-edge-09      | 
hawk-edge-09      | NVIDIA Release 24.08 (build 107631419)
hawk-edge-09      | Triton Server Version 2.49.0
hawk-edge-09      | 
hawk-edge-09      | Copyright (c) 2018-2024, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.
hawk-edge-09      | 
hawk-edge-09      | Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.
hawk-edge-09      | 
hawk-edge-09      | This container image and its contents are governed by the NVIDIA Deep Learning Container License.
hawk-edge-09      | By pulling and using the container, you accept the terms and conditions of this license:
hawk-edge-09      | https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
hawk-edge-09      | 

Somewhere above this banner we get the error.

From the log in your last comment, there was no error printed; there were only some logs of Triton starting. Do you mean that when starting the docker with the method in the guide, nvidia-smi and deepstream-test1 work well every time? If so, please simplify the custom docker start method to narrow down this issue. Here is a sample.
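A minimal start command for isolating the issue could look like this (a sketch, keeping only the GPU-relevant parts of your setup; the container name is a placeholder):

# Bare-bones run of the custom image, no extra mounts, devices, or env
docker run --gpus all -d --name ha-edge-min \
  -e NVIDIA_DRIVER_CAPABILITIES=compute,utility,video \
  "${ACR_REGISTRY}/ha-edge:latest"

If NVML stays healthy in the bare-bones container, add your mounts, devices, and environment variables back one by one until the failure reappears.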