Failed to set device_index

I am trying to set up the AODT installation on my system. I am getting the error below inside the aodt_sim container:

bmaas@fi-he-hdc-z3-d17-21:~/aodt_1.2.0$ docker logs ddd204652271
Starting container…

==========
== CUDA ==

CUDA Version 12.6.2

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

Failed to set device index: 0
Starting container…

I have set up the aodt_sim compose file like this:

connector:
  image: nvcr.io/esee5uzbruax/aodt-sim:1.2.0_runtime
  network_mode: "host"
  working_dir: /aodt/aodt_sim/build
  command: ./aodt_sim --nucleus omniverse://0.0.0.0 --broadcast broadcast --log debug
  tty: true
  env_file:
    - .env
  environment:
    - NVIDIA_DRIVER_CAPABILITIES=all
    - OMNI_USER
    - OMNI_PASS
  restart: unless-stopped
  extra_hosts:
    - omniverse-server:host-gateway
  depends_on:
    - clickhouse
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            device_ids: ["0"]
            capabilities: [gpu]

Please assist.
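
One way to narrow this down is to check whether GPU 0 is visible from inside a container at all, independently of aodt_sim. The sketch below is only an illustration: it assumes the NVIDIA Container Toolkit is installed and that nvidia-smi is available inside the container (the toolkit normally injects it). The helper name count_visible_gpus is hypothetical and not part of AODT.

```shell
#!/usr/bin/env bash
# Count the GPUs that `nvidia-smi -L` reports inside a throwaway container.
# Assumption: the NVIDIA Container Toolkit injects nvidia-smi into the container.
# count_visible_gpus is an illustrative helper, not part of AODT.
count_visible_gpus() {
  local image="$1" device="${2:-0}"
  # --gpus "device=N" exposes only GPU N, similar to device_ids: ["0"] in compose.
  # Banner lines printed by the image entrypoint are ignored by the grep pattern.
  docker run --rm --gpus "device=${device}" "$image" nvidia-smi -L 2>/dev/null \
    | grep -c '^GPU '
}

# Usage (not run automatically):
#   count_visible_gpus nvcr.io/esee5uzbruax/aodt-sim:1.2.0_runtime 0
```

If this prints 0, the problem is likely in the container runtime or driver setup rather than in aodt_sim itself; if it prints 1, the GPU is reachable and the failure is more likely specific to the application.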

Hi @sujith.samuel
Do you already have a working installation of AODT 1.2.0 on your setup? Can you please share the output of these commands?
nvidia-smi
docker ps

nvidia-smi is
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA H100 80GB HBM3          Off |   00000000:19:00.0 Off |                    0 |
| N/A   30C    P0             73W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA H100 80GB HBM3          Off |   00000000:2D:00.0 Off |                    0 |
| N/A   31C    P0             68W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA H100 80GB HBM3          Off |   00000000:3F:00.0 Off |                    0 |
| N/A   31C    P0             71W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA H100 80GB HBM3          Off |   00000000:66:00.0 Off |                    0 |
| N/A   29C    P0             69W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   4  NVIDIA H100 80GB HBM3          Off |   00000000:9B:00.0 Off |                    0 |
| N/A   29C    P0             71W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   5  NVIDIA H100 80GB HBM3          Off |   00000000:AE:00.0 Off |                    0 |
| N/A   32C    P0             70W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   6  NVIDIA H100 80GB HBM3          Off |   00000000:BF:00.0 Off |                    0 |
| N/A   31C    P0             70W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   7  NVIDIA H100 80GB HBM3          Off |   00000000:E4:00.0 Off |                    0 |
| N/A   30C    P0             69W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

docker ps -a is
bmaas@fi-he-hdc-z3-d17-21:~$ docker ps
CONTAINER ID   IMAGE   COMMAND   CREATED   STATUS   PORTS   NAMES
58031a04d822   nvcr.io/esee5uzbruax/aodt-sim:1.2.0_runtime   "/opt/nvidia/nvidia_…"   8 hours ago   Up 20 seconds   backend-connector-1
d92a3642bfa6   nvcr.io/esee5uzbruax/aodt-gis:1.2.0   "python3 aodt_py/aod…"   20 hours ago   Up 20 hours   backend-gis-1
a2025bbc2a2e   clickhouse/clickhouse-server:24.7.4.51-alpine   "/entrypoint.sh"   20 hours ago   Up 20 hours   backend-clickhouse-1
e9b9fbb105d6   aodt_sim-notebook   "tini -g -- start.sh…"   20 hours ago   Up 20 hours (healthy)   0.0.0.0:8888->8888/tcp, :::8888->8888/tcp   aodt_sim-notebook-1
3d28ab34aeb9   nvcr.io/nvidia/omniverse/nucleus-api:1.14.41   "/root/eula.sh ./lau…"   20 hours ago   Up 20 hours   base_stack-nucleus-api-1
bba90d434a5d   nvcr.io/nvidia/omniverse/nucleus-tagging:3.1.26   "/omni/docker-entryp…"   20 hours ago   Up 20 hours   base_stack-nucleus-tagging-1
7340d40e73bb   nvcr.io/nvidia/omniverse/nucleus-lft-lb:1.14.41   "/root/eula.sh /omni…"   20 hours ago   Up 20 hours   base_stack-nucleus-lft-lb-1
2ff910af09af   nvcr.io/nvidia/omniverse/nucleus-search:3.2.11   "/root/eula.sh /bin/…"   20 hours ago   Up 20 hours   base_stack-nucleus-search-1
2d1ca9caf74a   nvcr.io/nvidia/omniverse/nucleus-lft:1.14.41   "/root/eula.sh pytho…"   20 hours ago   Up 20 hours   base_stack-nucleus-lft-1
27cdb91f75c0   nvcr.io/nvidia/omniverse/nucleus-navigator:3.3.5   "/root/eula.sh /entr…"   20 hours ago   Up 20 hours   base_stack-nucleus-navigator-1
5d66284f0425   nvcr.io/nvidia/omniverse/utl-monpx:1.14.41   "/root/eula.sh pytho…"   20 hours ago   Up 20 hours   base_stack-utl-monpx-1
d1c04b9472f3   nvcr.io/nvidia/omniverse/nucleus-discovery:1.5.4   "/root/eula.sh pytho…"   20 hours ago   Up 20 hours   base_stack-nucleus-discovery-1
ed216a7962f9   nvcr.io/nvidia/omniverse/nucleus-auth:1.5.5   "/root/eula.sh /bin/…"   20 hours ago   Up 20 hours   base_stack-nucleus-auth-1
1c8c4c3c4420   nvcr.io/nvidia/omniverse/nucleus-meta-dumper:1.14.41   "/root/eula.sh pytho…"   20 hours ago   Up 20 hours   0.0.0.0:5555->5000/tcp, [::]:5555->5000/tcp   base_stack-nucleus-meta-dumper-1
3c3c0cbc2163   nvcr.io/nvidia/omniverse/nucleus-resolver-cache:1.14.41   "/root/eula.sh /omni…"   20 hours ago   Up 20 hours   base_stack-nucleus-resolver-cache-1
265a4ffd258c   nvcr.io/nvidia/omniverse/nucleus-thumbnails:1.5.11   "/omni/docker-entryp…"   20 hours ago   Up 20 hours   base_stack-nucleus-thumbnails-1

As you can see, all containers are fine except for aodt-sim (backend-connector-1). It keeps restarting with this log:

Failed to set device index: 0
Starting container…

==========
== CUDA ==

CUDA Version 12.6.2

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:

Please assist.
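
To confirm the restart loop and capture the exit code of the last attempt, something like the following could be used. This is a minimal sketch that assumes the container name backend-connector-1 from the docker ps output above; container_restart_info is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print restart count, current state, and last exit code for a container.
# container_restart_info is an illustrative helper name, not part of AODT.
container_restart_info() {
  local name="$1"
  # RestartCount, State.Status and State.ExitCode are fields of `docker inspect`.
  docker inspect \
    --format 'restarts={{.RestartCount}} status={{.State.Status}} exit_code={{.State.ExitCode}}' \
    "$name"
}

# Usage (not run automatically):
#   container_restart_info backend-connector-1
#   docker logs --tail 50 backend-connector-1
```

A steadily growing restart count together with a non-zero exit code would indicate that aodt_sim exits right after printing the "Failed to set device index" message and is then restarted by the restart: unless-stopped policy.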

The nvidia-smi output above is missing the driver info. Here is the full output:

Thu Feb 6 04:29:19 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA H100 80GB HBM3          Off |   00000000:19:00.0 Off |                    0 |
| N/A   30C    P0             73W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA H100 80GB HBM3          Off |   00000000:2D:00.0 Off |                    0 |
| N/A   31C    P0             68W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA H100 80GB HBM3          Off |   00000000:3F:00.0 Off |                    0 |
| N/A   31C    P0             71W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA H100 80GB HBM3          Off |   00000000:66:00.0 Off |                    0 |
| N/A   29C    P0             69W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   4  NVIDIA H100 80GB HBM3          Off |   00000000:9B:00.0 Off |                    0 |
| N/A   29C    P0             71W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   5  NVIDIA H100 80GB HBM3          Off |   00000000:AE:00.0 Off |                    0 |
| N/A   32C    P0             70W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   6  NVIDIA H100 80GB HBM3          Off |   00000000:BF:00.0 Off |                    0 |
| N/A   31C    P0             70W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   7  NVIDIA H100 80GB HBM3          Off |   00000000:E4:00.0 Off |                    0 |
| N/A   30C    P0             69W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

Thanks @sujith.samuel for this info. Unfortunately, we do not yet support the H100 for AODT installation. Is it possible to use one of the supported GPUs instead?

Installation — Aerial Omniverse Digital Twin

According to this page, the H100 is supported. Please clarify whether the H100 can work for the frontend and the backend.

@sujith.samuel The front end is not supported on H100. You are correct: compute capability 9.0 is supported in 1.2, so the H100 should work for the backend. We will try to reproduce this on our side.
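
For anyone who wants to verify the compute capability of their own GPUs, a sketch along these lines may help. It assumes a driver recent enough for nvidia-smi to support the compute_cap query field; gpus_with_compute_cap is a hypothetical helper name, and 9.0 is just the value mentioned in this thread.

```shell
#!/usr/bin/env bash
# List GPUs whose compute capability matches a wanted value (for example 9.0).
# Input on stdin: lines in the form "index, name, compute_cap", as printed by
#   nvidia-smi --query-gpu=index,name,compute_cap --format=csv,noheader
# gpus_with_compute_cap is an illustrative helper name, not part of AODT.
gpus_with_compute_cap() {
  local wanted="$1"
  # Fields are separated by ", " in nvidia-smi CSV output.
  awk -F', ' -v wanted="$wanted" '$3 == wanted { print $1 ": " $2 }'
}

# Usage (not run automatically):
#   nvidia-smi --query-gpu=index,name,compute_cap --format=csv,noheader \
#     | gpus_with_compute_cap 9.0
```

The function only filters text, so it can also be run on saved nvidia-smi output from another machine.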

The docs mention that the L40 will work for the frontend. Could you please let me know if the L40S would also work?

Yes, the L40S would also work.