Hi, while working through the AODT setup tutorial, we hit an error when running the Start UE Mobility command:
…
Problem initializing EM solver: operation not supported.
Recommend checking your scenario settings and GPU memory usage on compute node.
[Warning] [clickhouse_driver.connection] Error on socket shutdown: [Errno 9] Bad file descriptor
[Error] [aodt.database.db_utils]
Failed to execute database query:
SELECT COUNT(DISTINCT database)
FROM system.tables
WHERE database IN (
SELECT database
FROM system.tables
WHERE name = 'db_info'
)
Error: Simultaneous queries on single connection detected
…
The error log is attached for reference. The setup we are trying to implement and test is the “Dell R750 (Colocated)” configuration, but we cannot dedicate a physical host to it at this time, so we are exploring a virtualized implementation hosted on ESXi/vCenter 8. Additional environment information:
Server Host: Dell PowerEdge R760xa
Server CPU: x2 Intel(R) Xeon(R) Platinum 8462Y+
Server GPU: x4 NVIDIA L40S 48GB VRAM
ESXi Host version: VMware ESXi, 8.0.3, U3d - 24585383
vCenter server version: 8.0.3.00400
ESXi NVIDIA vGPU version: 570.124.03 - Build 0000
Guest VM CPU Count: 48 CPU(s)
Guest VM Memory count: 512GB
Guest VM storage allocation: 2TB
Guest VM OS: Ubuntu 22.04.5 LTS
Guest VM vGPU count: x2 nvidia_l40s-48q
Guest VM GPU Driver+CUDA version: 570.124.06; CUDA 12.8
Please let me know if you have any questions or need additional information. Thank you.
ov-aodt12-err.txt (4.7 KB)
Hi @dbiscardi2013
Please provide the output of these commands in your terminal:
nvidia-smi
docker ps
Also, you can change the DB name and try again.
Please see the attached nvidia-smi/docker ps terminal output (the file nvidia-smi and docker ps.txt was captured while the AODT application was not running). When I changed the DB name and repeated the workflow, I received a different error message (also attached).
nvidia-smi and docker ps.txt (7.0 KB)
Below is the output of nvidia-smi and docker ps with AODT running, captured after launching the aodt script from the desktop. I’m not sure whether this is expected, but the DB summary in the Content tab reports that the RUs/DUs/UEs are not created in the database.
nvidia-smi and docker ps - aodt running.txt (7.3 KB)
Thanks for the info.
The recommended GPU driver and CUDA versions for AODT 1.2.0 are:
Driver Version: 560.35.03 CUDA Version: 12.6
I noticed your system has different versions for both.
How did you install AODT on your system? Did you modify any of the installation scripts?
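For anyone who wants to repeat this comparison on their own system, here is a minimal sketch in Python. It assumes the usual nvidia-smi header line containing "Driver Version: … CUDA Version: …" and uses the AODT 1.2.0 values quoted above. The function names are hypothetical and are not part of AODT or its install scripts.

```python
import re
import subprocess

# Versions quoted in this thread as recommended for AODT 1.2.0.
RECOMMENDED_DRIVER = "560.35.03"
RECOMMENDED_CUDA = "12.6"

# Matches the header that nvidia-smi prints, for example:
# "| NVIDIA-SMI 570.124.06   Driver Version: 570.124.06   CUDA Version: 12.8 |"
HEADER_RE = re.compile(r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)")


def parse_versions(smi_text):
    """Return (driver_version, cuda_version) found in nvidia-smi output."""
    match = HEADER_RE.search(smi_text)
    if match is None:
        raise ValueError("no 'Driver Version ... CUDA Version' header found")
    return match.group(1), match.group(2)


def check_versions(smi_text):
    """Return a list of mismatch messages; an empty list means both match."""
    driver, cuda = parse_versions(smi_text)
    problems = []
    if driver != RECOMMENDED_DRIVER:
        problems.append(f"driver {driver} != recommended {RECOMMENDED_DRIVER}")
    if cuda != RECOMMENDED_CUDA:
        problems.append(f"CUDA {cuda} != recommended {RECOMMENDED_CUDA}")
    return problems


def read_nvidia_smi():
    """Run nvidia-smi and return its text output (requires the NVIDIA driver)."""
    return subprocess.run(
        ["nvidia-smi"], capture_output=True, text=True, check=True
    ).stdout
```

On a machine with the driver installed, `check_versions(read_nvidia_smi())` returns the list of mismatches. Exact string matching is used on purpose, since only the one listed combination is described as recommended here.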
I modified the script to get it to install on the system: I skipped the driver check by setting the RECOMMENDED_VERSION string in make_install.sh to the driver version already installed on the guest VM. My initial thinking was that the CUDA version included with the driver is backward compatible with earlier versions; hopefully that still holds and we’re hitting a different kind of issue. I wasn’t able to find a 560.x driver that includes GRID support. The closest drivers with GRID support available in the software catalog at https://ui.licensing.nvidia.com/software are:
-Complete vGPU 17.6 package for VMware vSphere 8.0 including supported guest drivers (550.163.02-550.163.01-553.74)
-Complete vGPU 18.0 package for VMware vSphere 8.0 including supported guest drivers (570.124.03-570.124.06-572.60)
The “Complete vGPU 17.2 package for VMware vSphere 8.0…” package includes guest driver version 552.55, and according to the aerial-dt documentation, the script for an Azure instance deployment installs driver 552.55 for the frontend role. If our error is related to driver/CUDA versions, could we use 552.55/55x.xx for the backend as well and install CUDA 12.6 using a CUDA forward compatibility package (see “Why CUDA Compatibility” in the CUDA Compatibility documentation)? Or does 1.2.0 require driver 560.35.03 for the backend to work? Would the dev/internal team know of any compatible GRID drivers we could try instead?
Hi @dbiscardi2013
Unfortunately, any driver/CUDA combination that is not officially recommended/supported has not been tested and is not guaranteed to function properly.
Given the potential issues, and to save time and resources, we advise installing AODT on the supported and qualified systems listed in the documentation.