I am trying to set up my GPU server to run Isaac Sim 2022.2.1 headless through Python inside a Docker container. Both the GPU server and the Docker image run Ubuntu 20.04 (focal). I need to run this specific version of Isaac Sim.
Right now I am having a problem getting Vulkan to identify the GPUs on my server (on the host, not in Docker, for now).
nvidia-smi output:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03 Driver Version: 535.129.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA A100-SXM4-80GB On | 00000000:01:00.0 Off | 0 |
| N/A 39C P0 59W / 275W | 4MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA A100-SXM4-80GB On | 00000000:47:00.0 Off | 0 |
| N/A 39C P0 62W / 275W | 4MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 2 NVIDIA A100-SXM4-80GB On | 00000000:81:00.0 Off | 0 |
| N/A 39C P0 60W / 275W | 4MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 3 NVIDIA DGX Display On | 00000000:C1:00.0 Off | N/A |
| 34% 43C P8 N/A / 50W | 1MiB / 4096MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 4 NVIDIA A100-SXM4-80GB On | 00000000:C2:00.0 Off | 0 |
| N/A 39C P0 58W / 275W | 4MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
There are no /etc/vulkan/icd.d/ or /usr/share/vulkan/icd.d files (they are not created after driver installation). I have already purged the driver many times following the steps described here, and I am installing the NVIDIA driver with sudo ubuntu-drivers install --gpgpu, which installs the no-dkms-535-server driver for my system.
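To make the missing-manifest check repeatable after each reinstall, here is a minimal Python sketch that scans the directories the Vulkan loader commonly searches on Linux and prints which driver library each ICD manifest points to. The directory list is an assumption and not exhaustive, and find_icd_manifests is a hypothetical helper name, not part of any NVIDIA or Vulkan tool.

```python
import json
import os
from pathlib import Path

# Directories commonly searched by the Vulkan loader on Linux.
# Assumed list for illustration; the loader may consult others as well.
ICD_DIRS = [
    Path("/etc/vulkan/icd.d"),
    Path("/usr/share/vulkan/icd.d"),
    Path("/usr/local/etc/vulkan/icd.d"),
    Path("/usr/local/share/vulkan/icd.d"),
    Path(os.path.expanduser("~/.local/share/vulkan/icd.d")),
]


def find_icd_manifests(dirs=ICD_DIRS):
    """Return (manifest_path, library_path) for every parseable ICD JSON file."""
    found = []
    for d in dirs:
        if not d.is_dir():
            continue  # directory missing, as reported in this post
        for manifest in sorted(d.glob("*.json")):
            try:
                data = json.loads(manifest.read_text())
            except (OSError, ValueError):
                continue  # unreadable or malformed manifest
            if not isinstance(data, dict):
                continue
            icd = data.get("ICD")
            library = icd.get("library_path") if isinstance(icd, dict) else None
            found.append((str(manifest), library))
    return found


if __name__ == "__main__":
    manifests = find_icd_manifests()
    if not manifests:
        print("No Vulkan ICD manifests found")
    for path, library in manifests:
        print(path, "->", library)
```

On a system in the state described here, this would be expected to report that no manifests were found, which matches the "Found no drivers" message from the loader.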
vulkaninfo output (after extracting the tar file from source and running /bin/vulkaninfo):
ERROR: [Loader Message] Code 0 : vkCreateInstance: Found no drivers!
Cannot create Vulkan instance.
This problem is often caused by a faulty installation of the Vulkan driver or attempting to use a GPU that does not support Vulkan.
ERROR at /vulkan-sdk/1.3.268.0/source/Vulkan-Tools/vulkaninfo/vulkaninfo.h:688:vkCreateInstance failed with ERROR_INCOMPATIBLE_DRIVER
Also, the GPU server has the /usr/bin/Xwayland process running, but the DISPLAY and WAYLAND_DISPLAY environment variables are not set. I am operating the server via ssh.
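For anyone reproducing this over ssh, a small Python sketch like the one below can record which display-related and Vulkan loader variables are visible to the session. The variable list is my own selection (VK_ICD_FILENAMES and VK_DRIVER_FILES are loader overrides that point at specific manifests), and report_env is a hypothetical helper name.

```python
import os

# Variables worth recording; the selection is an assumption for illustration.
VARS = ["DISPLAY", "WAYLAND_DISPLAY", "VK_ICD_FILENAMES", "VK_DRIVER_FILES"]


def report_env(env=None):
    """Map each variable name to its value, or None when it is unset."""
    env = os.environ if env is None else env
    return {name: env.get(name) for name in VARS}


if __name__ == "__main__":
    for name, value in report_env().items():
        print(name, "=", value if value is not None else "<unset>")
```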
I am having trouble setting up this GPU server, and solving this issue seems to be out of my league. Could anyone help?
Other outputs:
$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 535.129.03 Thu Oct 19 18:56:32 UTC 2023
GCC version:
$ strings /usr/lib/x86_64-linux-gnu/libGLX
libGLX_indirect.so.0 libGLX_mesa.so.0.0.0 libGLX.so.0
libGLX_mesa.so.0 libGLX.so libGLX.so.0.0.0
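The listing above shows only Mesa and generic GLX libraries. As a rough cross-check, the Python sketch below looks for the NVIDIA user-space graphics libraries that the NVIDIA Vulkan ICD manifest normally depends on. The library name patterns and the default directory are assumptions based on a typical Ubuntu x86_64 install, and missing_nvidia_gl_libs is a hypothetical helper name.

```python
from pathlib import Path

# Filename patterns for NVIDIA's graphics user-space libraries.
# Assumed set for illustration; it is not an exhaustive list.
NVIDIA_GL_PATTERNS = [
    "libGLX_nvidia.so*",
    "libEGL_nvidia.so*",
    "libnvidia-glcore.so*",
]


def missing_nvidia_gl_libs(libdir="/usr/lib/x86_64-linux-gnu"):
    """Return the patterns that match no file in libdir."""
    root = Path(libdir)
    if not root.is_dir():
        return list(NVIDIA_GL_PATTERNS)
    return [p for p in NVIDIA_GL_PATTERNS if not any(root.glob(p))]


if __name__ == "__main__":
    missing = missing_nvidia_gl_libs()
    if missing:
        print("Missing NVIDIA graphics libraries:", ", ".join(missing))
    else:
        print("NVIDIA graphics libraries are present")
```

If every pattern is reported missing, one possible explanation is that a compute-only driver variant was installed without the graphics libraries; that is a hypothesis to verify against the installed package list, not a confirmed diagnosis.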
There is no xorg.conf file either.