Vulkan issues with A100 GPU, vkCreateInstance failed with ERROR_INCOMPATIBLE_DRIVER

I am trying to set up my GPU server to run Isaac Sim 2022.2.1 headless through Python inside a Docker container. Both the GPU server and the container run Ubuntu 20.04 (focal). I need to run this specific version of Isaac Sim!

Right now I am having trouble getting Vulkan to identify the GPUs on my server (on the host, not in Docker, for now).

nvidia-smi output:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-SXM4-80GB          On  | 00000000:01:00.0 Off |                    0 |
| N/A   39C    P0              59W / 275W |      4MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-SXM4-80GB          On  | 00000000:47:00.0 Off |                    0 |
| N/A   39C    P0              62W / 275W |      4MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA A100-SXM4-80GB          On  | 00000000:81:00.0 Off |                    0 |
| N/A   39C    P0              60W / 275W |      4MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA DGX Display             On  | 00000000:C1:00.0 Off |                  N/A |
| 34%   43C    P8              N/A /  50W |      1MiB /  4096MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   4  NVIDIA A100-SXM4-80GB          On  | 00000000:C2:00.0 Off |                    0 |
| N/A   39C    P0              58W / 275W |      4MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

There are no files in /etc/vulkan/icd.d/ or /usr/share/vulkan/icd.d (none are created when the driver is installed). I have already purged the driver many times following the steps described here, and I am installing the NVIDIA driver with sudo ubuntu-drivers install --gpgpu, which installs the no-dkms-535-server driver on my system.
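The check for missing ICD manifests can be scripted. The following is a minimal sketch, not an official tool: it scans a few directories where the Vulkan loader commonly looks for ICD manifest JSON files and lists the driver library each one points to. The directory list and the function name find_icd_manifests are assumptions made for illustration; the loader may search other locations as well.

```python
import json
from pathlib import Path

# Directories the Vulkan loader commonly searches for ICD manifests on Linux.
# This list is an assumption for illustration; the loader may search others.
DEFAULT_ICD_DIRS = (
    "/etc/vulkan/icd.d",
    "/usr/share/vulkan/icd.d",
    "/usr/local/share/vulkan/icd.d",
)


def find_icd_manifests(dirs=DEFAULT_ICD_DIRS):
    """Return (manifest_path, library_path) pairs for each readable ICD manifest."""
    found = []
    for d in dirs:
        directory = Path(d)
        if not directory.is_dir():
            continue  # a missing directory simply contributes nothing
        for manifest in sorted(directory.glob("*.json")):
            try:
                data = json.loads(manifest.read_text())
            except (OSError, ValueError):
                continue  # skip unreadable or malformed manifests
            icd = data.get("ICD") if isinstance(data, dict) else None
            library = icd.get("library_path") if isinstance(icd, dict) else None
            if library:
                found.append((str(manifest), library))
    return found


if __name__ == "__main__":
    manifests = find_icd_manifests()
    if not manifests:
        print("No Vulkan ICD manifests found in the searched directories.")
    for path, library in manifests:
        print(f"{path} -> {library}")
```

An empty result on the host would be consistent with the "Found no drivers!" message from vulkaninfo shown below.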

vulkaninfo output (after extracting the tar file from source and running /bin/vulkaninfo):

ERROR: [Loader Message] Code 0 : vkCreateInstance: Found no drivers!
Cannot create Vulkan instance.
This problem is often caused by a faulty installation of the Vulkan driver or attempting to use a GPU that does not support Vulkan.
ERROR at /vulkan-sdk/1.3.268.0/source/Vulkan-Tools/vulkaninfo/vulkaninfo.h:688:vkCreateInstance failed with ERROR_INCOMPATIBLE_DRIVER

Also, the GPU server has the /usr/bin/Xwayland process running, but the DISPLAY and WAYLAND_DISPLAY environment variables are not set.
I am operating the server via ssh.

I am having trouble setting up this GPU server, and solving this issue seems to be out of my league. Could anyone help?

Other outputs:

$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  535.129.03  Thu Oct 19 18:56:32 UTC 2023
GCC version:  

$ strings /usr/lib/x86_64-linux-gnu/libGLX
libGLX_indirect.so.0  libGLX_mesa.so.0.0.0  libGLX.so.0           
libGLX_mesa.so.0      libGLX.so             libGLX.so.0.0.0

There is no xorg.conf file either.

Hi. Please note that the A100 is not fully supported for Isaac Sim, mainly because it does not have the NVENC encoder needed for livestreaming.
We also recommend the 525 drivers instead of the 535 drivers.
This guide should be helpful for installing the drivers and troubleshooting them.

Since livestreaming is not supported, it is recommended to run Isaac Sim headless only, or to have a desktop VM environment and run it as a windowed app. For the GUI, you need to have a display set up. Use xrandr to verify.

I installed the drivers using the .run file (after purging, as described in the previous link), and the /etc/vulkan folder was generated. I then ran /bin/vulkaninfo --summary to check whether the GPUs were recognized, and they were. To let the Docker container run Isaac Sim with these GPUs, I also had to map the Vulkan folder into the container as a volume with the docker run option -v /etc/vulkan:/etc/vulkan. I still get some warnings and a GLInteropContext::init: carb::windowing is not available error at startup, but runheadless.native runs.
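For anyone launching the container from Python, the volume mapping above can be sketched as follows. This is only an illustration under stated assumptions: the helper build_docker_cmd, the image name, and the start command are hypothetical placeholders, and --gpus all assumes the NVIDIA container toolkit is installed on the host. Only the -v /etc/vulkan:/etc/vulkan mapping comes from this thread.

```python
import shlex


def build_docker_cmd(image, command, vulkan_dir="/etc/vulkan"):
    """Build a `docker run` argument list that maps the host Vulkan config into the container."""
    return [
        "docker", "run", "--rm",
        "--gpus", "all",                     # assumption: NVIDIA container toolkit is set up
        "-v", f"{vulkan_dir}:{vulkan_dir}",  # the mapping described in the post
        image,
        command,
    ]


if __name__ == "__main__":
    # Both arguments are hypothetical placeholders; substitute your own image and start script.
    cmd = build_docker_cmd("<isaac-sim-image>", "<headless-start-command>")
    print(shlex.join(cmd))  # print the command for review instead of running it
```

Printing the command first makes it easy to confirm the mount before passing the list to subprocess.run.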
