Replicator Container 1.5.3-r2 error about Vulkan library

mark148 · December 16, 2022, 9:57pm

I am trying to run the new 1.5.3-r2 Replicator container on an EC2 instance, but I encounter an error.
I was successful with version 1.4.7-r1, but I get an error with version 1.5.3-r2.
I’m using a g5 instance running the NVIDIA Omniverse GPU-Optimized AMI.

I’m following the instructions from here:

docker pull nvcr.io/nvidia/omniverse-replicator:1.5.3-r2
docker run --gpus all --entrypoint /bin/bash -it nvcr.io/nvidia/omniverse-replicator:1.5.3-r2
./cache_script.sh

This is the output after running cache_script.sh:

libGLX_nvidia.so.0 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.0
/opt/nvidia/omniverse/vkapiversion/bin/vkapiversion: error while loading shared libraries: libvulkan.so.1: cannot open shared object file: No such file or directory
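One way to confirm which shared libraries a binary cannot resolve is `ldd`, which marks each unresolved dependency as "not found". The minimal sketch below wraps that in a helper; the name `missing_libs` is hypothetical, and only `ldd` and `awk` are standard tools.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the shared libraries that the dynamic linker
# cannot resolve for a given binary (the entries ldd reports as "not found").
missing_libs() {
  local binary="$1"
  # ldd prints lines like "libvulkan.so.1 => not found"; keep only the soname.
  ldd "$binary" 2>/dev/null | awk '/=> not found/ {print $1}'
}
```

Inside the failing container, `missing_libs /opt/nvidia/omniverse/vkapiversion/bin/vkapiversion` would be expected to list `libvulkan.so.1`, matching the error above. `ldconfig -p | grep libvulkan` shows whether the library is in the linker cache at all.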

Output of nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.141.03   Driver Version: 470.141.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A10G         Off  | 00000000:00:1E.0 Off |                    0 |
|  0%   21C    P0    56W / 300W |      0MiB / 22731MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

1.4.7-r1 works fine btw.

Thank you,

Mark Olson

mark148 · December 16, 2022, 10:06pm

Solved:

startup.sh has the correct code. I opened cache_script.sh and replaced the three failing lines with the following from startup.sh:

LD_LIBRARY_PATH=/opt/nvidia/omniverse/kit-sdk-launcher/plugins/gpu.foundation \
    /opt/nvidia/omniverse/vkapiversion/bin/vkapiversion \
    "${VK_ICD_FILENAMES}"
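The lines above set LD_LIBRARY_PATH to the gpu.foundation plugin directory for that one command, which replaces any existing value. A minimal variant, assuming that directory is what supplies the libvulkan.so.1 the loader could not find, prepends it instead. The helper names `prepend_path` and `run_vkapiversion` are hypothetical; the paths and `VK_ICD_FILENAMES` come from the lines copied from startup.sh.

```shell
#!/usr/bin/env bash
# Assumption: this plugin directory provides the libvulkan.so.1 that
# vkapiversion needs (inferred from the startup.sh lines working).
VK_PLUGIN_DIR=/opt/nvidia/omniverse/kit-sdk-launcher/plugins/gpu.foundation
VKAPIVERSION=/opt/nvidia/omniverse/vkapiversion/bin/vkapiversion

# Join a directory onto a colon-separated path without leaving an empty
# entry (glibc's loader treats an empty entry as the current directory).
prepend_path() {
  local dir="$1" current="$2"
  if [ -n "$current" ]; then
    printf '%s:%s\n' "$dir" "$current"
  else
    printf '%s\n' "$dir"
  fi
}

# Hypothetical wrapper: same call as in startup.sh, but keeps any
# LD_LIBRARY_PATH entries the container already sets.
run_vkapiversion() {
  LD_LIBRARY_PATH="$(prepend_path "$VK_PLUGIN_DIR" "${LD_LIBRARY_PATH:-}")" \
    "$VKAPIVERSION" "${VK_ICD_FILENAMES}"
}
```

Prepending keeps the plugin directory first in the search order while leaving other entries reachable, which matters if later steps in cache_script.sh rely on them.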

mark148 · December 16, 2022, 10:33pm

So now there are probably other problems, as it hangs after the following:

libGLX_nvidia.so.0 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.0
Writing disposable ICD file (/tmp/tmp_icd_0Dloh1.json)...
GPU0
        apiVersion     = 1.2.175
        driverVersion  = 470.141.03
        vendorID       = 0x10de
        deviceID       = 0x2237
        deviceName     = NVIDIA A10G


Writing ICD file to (/tmp/nvidia_icd.json)
Running caching process

mark148 · December 16, 2022, 10:47pm

I realize now that I have to upgrade the NVIDIA drivers; doing that now.
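The nvidia-smi output earlier shows driver 470.141.03. The minimum driver version for 1.5.3-r2 is not stated in this thread, so the sketch below takes it as an argument rather than hard-coding one. It compares versions with GNU `sort -V` (so that, for example, a `.82` component sorts before `.141`) and reads the installed version with `nvidia-smi --query-gpu=driver_version --format=csv,noheader`. The helper names `version_ge` and `check_driver` are hypothetical.

```shell
#!/usr/bin/env bash
# Succeed (exit 0) when version $1 >= version $2, using GNU sort's
# version ordering (-V) rather than plain string comparison.
version_ge() {
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Hypothetical helper: compare the installed driver (first GPU) against a
# minimum the caller supplies, e.g. from the container's release notes.
# No minimum value is assumed here.
check_driver() {
  local required="$1" installed
  installed="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)"
  if version_ge "$installed" "$required"; then
    echo "Driver $installed meets minimum $required"
  else
    echo "Driver $installed is older than $required; upgrade required" >&2
    return 1
  fi
}
```

Usage: `check_driver <minimum-version>`, substituting the version the Replicator documentation lists for this container.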

mark148 · December 16, 2022, 11:44pm

And /opt/nvidia/omniverse/code-launcher/apps/omni.code.replicator.kit does not exist.

cache_script.sh tries to run it.

So no solution.
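To check whether the container ships the app file somewhere other than the path cache_script.sh expects, a search with `find` is one option. This is a minimal sketch; the helper name `locate_kit` and the default search root `/opt/nvidia/omniverse` are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: search a directory tree (default: the Omniverse
# install root, an assumption) for an app file by name or glob pattern.
locate_kit() {
  local name="$1" root="${2:-/opt/nvidia/omniverse}"
  # Suppress permission errors; print every matching path.
  find "$root" -name "$name" 2>/dev/null
}
```

Usage: `locate_kit omni.code.replicator.kit` looks for that exact file, and `locate_kit '*.kit'` lists every .kit file under the root, which can show which app files the image actually contains.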