Hi! I’m having a problem running Isaac Gym. I have an NVIDIA 2070 on Windows 11 (so there is no problem running graphics applications), but when I start an example in Python I get:
*** Warning: failed to preload CUDA lib
*** Warning: failed to preload PhysX libs
Importing module 'gym_38' (/home/enne/isaacgym/python/isaacgym/_bindings/linux-x86_64/gym_38.so)
Setting GYM_USD_PLUG_INFO_PATH to /home/enne/isaacgym/python/isaacgym/_bindings/linux-x86_64/usd/plugInfo.json
WARNING: Forcing CPU pipeline.
Not connected to PVD
/buildAgent/work/f3416cf82e3cf1ba/source/physx/src/gpu/PxPhysXGpuModuleLoader.cpp (147) : internal error : libcuda.so!
[Warning] [carb.gym.plugin] Failed to create a PhysX CUDA Context Manager. Falling back to CPU.
Physics Engine: PhysX
Physics Device: cpu
GPU Pipeline: disabled
No GPU devices found.
[Error] [carb.gym.plugin] Failed to create Nvf device in createNvfGraphics. Please make sure Vulkan is correctly installed.
*** Failed to create sim
If I run nvidia-smi, my graphics card is listed correctly, with Driver Version: 510.10 and CUDA Version: 11.6.
Is there an incompatibility, or am I missing some steps?
python3 joint_monkey.py
Importing module 'gym_37' (/home/kaykay/Downloads/isaacgym/python/isaacgym/_bindings/linux-x86_64/gym_37.so)
Setting GYM_USD_PLUG_INFO_PATH to /home/kaykay/Downloads/isaacgym/python/isaacgym/_bindings/linux-x86_64/usd/plugInfo.json
WARNING: Forcing CPU pipeline.
[Error] [carb.gym.plugin] Sim CUDA device 0 can't be set, the total number of available devices is -1
Not connected to PVD
/buildAgent/work/f3416cf82e3cf1ba/source/cudamanager/src/CudaContextManager.cpp (404) : warning : cuInit failed
[Warning] [carb.gym.plugin] Failed to create a valid PhysX CUDA Context Manager. Falling back to CPU.
Physics Engine: PhysX
Physics Device: cpu
GPU Pipeline: disabled
[Error] [carb.gym.plugin] Gym cuda error: no CUDA-capable device is detected: …/…/…/source/plugins/carb/gym/impl/Gym/GymCuda.h: 110
[Error] [carb.gym.plugin] Failed to create primary CUDA context
[Warning] [carb.gym.plugin] Failed to create primary CUDA context on graphics device
No GPU devices found.
[Error] [carb.gym.plugin] Failed to create Nvf device in createNvfGraphics. Please make sure Vulkan is correctly installed.
*** Failed to create sim
nvidia-smi gives this:
nvidia-smi
Sun Oct 24 17:54:21 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.00 Driver Version: 510.06 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce … On | 00000000:01:00.0 Off | N/A |
| N/A 54C P8 4W / N/A | 220MiB / 6144MiB | N/A Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
I was hoping that this would not be a problem with WSL2 (since it is basically a virtual machine with GPU passthrough and a GUI - in my case, Ubuntu 20.04). As far as I understand, there is some Vulkan problem in WSL2, and maybe that is the cause of this problem.
We have not tested Isaac Gym in a virtual machine or WSL2, so I can’t say for sure what issues may arise. Vulkan is generally required for rendering. If your use case doesn’t require rendering, you could try running one of the examples in headless mode and see if that works for you. From the error messages posted in this thread, it mostly looks like the PhysX backend was not able to find the correct CUDA binaries, or it couldn’t find any available GPU devices, so it’s possible that these are not being mapped correctly in the virtual environments.
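To check whether the CUDA driver library can be found and initialised from Python at all, something like the following may help. It is only a sketch: the function name `probe_cuda_driver` is made up for illustration, and it assumes the driver exposes the usual CUDA driver API entry points `cuInit` and `cuDeviceGetCount`. It tries both `libcuda.so` (the name that appears in the PhysX error above) and `libcuda.so.1`, since one may be present without the other.

```python
import ctypes


def probe_cuda_driver(names=("libcuda.so", "libcuda.so.1"), loader=ctypes.CDLL):
    """Try to load each library name and report what happened.

    Hypothetical helper for diagnosis only. Returns {name: status string}.
    """
    results = {}
    for name in names:
        try:
            lib = loader(name)
        except OSError as exc:
            # The dynamic linker could not find or open this name.
            results[name] = f"not loadable: {exc}"
            continue
        # cuInit(0) has to succeed before any other driver API call.
        status = lib.cuInit(0)
        if status != 0:
            results[name] = f"loaded, but cuInit failed with CUDA error {status}"
            continue
        count = ctypes.c_int(0)
        status = lib.cuDeviceGetCount(ctypes.byref(count))
        if status != 0:
            results[name] = f"loaded, but cuDeviceGetCount failed with CUDA error {status}"
            continue
        results[name] = f"ok, {count.value} device(s)"
    return results


if __name__ == "__main__":
    for lib_name, outcome in probe_cuda_driver().items():
        print(f"{lib_name}: {outcome}")
```

If `libcuda.so.1` loads but `libcuda.so` does not, that would be consistent with other CUDA software working while PhysX falls back to CPU, although this thread does not confirm that this is the cause in every case.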
I am getting the same error. I’m using Ubuntu 18.04 on an EC2 g4dn.2xlarge instance.
$ python3.8 joint_monkey.py
Importing module 'gym_38' (/home/ubuntu/.local/lib/python3.8/site-packages/isaacgym/_bindings/linux-x86_64/gym_38.so)
Setting GYM_USD_PLUG_INFO_PATH to /home/ubuntu/.local/lib/python3.8/site-packages/isaacgym/_bindings/linux-x86_64/usd/plugInfo.json
WARNING: Forcing CPU pipeline.
[Error] [carb.gym.plugin] Sim CUDA device 0 can't be set, the total number of available devices is -1
Not connected to PVD
/buildAgent/work/45f70df4210b2e3e/source/cudamanager/src/CudaContextManager.cpp (404) : warning : cuInit failed
[Warning] [carb.gym.plugin] Failed to create a valid PhysX CUDA Context Manager. Falling back to CPU.
Physics Engine: PhysX
Physics Device: cpu
GPU Pipeline: disabled
[Error] [carb.gym.plugin] Gym cuda error: no CUDA-capable device is detected: ../../../source/plugins/carb/gym/impl/Gym/GymCuda.h: 110
[Error] [carb.gym.plugin] Failed to create primary CUDA context
[Warning] [carb.gym.plugin] Failed to create primary CUDA context on graphics device
$ nvidia-smi
Wed Apr 6 15:58:12 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.103.01 Driver Version: 470.103.01 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 Off | 00000000:00:1E.0 Off | 0 |
| N/A 49C P0 28W / 70W | 0MiB / 15109MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
torch has access to CUDA, so I’m not sure what’s going wrong:
$ python3.8
Python 3.8.12 (default, Oct 12 2021, 13:49:34)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
The directory “/usr/lib/x86_64-linux-gnu” was missing “libcuda.so”. I uninstalled the original NVIDIA driver and then reinstalled it. The problem was solved.
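A quick way to see which `libcuda.so*` files are actually present is sketched below. The helper name `find_libcuda` and the default directory list are assumptions for illustration: `/usr/lib/x86_64-linux-gnu` is the directory mentioned in the post above, `/usr/lib/wsl/lib` is where WSL2 normally exposes the driver libraries, and your system may use other locations.

```python
from pathlib import Path

# Assumed search locations; adjust for your distribution.
DEFAULT_DIRS = ("/usr/lib/x86_64-linux-gnu", "/usr/lib/wsl/lib", "/usr/lib64")


def find_libcuda(dirs=DEFAULT_DIRS):
    """Return {directory: sorted names of libcuda.so* entries found there}."""
    found = {}
    for d in dirs:
        p = Path(d)
        if p.is_dir():
            found[d] = sorted(f.name for f in p.glob("libcuda.so*"))
        else:
            found[d] = []
    return found


if __name__ == "__main__":
    for directory, names in find_libcuda().items():
        # The unversioned name is the one reported missing in this thread.
        has_unversioned = "libcuda.so" in names
        print(f"{directory}: {names or 'nothing found'} (libcuda.so present: {has_unversioned})")
```

If the files exist in a directory that is not on the default library search path, adding that directory to `LD_LIBRARY_PATH` before starting Python is a possible workaround instead of reinstalling the driver.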
Note that this will enable GPU for PhysX, but will not enable GPU pipeline for joint monkey. See additional step below.
2) “WARNING: Forcing CPU pipeline.”
If you look in joint_monkey.py, line 73, you can see that the GPU pipeline is forced to False. I forced it to True and this enabled the GPU pipeline. (You can see in the output that the sim object is created with the GPU pipeline.)
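For orientation, the two GPU switches involved are attributes on the sim params object. The sketch below assumes the attribute names `use_gpu_pipeline` and `physx.use_gpu` as they are set in the Isaac Gym example scripts. The helper `apply_gpu_settings` is hypothetical, and a plain stand-in object replaces `gymapi.SimParams()` so the snippet runs without Isaac Gym installed.

```python
from types import SimpleNamespace


def apply_gpu_settings(sim_params, use_gpu_physx, use_gpu_pipeline):
    """Set the two GPU switches on a SimParams-like object (hypothetical helper)."""
    # Run the PhysX solver on the GPU or on the CPU.
    sim_params.physx.use_gpu = use_gpu_physx
    # Assumption: the GPU pipeline is only requested together with GPU PhysX,
    # so it is switched off here whenever PhysX is on the CPU.
    sim_params.use_gpu_pipeline = use_gpu_pipeline and use_gpu_physx
    return sim_params


# Stand-in for gymapi.SimParams(); with Isaac Gym you would pass the real object.
params = SimpleNamespace(physx=SimpleNamespace(use_gpu=False), use_gpu_pipeline=False)
apply_gpu_settings(params, use_gpu_physx=True, use_gpu_pipeline=True)
print("physx.use_gpu:", params.physx.use_gpu)
print("use_gpu_pipeline:", params.use_gpu_pipeline)
```

In the example script itself the change described above is a one-line edit. The helper only makes the dependency between the two settings explicit.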
1) “WARNING: Forcing CPU pipeline.”
You can see that there is still one “WARNING: Forcing CPU pipeline.” right at the beginning, before the sim object is created. As far as I could tell, this is not coming from the Python code of the example or of the isaacgym Python module. I guess it is coming from one of the compiled libraries.
2) Errors galore
The simulation shows graphically, as it did without the above changes, and it runs for about the same time before the segfault. However, there are a lot more errors reported in the console.
Regarding the source of the segmentation fault, I ran gym via gdb with and without GPU pipeline enabled. In both cases, the source of the segmentation fault seems to be Vulkan - specifically the lavapipe software render library. Therefore I am guessing that it has something to do with the GUI.
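To see which Vulkan drivers the loader can pick from, the installed ICD manifests can be listed. This is a rough sketch: the function name `list_vulkan_icds` is made up, and the two directories are the common manifest locations on Linux, which may differ on your system. Each manifest is a small JSON file whose `ICD.library_path` entry names the driver library.

```python
import json
from pathlib import Path

# Common manifest locations on Linux (assumed; other paths are possible).
ICD_DIRS = ("/usr/share/vulkan/icd.d", "/etc/vulkan/icd.d")


def list_vulkan_icds(dirs=ICD_DIRS):
    """Return a list of (manifest path, library_path or None) tuples."""
    icds = []
    for d in dirs:
        p = Path(d)
        if not p.is_dir():
            continue
        for manifest in sorted(p.glob("*.json")):
            try:
                data = json.loads(manifest.read_text())
                library = data["ICD"]["library_path"]
            except (OSError, ValueError, KeyError, TypeError):
                # Unreadable or unexpected manifest: keep it in the list anyway.
                library = None
            icds.append((str(manifest), library))
    return icds


if __name__ == "__main__":
    for path, library in list_vulkan_icds():
        print(f"{path} -> {library}")
```

If the only entry points at a lavapipe library (for example a name containing `lvp`), the loader has just the software renderer available, which would match the gdb observation above.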
From what I could see online, GPU acceleration for Vulkan might only have been made available recently and seems to need a re-install/compile. I am planning to try that next.
Unfortunately, this descended into dependency hell and I haven’t made the time to follow up.
Since Hyper-V can implement GPU partitioning, I was going to try that route. Note that it seems that this is not “true” SR-IOV in the case of graphics and again relies on DirectX. All the same, Microsoft have provided the necessary library to pick up the exposed device in the Linux guest.
See here: brokeDude2901/dxgkrnl_ubuntu: Microsoft GPU-P (dxgkrnl) on Hyper-V Ubuntu VM (github.com)
I had the opportunity to speak with an NVIDIA employee recently, and their advice was that getting it running under Windows is still challenging and to go the dual-boot route. I did that and got it running under 22.04, but noticed the following:
The PhysX pipeline works without any particular bother (as long as the necessary libraries are on the path).
Selecting the GPU pipeline results in a segfault even under pure Linux!