Isaac Gym + 3090 issues

Hi,
I’ve installed PyTorch 1.7 with CUDA 11.0, and PyTorch works perfectly with my new 3090.
But with Isaac Gym I get this error:
```
  File "rlg_train.py", line 38, in __init__
    self.obs = self.env.reset()
  File "/home/trrrrr/Documents/isaacgym/python/rlgpu/tasks/base/vec_task.py", line 130, in reset
    self.task.step(actions)
  File "/home/trrrrr/Documents/isaacgym/python/rlgpu/tasks/base/base_task.py", line 105, in step
    self.post_physics_step()
  File "/home/trrrrr/Documents/isaacgym/python/rlgpu/tasks/humanoid.py", line 288, in post_physics_step
    self.reset(env_ids)
  File "/home/trrrrr/Documents/isaacgym/python/rlgpu/tasks/humanoid.py", line 253, in reset
    positions = torch_rand_float(-0.2, 0.2, (len(env_ids), self.num_dof), device=self.device)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: nvrtc: error: invalid value for --gpu-architecture (-arch)
```
Could you help me fix this issue?

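The problem you’re hitting is that the default PyTorch version available through Anaconda ships with CUDA 11.0, which doesn’t support GeForce Ampere cards (compute capability 8.6, i.e. sm_86) out of the box. When TorchScript JIT-compiles a kernel for your card, the CUDA 11.0 NVRTC library rejects the sm_86 target, which is the "invalid value for --gpu-architecture" error in your traceback. Here’s a thread on the topic on the PyTorch side: https://github.com/pytorch/pytorch/issues/45021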
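The failure boils down to comparing the card's compute capability against the newest architecture a given NVRTC release can target. Here is a minimal Python sketch of that comparison. The `MAX_ARCH_BY_CUDA` table and the `nvrtc_supports` helper are hypothetical, and the table deliberately lists only the two CUDA releases discussed in this thread (11.0 tops out at sm_80, 11.1 adds sm_86).

```python
from typing import Tuple

# Newest GPU architecture each CUDA release's NVRTC can target.
# Hypothetical, deliberately partial table: only the releases from this thread.
MAX_ARCH_BY_CUDA = {
    (11, 0): 80,  # CUDA 11.0: up to sm_80 (A100), no sm_86
    (11, 1): 86,  # CUDA 11.1: adds sm_86 (GeForce RTX 3080/3090)
}


def nvrtc_supports(cuda_version: str, capability: Tuple[int, int]) -> bool:
    """Return True if NVRTC from `cuda_version` (e.g. "11.0") can target `capability`."""
    major, minor = (int(part) for part in cuda_version.split(".")[:2])
    arch = capability[0] * 10 + capability[1]  # (8, 6) -> 86, i.e. sm_86
    # Use the newest table entry not newer than the given version; releases
    # older than everything in the table are conservatively treated as unsupported.
    known = [v for v in MAX_ARCH_BY_CUDA if v <= (major, minor)]
    if not known:
        return False
    return arch <= MAX_ARCH_BY_CUDA[max(known)]


# A 3090 reports compute capability (8, 6):
print(nvrtc_supports("11.0", (8, 6)))  # False: the setup that failed above
print(nvrtc_supports("11.1", (8, 6)))  # True
```

In a live environment, `torch.cuda.get_device_capability()` returns the card's capability tuple. Keep in mind that `torch.version.cuda` reports the CUDA version PyTorch was built against, not the NVRTC library actually loaded at runtime, so it will still say 11.0 even after the symlink workaround described in option 2.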

The ideal solution is to wait until cudatoolkit 11.1 is available on Anaconda and use it with an updated PyTorch release. In the meantime, there are two ways to deal with it:

1. Use the Docker container installation instructions

The Dockerfile we provide uses the latest PyTorch image from NGC: https://ngc.nvidia.com/catalog/containers/nvidia:pytorch

This image has the right configuration for headless training in Docker, but it doesn’t support anything that requires rendering, because Vulkan is not supported in the container.

Alternatively, you can try…

2. Use the pre-release PyTorch 1.8 nightly build, along with some hacks for CUDA 11.1 support:

This hacky solution supports rendering, but it is far from pretty and can be challenging to set up. It may also fail depending on that day’s PyTorch nightly build. Try it at your own risk.

First, download and install the latest CUDA Toolkit (11.1 or newer) from https://developer.nvidia.com/cuda-toolkit. Depending on how you install it, you may also need to update your graphics driver.

Set up the rlgpu conda environment as described in the Isaac Gym documentation.

Next, activate the rlgpu conda environment and install the latest PyTorch nightly build:

```
conda activate rlgpu
conda install pytorch torchvision cudatoolkit=11 -c pytorch-nightly
```

This still won’t work on its own, because the NVRTC library shipped in Anaconda’s cudatoolkit package is version 11.0, not the 11.1 required to support the 3080 and 3090. But since you have installed the CUDA toolkit locally, you can work around this with a manual symlink. First, move the old libraries out of the way:

```
cd ~/anaconda3/envs/rlgpu/lib
mkdir oldcuda
mv *nvrtc* oldcuda
```

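With that done, and still in the same lib directory, create a symlink named libnvrtc.so.11.0 that points to your locally installed CUDA 11.1 NVRTC library, so PyTorch loads the 11.1 version under the name it expects: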

```
ln -s /usr/local/cuda/targets/x86_64-linux/lib/libnvrtc.so.11.1 libnvrtc.so.11.0
```
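If you'd rather script the move-and-symlink steps, here is a minimal Python sketch that mirrors the mkdir/mv/ln commands with pathlib. The `relink_nvrtc` name is hypothetical, and the paths in the usage comment are simply the ones from the commands above, so adjust them if your conda install or CUDA version differs. Unlike the raw shell commands, it refuses to run twice, since a second pass would move the new symlink over the backed-up 11.0 library.

```python
from pathlib import Path


def relink_nvrtc(env_lib: Path, local_nvrtc: Path,
                 soname: str = "libnvrtc.so.11.0") -> Path:
    """Back up the env's NVRTC libs into oldcuda/ and point `soname` at local_nvrtc."""
    link = env_lib / soname
    if link.is_symlink():
        # Already relinked: a second pass would move our symlink over the backup.
        raise FileExistsError(f"{link} is already a symlink")
    backup = env_lib / "oldcuda"
    backup.mkdir(exist_ok=True)                  # mkdir oldcuda
    for lib in list(env_lib.glob("*nvrtc*")):    # mv *nvrtc* oldcuda
        lib.rename(backup / lib.name)
    link.symlink_to(local_nvrtc)                 # ln -s <11.1 lib> libnvrtc.so.11.0
    return link


# Usage with the paths from the commands above (adjust to your machine):
# relink_nvrtc(Path.home() / "anaconda3/envs/rlgpu/lib",
#              Path("/usr/local/cuda/targets/x86_64-linux/lib/libnvrtc.so.11.1"))
```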

At this point, if everything works correctly, you should be able to run the RL examples.

Take care,
-Gav

Thanks, it helped.
I got nearly 200,000 inference FPS in the Shadow Hand task!

I followed the steps above, but I get the following error:

```
`libnvrtc.so.11.0' not found (required by /home/user/anaconda3/envs/bgmatting/lib/python3.7/site-packages/torch/lib/libcaffe2_nvrtc.so)
```

Do you know what I should do next?

If you did all of the above, I’m not sure why it can’t find the libnvrtc.so.11.0 symlink you created. I’d recommend double-checking that the symlink is in your /home/user/anaconda3/envs/bgmatting/lib directory, since that’s the environment your error message shows PyTorch being loaded from. Perhaps you created the symlink in a different conda environment (e.g. rlgpu, as in the instructions above)?
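One way to check is to inspect the lib directory of the environment PyTorch is actually loaded from. Below is a minimal Python sketch; `check_nvrtc_link` is a hypothetical helper, and it assumes the same libnvrtc.so.11.0 name used in the instructions. When run with the Python from the active conda environment, `Path(sys.prefix) / "lib"` is that environment's lib directory (bgmatting in the error above).

```python
import os
import sys
from pathlib import Path


def check_nvrtc_link(env_lib: Path, soname: str = "libnvrtc.so.11.0") -> str:
    """Return a one-line diagnosis of the NVRTC symlink in a conda env's lib dir."""
    link = env_lib / soname
    if not link.is_symlink():
        if link.exists():
            return f"NOT LINKED: {link} is a regular file (old NVRTC still in place?)"
        return f"MISSING: {link} (symlink created in a different env?)"
    target = os.readlink(link)
    if not link.exists():  # exists() follows the symlink, so False means it dangles
        return f"BROKEN: {link} -> {target}, which does not exist"
    return f"OK: {link} -> {target}"


if __name__ == "__main__":
    # sys.prefix is the root of the conda env whose Python is running this.
    print(check_nvrtc_link(Path(sys.prefix) / "lib"))
```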

Alternatively, you can train headless in the docker container, though you’ll have to run on another system to see the results visually.

Another alternative would be to rebuild PyTorch yourself against the CUDA 11.1 SDK.

Take care,
-Gav