I am playing around with the interop_torch.py example. The default script runs fine, but if I change the graphics device to a different GPU (instead of the default 0), the script crashes with a CUDA error.
Also, the graphics_device parameter in create_sim does not seem to respect the CUDA_VISIBLE_DEVICES environment variable. If I set CUDA_VISIBLE_DEVICES=1, the camera tensors are still reported on device cuda:1, whereas PyTorch would normally map that GPU to cuda:0.
We’ll have to look into this more closely. There are definitely a few places where we aren’t yet handling multi-GPU cases as well as we should be, and this may be one of them. I can’t reproduce this issue on a machine with a single GPU.
What are the two GPUs you have, btw? Are they both the same, or do you have two different architectures?
One further note about CUDA_VISIBLE_DEVICES: it controls which compute devices are available, but not graphics devices. The create_sim graphics device parameter refers to enumerated Vulkan devices, which are not hidden by CUDA_VISIBLE_DEVICES.
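The distinction matters because CUDA_VISIBLE_DEVICES also remaps ordinals: PyTorch's cuda:N is the N-th *visible* device, not the N-th physical one. Here is a small pure-Python sketch of that remapping (a simplified model for illustration, not an Isaac Gym or CUDA API; real CUDA stops parsing at the first invalid entry):

```python
import os

def visible_cuda_ordinals(num_physical_gpus, env=None):
    """Map CUDA's logical ordinals to physical GPU indices.

    Logical ordinal i corresponds to the i-th entry of
    CUDA_VISIBLE_DEVICES. Vulkan (graphics) enumeration
    ignores this variable entirely.
    """
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        # No filtering: logical and physical indices coincide.
        return {i: i for i in range(num_physical_gpus)}
    mapping = {}
    for logical, token in enumerate(value.split(",")):
        physical = int(token)
        if 0 <= physical < num_physical_gpus:
            mapping[logical] = physical
    return mapping

# With two physical GPUs and CUDA_VISIBLE_DEVICES=1, PyTorch's
# cuda:0 is physical GPU 1 -- and there is no cuda:1 at all.
print(visible_cuda_ordinals(2, env={"CUDA_VISIBLE_DEVICES": "1"}))  # → {0: 1}
```

This is why a tensor that Isaac Gym reports on cuda:1 can be an "invalid device ordinal" from PyTorch's point of view when only one device is visible.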
If your compute device is on GPU 0 and you’re rendering on GPU 1, that could explain the runtime error: the camera data isn’t on the GPU that PyTorch expects.
If you set CUDA_VISIBLE_DEVICES=1 and use GPU 1 for the graphics, do you still see the crash?
And it still gives the same error. It seems I can only set graphics_device to 0 or -1; other values give a CUDA error.
And if I run the script with CUDA_VISIBLE_DEVICES=1 and GPU 1 for graphics, I get RuntimeError: CUDA error: invalid device ordinal on the line print(cam_tensors[0].cpu()). My guess is that PyTorch expects all tensors to be on cuda:0 in this case, since it does not see the other GPUs, but Isaac Gym can still see them and returns the camera images on device cuda:1, which PyTorch does not recognize.
Something even stranger than that is happening. It works properly with the graphics device set to 0 and the compute device set to 1, and printing the camera tensor shows it’s on the cuda:0 device.
If you set the graphics device to 1, it works if you do a .clone().detach() on the camera tensor. You can happily move it to any device you want from there as well.
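The workaround could look like this small sketch (the helper name is illustrative, not an Isaac Gym API; cam_tensor stands for one of the tensors Isaac Gym wraps around a camera image):

```python
import torch

def snapshot_camera(cam_tensor, target_device="cpu"):
    # .clone().detach() copies the image out of the Gym-owned buffer
    # into a regular PyTorch tensor, which can then be moved to any
    # device without tripping over the graphics-device mismatch.
    return cam_tensor.clone().detach().to(target_device)
```

In the interop_torch.py loop this would replace a direct print(cam_tensors[0].cpu()) with print(snapshot_camera(cam_tensors[0])).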
For now I’d suggest just keeping rendering on device 0. This one will likely need more time for us to track down.
Note that there are some places in the RL examples where cuda:0 is explicitly used. You may want to look at that more closely if you want to force training on another GPU.
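One way to avoid chasing those hard-coded strings is to route the device choice through a single helper; this is a hypothetical pattern, not part of the examples, and the RL_DEVICE variable name is our own invention:

```python
import os
import torch

def training_device(default="cuda:0"):
    # RL_DEVICE is a made-up environment variable for this sketch;
    # it lets you force training onto another GPU without editing code.
    requested = os.environ.get("RL_DEVICE", default)
    if requested.startswith("cuda") and not torch.cuda.is_available():
        return torch.device("cpu")  # fall back when no CUDA device is visible
    return torch.device(requested)
```

Every place that currently says torch.device("cuda:0") would then call training_device() instead, so switching GPUs becomes a one-line change.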