Segmentation fault when using different GPUs

I ran into a weird bug when running Isaac Gym's joint_monkey example. When I specify cuda:0 as the GPU to run the example, everything works fine:

However, when I use a different GPU such as cuda:1 (by setting CUDA_VISIBLE_DEVICES=1), the error occurs:
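For reference, the two launches look roughly like this (assuming the stock joint_monkey.py from the python/examples folder):

python joint_monkey.py
CUDA_VISIBLE_DEVICES=1 python joint_monkey.py

The first one runs fine; the second one is what produces the error.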

@gstate


Hi @xidong.feng.20

Looks like we might have a problem to track down here with CUDA_VISIBLE_DEVICES. In the meantime, it is better anyway to explicitly specify your sim and/or graphics devices on the command line using:
--sim_device=cuda:n --graphics_device=k

Note that the sim_device parameter uses CUDA-style device syntax, while the graphics_device parameter uses the Vulkan device ID, which may not always be the same as the CUDA device ID.
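For example, something like this (the Vulkan index here is only illustrative, since the Vulkan enumeration on your machine may differ):

python joint_monkey.py --sim_device=cuda:1 --graphics_device=0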

Take care,
-Gav


This helps!
It was really hard to find a solution, btw :)

How does the Vulkan ID relate to the GPU ID assigned by CUDA? I’m having a hard time trying to instantiate new renderers on devices that are not CUDA:0 and would appreciate any pointers to relevant documentation. Thanks!
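A rough way to compare the two enumerations (assuming nvidia-smi and vulkaninfo are installed) is:

nvidia-smi -L                    # GPU order as CUDA/the driver sees it
vulkaninfo | grep deviceName     # GPU order as Vulkan enumerates it

but I'm not sure the two orderings are guaranteed to correspond.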


This still doesn't work for me. Most of the GPU memory is used on GPU 1 when I specify sim_device=cuda:1 rl_device=cuda:1 graphics_device_id=1; however, some memory is still allocated on GPU 0. This crashes when GPU 0 is fully utilized, e.g. if TensorFlow is running on that GPU. That makes it impossible to run Isaac Gym on machines shared across multiple users (since someone else might be using TensorFlow on GPU 0).
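For reference, the launch is along these lines (assuming the IsaacGymEnvs train.py entry point; the task name is only an example):

python train.py task=Ant sim_device=cuda:1 rl_device=cuda:1 graphics_device_id=1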