Segmentation fault when using different GPUs

I ran into a weird bug when running Isaac Gym's joint_monkey example. When I specify cuda:0 as the GPU to run the example, everything works fine:

However, when I use a different GPU such as cuda:1 (by setting CUDA_VISIBLE_DEVICES=1), the error occurs:
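For reference, the two launches look roughly like this (assuming the stock joint_monkey.py from the python/examples folder):

python joint_monkey.py
CUDA_VISIBLE_DEVICES=1 python joint_monkey.py

The first one runs fine; the second one is what produces the error.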

@gstate


Hi @xidong.feng.20

Looks like we might have a problem to track down here with CUDA_VISIBLE_DEVICES. In the meantime, it is better anyway to explicitly specify your sim and/or graphics devices on the command line using:
--sim_device=cuda:n --graphics_device=k

Note that the sim_device parameter uses CUDA-style device syntax, while the graphics_device parameter uses the Vulkan device ID, which may not always be the same as the CUDA device ID.
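For example, something like this (the Vulkan index here is only illustrative, since the Vulkan enumeration on your machine may differ):

python joint_monkey.py --sim_device=cuda:1 --graphics_device=0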

Take care,
-Gav


This helps!
It was really hard to find a solution, btw :)

How does the Vulkan ID relate to the GPU ID assigned by CUDA? I’m having a hard time trying to instantiate new renderers on devices that are not CUDA:0 and would appreciate any pointers to relevant documentation. Thanks!
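A rough way to compare the two enumerations (assuming nvidia-smi and vulkaninfo are installed) is:

nvidia-smi -L                    # GPU order as CUDA/the driver sees it
vulkaninfo | grep deviceName     # GPU order as Vulkan enumerates it

but I'm not sure the two orderings are guaranteed to correspond.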


This still doesn't work for me. Most of the GPU memory is used on GPU 1 when I specify sim_device=cuda:1 rl_device=cuda:1 graphics_device_id=1; however, some memory is still allocated on GPU 0. This crashes when GPU 0 is fully utilized, e.g. if TensorFlow is running on that GPU. That makes it impossible to run Isaac Gym on machines shared across multiple users (since someone else might be using TensorFlow on GPU 0).
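For reference, the launch is along these lines (assuming the IsaacGymEnvs train.py entry point; the task name is only an example):

python train.py task=Ant sim_device=cuda:1 rl_device=cuda:1 graphics_device_id=1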