Invalid device ordinal error when running example code interop_torch.py with CUDA_VISIBLE_DEVICES set

xingyulin2016 · May 30, 2023, 8:02pm

To reproduce the error:

export CUDA_VISIBLE_DEVICES=2  # This is the same GPU device as my graphics device
python interop_torch.py --sim_device='cuda:0' --graphics_device_id=3 --headless

Std out and std err from running the command:

Importing module 'gym_38' (/home/cirrascale/Projects/isaacgym/python/isaacgym/_bindings/linux-x86_64/gym_38.so)
Setting GYM_USD_PLUG_INFO_PATH to /home/cirrascale/Projects/isaacgym/python/isaacgym/_bindings/linux-x86_64/usd/plugInfo.json
PyTorch version 2.0.1
Device count 1
/home/cirrascale/Projects/isaacgym/python/isaacgym/_bindings/src/gymtorch
Using /home/cirrascale/.cache/torch_extensions/py38_cu117 as PyTorch extensions root...
Emitting ninja build file /home/cirrascale/.cache/torch_extensions/py38_cu117/gymtorch/build.ninja...
Building extension module gymtorch...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Not connected to PVD
+++ Using GPU PhysX
Physics Engine: PhysX
Physics Device: cuda:0
GPU Pipeline: enabled
[Error] [carb.gym.plugin] Gym cuda error: invalid device ordinal: ../../../source/plugins/carb/gym/impl/Gym/GymPhysXCuda.cu: 926
[Error] [carb.gym.plugin] Failed to fill rigid body state tensor
Loading extension module gymtorch...
Got camera tensor with shape (128, 128, 4)
  Torch camera tensor device: cuda:3
  Torch camera tensor shape: torch.Size([128, 128, 4])
Got camera tensor with shape (128, 128, 4)
  Torch camera tensor device: cuda:3
  Torch camera tensor shape: torch.Size([128, 128, 4])
Got camera tensor with shape (128, 128, 4)
  Torch camera tensor device: cuda:3
  Torch camera tensor shape: torch.Size([128, 128, 4])
Got camera tensor with shape (128, 128, 4)
  Torch camera tensor device: cuda:3
  Torch camera tensor shape: torch.Size([128, 128, 4])
Got camera tensor with shape (128, 128, 4)
  Torch camera tensor device: cuda:3
  Torch camera tensor shape: torch.Size([128, 128, 4])
Got camera tensor with shape (128, 128, 4)
  Torch camera tensor device: cuda:3
  Torch camera tensor shape: torch.Size([128, 128, 4])
Got camera tensor with shape (128, 128, 4)
  Torch camera tensor device: cuda:3
  Torch camera tensor shape: torch.Size([128, 128, 4])
Got camera tensor with shape (128, 128, 4)
  Torch camera tensor device: cuda:3
  Torch camera tensor shape: torch.Size([128, 128, 4])
Got camera tensor with shape (128, 128, 4)
  Torch camera tensor device: cuda:3
  Torch camera tensor shape: torch.Size([128, 128, 4])
Got camera tensor with shape (128, 128, 4)
  Torch camera tensor device: cuda:3
  Torch camera tensor shape: torch.Size([128, 128, 4])
Got camera tensor with shape (128, 128, 4)
  Torch camera tensor device: cuda:3
  Torch camera tensor shape: torch.Size([128, 128, 4])
Got camera tensor with shape (128, 128, 4)
  Torch camera tensor device: cuda:3
  Torch camera tensor shape: torch.Size([128, 128, 4])
Got camera tensor with shape (128, 128, 4)
  Torch camera tensor device: cuda:3
  Torch camera tensor shape: torch.Size([128, 128, 4])
Got camera tensor with shape (128, 128, 4)
  Torch camera tensor device: cuda:3
  Torch camera tensor shape: torch.Size([128, 128, 4])
Got camera tensor with shape (128, 128, 4)
  Torch camera tensor device: cuda:3
  Torch camera tensor shape: torch.Size([128, 128, 4])
Got camera tensor with shape (128, 128, 4)
  Torch camera tensor device: cuda:3
  Torch camera tensor shape: torch.Size([128, 128, 4])
Gym state tensor shape: (16, 13)
Gym state tensor data @ 0x7f2b41a00000
Torch state tensor device: cuda:0
Torch state tensor shape: torch.Size([16, 13])
Torch state tensor data @ 0x7f2b41a00000
========= Frame 0 ==========
RB positions:
tensor([[0.0000, 4.9959, 0.0000],
        [0.0000, 4.9959, 0.0000],
        [0.0000, 4.9959, 0.0000],
        [0.0000, 4.9959, 0.0000],
        [0.0000, 4.9959, 0.0000],
        [0.0000, 4.9959, 0.0000],
        [0.0000, 4.9959, 0.0000],
        [0.0000, 4.9959, 0.0000],
        [0.0000, 4.9959, 0.0000],
        [0.0000, 4.9959, 0.0000],
        [0.0000, 4.9959, 0.0000],
        [0.0000, 4.9959, 0.0000],
        [0.0000, 4.9959, 0.0000],
        [0.0000, 4.9959, 0.0000],
        [0.0000, 4.9959, 0.0000],
        [0.0000, 4.9959, 0.0000]], device='cuda:0')
Traceback (most recent call last):
  File "interop_torch.py", line 196, in <module>
    cam_img = cam_tensors[i].cpu().numpy()
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
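
A note on the log above: `Device count 1` suggests the process sees a single CUDA device, while the camera tensors are reported on `cuda:3`. With `CUDA_VISIBLE_DEVICES=2`, CUDA exposes only the listed device and renumbers it as ordinal 0, so any other ordinal is invalid inside that process. The sketch below is a simplified model of that renumbering, not Isaac Gym or PyTorch code. The helper names and the physical GPU count of 4 are assumptions made for illustration.

```python
def visible_device_count(env_value, physical_count):
    """Return how many CUDA devices a process would see.

    Simplified model (assumption): if CUDA_VISIBLE_DEVICES is unset,
    all physical devices are visible; otherwise each non-empty
    comma-separated entry exposes exactly one device.
    """
    if env_value is None:
        return physical_count
    entries = [e.strip() for e in env_value.split(",") if e.strip()]
    return len(entries)


def is_valid_ordinal(ordinal, env_value, physical_count):
    """Check whether 'cuda:<ordinal>' can exist inside the process."""
    return 0 <= ordinal < visible_device_count(env_value, physical_count)


if __name__ == "__main__":
    env_value = "2"      # value exported in the reproduction steps
    physical_count = 4   # hypothetical; the thread does not state it
    for ordinal in (0, 3):
        ok = is_valid_ordinal(ordinal, env_value, physical_count)
        print(f"cuda:{ordinal} valid inside the process: {ok}")
```

Under this model, `cuda:0` is the only ordinal available once `CUDA_VISIBLE_DEVICES=2` is set, which would be consistent with the error raised when the `cuda:3` camera tensor is copied to the CPU. Whether `--graphics_device_id` should be changed to match depends on how Isaac Gym maps graphics devices to CUDA ordinals, which this thread does not confirm.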

gus.gurung · January 5, 2024, 8:17am

Hi, how did you resolve this invalid device ordinal error?
A similar error occurs when I try to run the torchrun command to host Llama 2 locally.
