We use the GridCloner to create multiple environments, each containing a camera. We want to use the camera's depth image as part of our observation space. Our current implementation works, but it is slow because of the way we handle cameras.
We tried the following:
Create a viewport for each camera. This fills VRAM very quickly: with only 4-8 environments we already fill 12 GB of VRAM. Moving to a more powerful GPU doesn’t make sense, as we want around 1000 parallel environments.
Change the active camera of a single viewport. With this approach, we use very little VRAM, but we have to change the active camera to environment N, make at least 4 simulation steps, take an image, move to environment N + 1, make at least 4 simulation steps, take an image… With 1000 environments we need at least 4000 steps (~66s at 1/60s per step) just for images.
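To put numbers on both approaches, here is a rough back-of-the-envelope sketch using only the figures above. The function names are made up for illustration, and the VRAM estimate assumes memory grows linearly with the number of viewports, which we have not verified.

```python
# Rough cost estimates for the two approaches, using the figures from this post.

NUM_ENVS = 1000          # target number of parallel environments
STEPS_PER_IMAGE = 4      # minimum simulation steps after switching the active camera
STEP_DT = 1.0 / 60.0     # seconds per simulation step


def camera_switch_overhead(num_envs, steps_per_image, step_dt):
    """Return (total_steps, seconds) spent cycling one viewport through all cameras."""
    total_steps = num_envs * steps_per_image
    return total_steps, total_steps * step_dt


def viewport_vram_gb(num_envs, vram_gb_observed, envs_observed):
    """Extrapolate VRAM use from one observed data point (ASSUMES linear scaling)."""
    return num_envs * vram_gb_observed / envs_observed


steps, seconds = camera_switch_overhead(NUM_ENVS, STEPS_PER_IMAGE, STEP_DT)
print(f"Single viewport: {steps} steps, about {seconds:.1f} s per full set of images")

# We observed 12 GB filled with somewhere between 4 and 8 environments.
low = viewport_vram_gb(NUM_ENVS, 12, 8)
high = viewport_vram_gb(NUM_ENVS, 12, 4)
print(f"One viewport per camera: roughly {low:.0f}-{high:.0f} GB of VRAM")
```

Under these assumptions neither approach scales to 1000 environments: one needs terabytes of VRAM, the other more than a minute of stepping per observation.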
Do you have an efficient way to use multiple cameras in an Omniverse Isaac Gym environment?
In addition, we tried speeding up the simulation steps with the approach suggested below:
However, the camera images come back empty and the approach seems unstable in general (I still have to do more testing on this).
As far as I know, the current version of Omniverse Isaac Gym has no tensor API for getting images from cameras in parallel for RL. However, a tensor API for using parallel cameras to train RL agents is coming in a future version, according to the following on-demand video (from the last GTC):