Camera sensor performance

I would like to use Isaac Gym for Deep RL with visual observations. However, the camera sensors seem to be very slow. The interop_torch.py example runs about 100 times slower with camera sensors and rendering enabled. And the throughput doesn't change whether I use 1-pixel camera sensors or vary the number of environments: with 16 envs and with 2048 envs I get roughly the same combined performance of about 3000 fps on my RTX 3080.
This seems very strange.

The 3080 also draws only about 120 watts during the run, so the camera sensor implementation seems to be very inefficient.
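
For reference, my camera setup and readout loop look roughly like the sketch below, following the interop_torch.py pattern. Sim and env creation are omitted, and names like `sim` and `envs` are illustrative placeholders, not exact code from the example:

```python
from isaacgym import gymapi, gymtorch
import torch

gym = gymapi.acquire_gym()
# ... sim, ground plane, envs, and actors created here (omitted) ...

cam_props = gymapi.CameraProperties()
cam_props.width = 128
cam_props.height = 128
cam_props.enable_tensors = True  # required for GPU tensor access

cam_tensors = []
for env in envs:  # `envs` assumed to hold the env handles
    cam = gym.create_camera_sensor(env, cam_props)
    gym.set_camera_location(cam, env, gymapi.Vec3(2, 2, 2), gymapi.Vec3(0, 0, 0))
    # Wrap the raw GPU image buffer as a torch tensor (no host copy)
    raw = gym.get_camera_image_gpu_tensor(sim, env, cam, gymapi.IMAGE_COLOR)
    cam_tensors.append(gymtorch.wrap_tensor(raw))

for _ in range(1000):
    gym.simulate(sim)
    gym.fetch_results(sim, True)

    # Rendering the camera sensors is the expensive step
    gym.step_graphics(sim)
    gym.render_all_camera_sensors(sim)

    gym.start_access_image_tensors(sim)
    obs = torch.stack(cam_tensors)  # (num_envs, H, W, 4) uint8 RGBA
    gym.end_access_image_tensors(sim)
```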

Total throughput with camera sensors is always much lower than pure ground-truth simulation in headless mode, regardless of the simulation. In Isaac Gym, the optimal number of envs to use with camera sensors is usually much lower than for pure simulation: around 100-200 depending on the camera resolution, versus thousands. The performance numbers you reported look reasonable to me.
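
If you want to find the sweet spot for your setup empirically, a rough timing sweep over env counts like the one below can help. This is only a sketch: `make_sim_with_cameras` is a hypothetical helper that builds the sim and cameras as in your snippet above, and `gym` is the handle from `gymapi.acquire_gym()`:

```python
import time

def measure_fps(num_envs, num_steps=200):
    # `make_sim_with_cameras` is an assumed helper, not an Isaac Gym API
    sim, envs, cam_tensors = make_sim_with_cameras(num_envs)
    t0 = time.time()
    for _ in range(num_steps):
        gym.simulate(sim)
        gym.fetch_results(sim, True)
        gym.step_graphics(sim)
        gym.render_all_camera_sensors(sim)
    elapsed = time.time() - t0
    gym.destroy_sim(sim)
    # Combined fps: env-steps per second across all envs
    return num_envs * num_steps / elapsed

for n in (16, 64, 128, 256, 1024, 2048):
    print(f"{n:5d} envs: {measure_fps(n):,.0f} fps combined")
```

Once the renderer rather than the physics becomes the bottleneck, the combined fps plateaus as you add envs, which matches the flat numbers you observed between 16 and 2048 envs.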