Improve performance of parallel depth images in Omni Isaac Gym

Hello Guys,

I am working with Omni Isaac Gym (OIG) and trying to get depth image observations from parallel environments. So far my method is extremely slow. I would really appreciate any advice on how to improve the performance, either by using the new Replicator API or by changing the device in sd_helper to CUDA.

This is what I do now:

self.viewport = get_active_viewport_window("Viewport").viewport_api
self.viewport.resolution = (self._image_size,self._image_size)


for i in range(self._num_envs):
    gt = self.sd_helper.get_groundtruth(

    image_tensor = torch.tensor(gt["depthLinear"])
    self.stacked_images_tensor[i] = image_tensor
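
On the copy side of the loop above, one thing that might help a little is stacking the per-env arrays on the host and converting them to a tensor once, instead of calling torch.tensor per environment. This is only a minimal sketch under assumptions: stack_depth_images and the fake_depth arrays are hypothetical stand-ins (not part of sd_helper or Replicator), and it does not address the cost of the get_groundtruth call itself, which is probably the larger bottleneck.

```python
import numpy as np
import torch


def stack_depth_images(depth_arrays, device="cpu"):
    """Stack per-env (H, W) depth arrays into one (N, H, W) float32 tensor.

    Hypothetical helper: the arrays are stacked once on the host and moved
    to `device` in a single transfer instead of one copy per environment.
    """
    batch = np.stack(depth_arrays, axis=0).astype(np.float32, copy=False)
    return torch.from_numpy(batch).to(device)


# Fake data standing in for the per-env depth arrays (assumption: each
# environment yields one square float image of the same size).
num_envs, image_size = 4, 8
fake_depth = [
    np.full((image_size, image_size), float(i), dtype=np.float32)
    for i in range(num_envs)
]

stacked = stack_depth_images(fake_depth)  # pass device="cuda" if a GPU is available
```

Here stacked would play the role of self.stacked_images_tensor, filled in one step after the per-env arrays have been collected.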