Omniverse Isaac Gym - Access camera stream

So I noticed that Isaac Sim now supports Isaac Gym integration to set up environments. Is my understanding correct that https://github.com/NVIDIA-Omniverse/IsaacGymEnvs is now deprecated and https://github.com/NVIDIA-Omniverse/OmniIsaacGymEnvs is now the way to go to set up RL environments for Isaac Gym?

Anyhow, if this is the case, is it possible to access the camera stream anywhere in omni.isaac.core (ideally directly as a tensor)? The published tutorials seem to only cover examples that deal with ArticulationView.

An answer would be highly appreciated.


I am trying to find a solution to that, too.

I’ve been able to create viewports in Python and set the active camera for each one, but sadly I could not load image data from Isaac Sim using the Synthetic Data Helper.

Also, I stumbled upon a problem where a camera is already created in the USD stage and I access it through its prim_path in Python: after the simulation starts, the robot moves to the default position I’ve defined, but the camera does not follow that movement. When I move the robot joints by hand in the GUI, everything works.

I hope some of the devs could help us with examples of camera usage in Python.

Hello,
I also use the Synthetic Data Helper and have no problem getting the camera image. Here is an example of how I do it:

import math

from omni.isaac.kit import SimulationApp

headless = False
simulation_app = SimulationApp({"headless": headless})  # can also run headless

# these imports must come after the SimulationApp is created
import omni
from omni.isaac.core import World
from omni.isaac.core.utils.prims import create_prim, define_prim
from omni.isaac.synthetic_utils import SyntheticDataHelper
from pxr import Gf

sd_helper = SyntheticDataHelper()


class Camera:
    
    def __init__(self, id: str, width: int, height: int, fov, near, far, headless: bool = False, path: str = "/World"):
        """

        Args:
            id: The id of the camera
            width: The horizontal image resolution in pixels
            height: The vertical image resolution in pixels
            fov: The vertical field of view of the camera in degrees
            near: The near plane distance
            far: The far plane distance
            headless: If True, reuse the default viewport instead of creating a new one
            path: The prim path under which the camera prim is created
        """
        
        self.id = id
        self._width = width
        self._height = height
        self.__fov = fov
        self.__near = near
        self.__far = far
        self.__aspect = self._width / self._height  # TODO: check why the horizontalAperture does not correspond to 62 degrees
        self._view_matrix = None

        self.camera_prim_path = f"{path}/{id}"
        fov_horizontal = self.__aspect * fov
        focal_length = 1.88
        # aperture = 2 * focal_length * tan(fov / 2), with the fov converted to radians
        attributes = {"horizontalAperture": 2 * focal_length * math.tan(fov_horizontal * math.pi / 180 / 2),
                      "verticalAperture": 2 * focal_length * math.tan(fov * math.pi / 180 / 2),
                      "focalLength": focal_length,
                      "clippingRange": (self.__near, self.__far)
                      }
        create_prim(prim_path=self.camera_prim_path, prim_type="Camera", attributes=attributes)
        self.stage = omni.usd.get_context().get_stage()  
        self.camera_prim = self.stage.GetPrimAtPath(self.camera_prim_path)

        # Set as the active camera
        viewport_interface = omni.kit.viewport_legacy.get_viewport_interface()
        if headless:
            self.viewport = viewport_interface.get_viewport_window()
        else:
            viewport_handle = viewport_interface.create_instance()
            list_viewports = viewport_interface.get_instance_list()
            self.viewport = viewport_interface.get_viewport_window(viewport_handle)
            window_width = 200
            window_height = 200
            self.viewport.set_window_size(window_width, window_height)
            self.viewport.set_window_pos(800, window_height * (len(list_viewports) - 2))

        self.viewport.set_active_camera(self.camera_prim_path)
        self.viewport.set_texture_resolution(self._width, self._height)

    def get_image(self):
        # Get ground truths
        gt = sd_helper.get_groundtruth(
            [
                "rgb",
                #"depthLinear",
                "depth",
                #"boundingBox2DTight",
                #"boundingBox2DLoose",
                "instanceSegmentation",
                #"semanticSegmentation",
                #"boundingBox3D",
                #"camera",
                #"pose"
            ],
            self.viewport,
        )

        print("Camera params", sd_helper.get_camera_params(self.viewport))

        segmentation_mask = gt["instanceSegmentation"]
        rgb = gt["rgb"]
        depth = gt["depth"]
        return rgb, depth, segmentation_mask
    
    def get_pose(self):
        transform_matrix = sd_helper.get_camera_params(self.viewport)["pose"]
        return transform_matrix



    def set_prim_pose(self, position, orientation):
        """Set the camera pose. orientation is a quaternion in (w, x, y, z) order."""
        properties = self.camera_prim.GetPropertyNames()
        if "xformOp:translate" in properties:
            translate_attr = self.camera_prim.GetAttribute("xformOp:translate")
            translate_attr.Set(Gf.Vec3d(*position))
        if "xformOp:orient" in properties:
            orientation_attr = self.camera_prim.GetAttribute("xformOp:orient")
            # Gf.Quatd expects the real part first: (w, x, y, z)
            orientation_attr.Set(Gf.Quatd(orientation[0], orientation[1], orientation[2], orientation[3]))

if __name__ == "__main__":
    world = World()
    world.scene.add_default_ground_plane()
    path_to = "/World/Scene"
    define_prim(path_to, "Xform")
    world.reset()
    camera = Camera(id="my_camera", width=224, height=171, fov=45, near=0.10, far=4, path="/World/Scene")
    camera.set_prim_pose(position=[0, 0, 1], orientation=[1, 0, 0, 0])  # identity quaternion, (w, x, y, z)
    print(camera.get_image())
    simulation_app.close()

I hope it helps.

Thank you for your help. I appreciate it and will try to integrate it into my application.

One other question, if I may: it’s about having multiple cameras in the scene.

Since each viewport consumes GPU memory, I would like to limit everything to a single viewport instance and just switch the active camera for each get_groundtruth call. Is something like that possible?

Once again, thank you.

You’re welcome. I didn’t find out how to do it; maybe someone else has the answer.

Hi there,

similar to line 24 from this example, would using set_active_camera work for your use case?
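
Something along these lines might work (an untested sketch on top of the example above; the camera prim paths and the extra world.render() call between switches are my assumptions):

    # Untested sketch: reuse one viewport and switch the active camera per capture.
    # Assumes `world` and `sd_helper` from the example above; the paths are hypothetical.
    camera_paths = ["/World/Scene/cam_front", "/World/Scene/cam_wrist"]

    viewport = omni.kit.viewport_legacy.get_viewport_interface().get_viewport_window()
    images = {}
    for path in camera_paths:
        viewport.set_active_camera(path)
        world.render()  # give the viewport a frame to refresh before reading
        gt = sd_helper.get_groundtruth(["rgb"], viewport)
        images[path] = gt["rgb"]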

Cheers,
Andrei

Hello,

Thank you for your time. I’ve seen that function, but in debug mode my batch of image data is the same across the first dimension. What I mean is that I get the same image for each viewport.

My concern is the refresh of the viewport: when does it update?

I had no time to test the code today, but I will do it tomorrow or the day after to finally see if it’s possible.

And has anyone experienced a problem where things that are close to the camera are not shown in the viewport? For example, when the camera is near a wall and facing it, the camera only sees behind the wall, not the wall itself. It’s like everything within about 0.5 m in front of the camera is clipped. Does it have anything to do with the “sphere” that represents the user moving through the scene?

Not seeing things close to the camera is caused by the near clipping range. In the example above it is the "clippingRange": (self.__near, self.__far) attribute; you can set self.__near to a very small value.
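
For a camera prim that already exists in the stage, you can also set the attribute directly, e.g. (a minimal sketch, using the prim path from my example above):

    import omni
    from pxr import Gf

    stage = omni.usd.get_context().get_stage()
    cam_prim = stage.GetPrimAtPath("/World/Scene/my_camera")
    # clippingRange is (near, far); a 1 cm near plane avoids clipping nearby walls
    cam_prim.GetAttribute("clippingRange").Set(Gf.Vec2f(0.01, 100.0))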

Thank you for your efforts.

I was able to test the code with multiple cameras and a single viewport; sadly, it’s not working. My guess is that the image buffer in GPU memory is updated once per step and multiple viewports need to be open at the same time. Which makes sense: they allocate the memory once and just update it each step, since images are a big chunk of data.
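
So the fallback I see is one viewport per camera, something like this (untested sketch, reusing the Camera class from the example above; three cameras is an arbitrary number):

    # Untested sketch: one viewport per camera, using the Camera class above.
    cameras = [
        Camera(id=f"cam_{i}", width=224, height=171, fov=45, near=0.01, far=4,
               path="/World/Scene")
        for i in range(3)  # arbitrary number of cameras
    ]
    # each camera owns its own viewport, so the image buffers stay separate
    frames = [cam.get_image() for cam in cameras]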

But a problem arises when you want to do reinforcement learning with visual servoing, or any task that requires multiple cameras in the scene while training a neural network: with the GPU memory already allocated for the viewports, it becomes impossible to fit another neural network with convolutional layers, and on top of that you also have to transfer each minibatch to the GPU.

I would like to see a solution, or at least a partial solution, to this problem from the devs. Otherwise we would need really expensive GPUs for these types of tasks.