Save camera position and rotation with Writer for offline dataset generation

Hello,
I would like to create a Writer that, for each image, records the position and rotation of the camera in a NumPy array, and saves that array at the end.
I wrote this:

import numpy as np
import omni.usd
import omni.replicator.core as rep
from PIL import Image
from scipy.spatial.transform import Rotation
from omni.isaac.core.prims import XFormPrim
from omni.isaac.core.utils.stage import open_stage
from omni.replicator.core import AnnotatorRegistry, BackendDispatch, Writer


class ArenaWriter(Writer):
    def __init__(
        self,
        output_dir,
        rig,
        image_output_format="png",
        n_data=10,
        n_attr=5
    ):
        self._output_dir = output_dir
        self._backend = BackendDispatch({"paths": {"out_dir": output_dir}})
        self._frame_id = 0
        self._image_output_format = image_output_format
        self._rig = rig
        # One row per frame: x, y position plus the xyz Euler angles
        self._attributes = np.zeros(shape=(n_data, n_attr))

        self.annotators = [AnnotatorRegistry.get_annotator("rgb")]

    def write(self, data):
        im_array = np.array(data["rgb"])
        im_PIL = Image.fromarray(np.uint8(im_array))
        im_PIL = im_PIL.resize((128, 64), Image.ANTIALIAS)  # Image.LANCZOS in newer Pillow versions
        im_array = np.array(im_PIL)
        filepath = f"rgb_{self._frame_id}.{self._image_output_format}"

        # get_world_pose() returns (position, quaternion). Isaac Sim quaternions
        # are scalar-first (w, x, y, z) while SciPy expects scalar-last, hence the reorder.
        pos, quat_wxyz = self._rig.get_world_pose()
        rot = Rotation.from_quat(np.asarray(quat_wxyz)[[1, 2, 3, 0]])
        rot_euler = rot.as_euler("xyz", degrees=True)

        self._attributes[self._frame_id, :2] = pos[:2]
        self._attributes[self._frame_id, 2:] = rot_euler  # 3 angles fill the remaining slots
        self._backend.write_image(filepath, im_array)

        self._frame_id += 1

    def on_final_frame(self):
        np.save(self._output_dir + "attributes.npy", self._attributes)


def main():

    # Open the environment in a new stage
    print(f"Loading Stage {ENV_URL}")
    open_stage(ENV_URL)

    stage = omni.usd.get_context().get_stage()

    # Create Replicator Camera
    cam = rep.create.camera(
        position=(0, 0, 0.25),
        rotation=(0, 0, 0),
        focal_length=28,
        fisheye_max_fov=110,
        clipping_range=(0.01, 20.)
    )

    cam_node = cam.node
    print(cam_node)
    cam_rig_path = rep.utils.get_node_targets(cam_node, "inputs:prims")[0]
    print(cam_rig_path)
    cam_path = str(cam_rig_path) + "/Camera"    
    print(cam_path)
    rig = XFormPrim(prim_path=cam_rig_path)

    rep.WriterRegistry.register(ArenaWriter)
    writer = rep.WriterRegistry.get("ArenaWriter")
    out_dir = "/root/Documents/arena/data/"
    writer.initialize(output_dir=out_dir, rig=rig, n_data=CONFIG["num_frames"])

    # Create a Replicator render product for the camera
    RESOLUTION = (1600, 1300)
    camera_rp = rep.create.render_product(cam, RESOLUTION)

    # Attach the render product to the Writer
    writer.attach([camera_rp])

    with rep.trigger.on_frame():
        with cam:
            rep.modify.pose(
                    position=rep.distribution.uniform((-1.8, -1.3, 0.25), (1.8, 1.3, 0.25)),
                    rotation=rep.distribution.uniform((0, 0, 0), (0, 0, 360))
            )


    for i in range(CONFIG["num_frames"]):
        rep.orchestrator.step()
    writer.on_final_frame()

But I think this is suboptimal in terms of time, so I preferred to use:

rep.orchestrator.run()

# Wait until started
while not rep.orchestrator.get_is_started():
    simulation_app.update()

# Wait until stopped
while rep.orchestrator.get_is_started():
    simulation_app.update()

rep.BackendDispatch.wait_until_done()
rep.orchestrator.stop()

But with this I have trouble saving the camera position and rotation, because there is a mismatch between the camera parameters and the saved image.

Hi @Leopold_M - Someone from our team will review and respond.

Hi there,

AFAIK, manually using the step() function should not cause any significant overhead.

The issue with the rig pose being wrong could be caused by an off-by-one frame between the annotator data and the stage. I will look into this and come back to you.

Best,
Andrei

Hello,

Thanks a lot for your answer. Indeed, when I use the step function it works well, but it seems to take more time than rep.orchestrator.run().

I see. I expected that, due to the large resolution, the AOV processing would take most of the time, and that the extra processing frame would therefore not be noticeable.

For now, you could also try adding multiple cameras to parallelize data processing and possibly shorten the relative overhead.
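
For example, a minimal sketch (not from the thread; cams and render_products are illustrative names, and the writer's write() would need adapting to handle per-render-product data):

cams = [rep.create.camera() for _ in range(4)]
render_products = [rep.create.render_product(c, RESOLUTION) for c in cams]
# A single writer can be attached to several render products at once
writer.attach(render_products)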

Future releases should not have this issue anymore.

I tried again, and I think last time was just luck, because in general I don't get any correspondence between the image and the position of the camera.
I tried to add rep.BackendDispatch.wait_until_done() in the loop, because I thought it would force waiting until all the backend writes are done, but it didn't solve the problem.
Also, I found that the images saved after the first two steps are always the same, and I don't know why.

Backend dispatch should not influence the writing workflow: it caches the data and writes it to file, and once the cache is full it slows down the workflow in order to catch up with writing data to disk.

Are you getting the off-by-one frame issue using step() as well? Can you check using an ordered sequence:

    with rep.trigger.on_frame():
        with cam:
            rep.modify.pose(position=rep.distribution.sequence([(0,0,0), (0,0,1), (0,0,2)]))

Thank you for the advice. I tried rep.distribution.sequence instead of rep.distribution.uniform.
The saved attributes are correct, but the images are not. I noticed that the image associated with the first attributes is saved twice, which shifts all the images relative to the attributes, like this:
im0 im0 im1 im2 …
att0 att1 att2 att3 …
But I have no idea where this comes from, because I only use rep.orchestrator.step().

Hello,

Have you looked at the camera_params annotator?

Annotators Information — Omniverse Extensions documentation (nvidia.com)

It will give you per-frame data from the rendered camera, including a cameraViewTransform parameter from which you can derive the position and rotation.

Hello,

I tried to use cameraViewTransform from the CameraParams annotator with a simpler example:

import omni.replicator.core as rep
from omni.replicator.core import AnnotatorRegistry, Writer


class PrintWriter(Writer):
    def __init__(self):
        self.annotators = [AnnotatorRegistry.get_annotator("CameraParams")]

    def write(self, data):
        print(data["CameraParams"]["cameraViewTransform"])


def main():

    camera = rep.create.camera()

    render_product = rep.create.render_product(camera, (1024, 1024))

    rep.WriterRegistry.register(PrintWriter)
    writer = rep.WriterRegistry.get("PrintWriter")
    writer.initialize()

    # Attach the render product to the Writer
    writer.attach([render_product])

    poses = [(0, 0, 0), (1., 0, 0), (0, 1., 0), (0, 0, 1.)]

    with rep.trigger.on_frame():
        with camera:
            rep.modify.pose(
                    position = rep.distribution.sequence(poses)
            )

    for i in range(len(poses)):
        rep.orchestrator.step()

The printed outputs are:

[ 2.22044605e-16 -2.22044605e-16  1.00000000e+00  0.00000000e+00
  1.00000000e+00  4.93038066e-32 -2.22044605e-16  0.00000000e+00
  0.00000000e+00  1.00000000e+00  2.22044605e-16 -0.00000000e+00
 -0.00000000e+00  0.00000000e+00  0.00000000e+00  1.00000000e+00]

[ 2.22044605e-16 -2.22044605e-16  1.00000000e+00  0.00000000e+00
  1.00000000e+00  4.93038066e-32 -2.22044605e-16  0.00000000e+00
  0.00000000e+00  1.00000000e+00  2.22044605e-16 -0.00000000e+00
 -0.00000000e+00  0.00000000e+00  0.00000000e+00  1.00000000e+00]

[ 2.22044605e-16 -2.22044605e-16  1.00000000e+00  0.00000000e+00
  1.00000000e+00  4.93038066e-32 -2.22044605e-16  0.00000000e+00
  0.00000000e+00  1.00000000e+00  2.22044605e-16 -0.00000000e+00
 -2.22044605e-16  2.22044605e-16 -1.00000000e+00  1.00000000e+00]

[ 2.22044605e-16 -2.22044605e-16  1.00000000e+00  0.00000000e+00
  1.00000000e+00  4.93038066e-32 -2.22044605e-16  0.00000000e+00
  0.00000000e+00  1.00000000e+00  2.22044605e-16 -0.00000000e+00
 -1.00000000e+00 -4.93038066e-32  2.22044605e-16  1.00000000e+00]

The first two cameraViewTransform outputs are still the same, and the matrices seem a bit weird to me. I was expecting something more of the form:

[ 1.00000000e+00  0.00000000e+00  0.00000000e+00  1.00000000e+00
  0.00000000e+00  1.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  1.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  1.00000000e+00]

for example

Would the inverse provide the world transform?

cam_world_to_local = cameraViewTransform.reshape(4, 4)
cam_local_to_world = np.linalg.inv(cam_world_to_local)

Even the inverse didn't seem to provide the world transform:

[[ 2.22044605e-16  1.00000000e+00  0.00000000e+00  0.00000000e+00]
 [-2.22044605e-16  4.93038066e-32  1.00000000e+00  0.00000000e+00]
 [ 1.00000000e+00 -2.22044605e-16  2.22044605e-16  0.00000000e+00]
 [ 0.00000000e+00  0.00000000e+00  0.00000000e+00  1.00000000e+00]]

[[ 2.22044605e-16  1.00000000e+00  0.00000000e+00  0.00000000e+00]
 [-2.22044605e-16  4.93038066e-32  1.00000000e+00  0.00000000e+00]
 [ 1.00000000e+00 -2.22044605e-16  2.22044605e-16  0.00000000e+00]
 [ 0.00000000e+00  0.00000000e+00  0.00000000e+00  1.00000000e+00]]

[[ 2.22044605e-16  1.00000000e+00  0.00000000e+00  0.00000000e+00]
 [-2.22044605e-16  4.93038066e-32  1.00000000e+00  0.00000000e+00]
 [ 1.00000000e+00 -2.22044605e-16  2.22044605e-16  0.00000000e+00]
 [ 1.00000000e+00  0.00000000e+00  0.00000000e+00  1.00000000e+00]]

[[ 2.22044605e-16  1.00000000e+00  0.00000000e+00  0.00000000e+00]
 [-2.22044605e-16  4.93038066e-32  1.00000000e+00  0.00000000e+00]
 [ 1.00000000e+00 -2.22044605e-16  2.22044605e-16  0.00000000e+00]
 [ 0.00000000e+00  1.00000000e+00  0.00000000e+00  1.00000000e+00]]

The poses that are supposed to be used are poses = [(0, 0, 0), (1., 0, 0), (0, 1., 0), (0, 0, 1.)], and I don't see these values in the matrices above.

Until a solution is found, or until this is fixed in the new release, I would suggest accessing the data directly from the annotator:
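
For example, a minimal sketch (assuming the standard AnnotatorRegistry API; rig, render_product, and num_frames stand in for the objects from the earlier snippets):

import omni.replicator.core as rep

rgb_annot = rep.AnnotatorRegistry.get_annotator("rgb")
rgb_annot.attach([render_product])

for i in range(num_frames):
    rep.orchestrator.step()
    rgb = rgb_annot.get_data()        # image data for the frame just rendered
    pos, quat = rig.get_world_pose()  # stage pose read at the same point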

This should give you more control over accessing the data in the stage as well.

Thank you for your help. I will access the data through the annotator for the moment.
It is fine for the camera, but if I randomize the position of another object in the scene (a cube, for example) and want to save its placement at each Replicator frame, at the same time as the captured image, is it still possible?

Yes. With the annotator you would call either world.step() or orchestrator.step() (not both) to feed the annotator with new data. Before or after calling step(), you have full control over the simulation and over reading/modifying the stage.

Let me know if this does not work for your specific scenario.

UPDATE: manual world.render() calls might be required to sync the render data with the simulation stage: Problem with images I get from cameras - #3 by alempereur
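
Putting that together, a rough sketch (world, cube_prim, rgb_annot, and num_frames are illustrative names, not from this thread):

import numpy as np
import omni.replicator.core as rep

for i in range(num_frames):
    # Randomize the cube through the stage API before stepping
    cube_prim.set_world_pose(position=np.random.uniform(-1.0, 1.0, 3))
    world.render()  # extra render call(s) to sync, as per the link above
    rep.orchestrator.step()
    rgb = rgb_annot.get_data()
    cube_pos, cube_quat = cube_prim.get_world_pose()  # matches this frame's image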

Hi @Leopold_M

From my experience, if you move something in Isaac Sim you need to render twice to see the effects.

As for your question, I used the synthetic data helper to save that data.
I just partially updated my repo to the 2022 version.

In practice, I've created an extension that you can load, and it will save all the data from your viewport in the correct format. I also fixed some bugs that affect the synthetic data helper.

In my experience, the matrices are transposed and scaled (thus you take the last row, multiply by the meters_per_unit factor, and transpose).
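
Concretely, that reading would look something like the following sketch (the helper name and the meters_per_unit default are mine, not from the repo):

import numpy as np

def camera_world_pose(view_transform_flat, meters_per_unit=1.0):
    # cameraViewTransform arrives flat and row-major in the USD row-vector
    # convention, so the translation of the inverse sits in the last ROW
    view = np.asarray(view_transform_flat).reshape(4, 4)
    cam_to_world = np.linalg.inv(view)
    position = cam_to_world[3, :3] * meters_per_unit  # last row, scaled
    rotation = cam_to_world[:3, :3].T  # back to column-vector convention
    return position, rotation

Applied to the printed inverses above, this does recover the sequence poses: the third matrix yields (1, 0, 0) and the fourth (0, 1, 0), shifted by the one-frame lag discussed earlier.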

Here is the code for the custom extension: GRADE-RR/extension_custom.py at v2022 · eliabntt/GRADE-RR · GitHub, and here is where I set up the recorder in the code: GRADE-RR/paper_simulation.py at v2022 · eliabntt/GRADE-RR · GitHub. Then with GRADE-RR/paper_simulation.py at v2022 · eliabntt/GRADE-RR · GitHub (my_recorder.counter += 1) you increase the index of the image, and with .update() you record the image and all the data. This works for all the viewports (unless you change the code) and for most of the data you might want (optical flow is still tricky).

The poses of the objects and of the camera are saved using poses and camera here. The camera pose is saved with this snippet (edited to get the correct vfov), and the poses of the objects are obtained here (edited to get the correct pose irrespective of the kind of animations you use; some are not reported by the default method).

If you need, I can support you there.