KITTI Writer Support

Hey! I want to use the Replicator SDG pipeline to generate training data for a 3D bounding box estimation network. My network requires KITTI-formatted data, but the default KittiWriter class does not support some crucial fields: alpha, dimensions, location, and rotation_y.

It seems that this data is available and could be included, but it is just set to 0. Is supporting these values on the NVIDIA roadmap? If not, what would the best workaround be? I’m currently planning to use Annotators to grab the data and a custom Writer class to write it.

Hey @ckisailus, you’ve probably already looked at this, but I thought I’d surface the source/docstring (app/extscache/omni.replicator.core-1.7.7+104.2.lx64.r.cp37/omni/replicator/core/scripts/writers_default/, line 77):

class KittiWriter(Writer):
    """Writer outputting data in the KITTI annotation format
        Development work to provide full support is ongoing.

    Supported Annotations:
        Object Detection (partial 2D support, see notes)
        Semantic Segmentation
        Instance Segmentation

        Object Detection
        Bounding boxes with a height smaller than 25 pixels are discarded

        Supported: bounding box extents, semantic labels
        Partial Support: occluded (occlusion is estimated from the area ratio of tight / loose bounding boxes)
        Unsupported: alpha, dimensions, location, rotation_y, truncated (all set to default values of 0.0)

You could try extending that KittiWriter class and taking alpha as an input. I’d expect it might get messed up in the backend, but it’s worth a try.
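
The docstring quoted above says occlusion is estimated from the area ratio of the tight and loose 2D bounding boxes. As a rough illustration of that idea (not the actual KittiWriter code), here is a minimal sketch that maps such a ratio to the KITTI "occluded" levels 0 to 3. The function names and both thresholds are hypothetical and were picked only for the example.

```python
def box_area(box):
    """Area of an (xmin, ymin, xmax, ymax) box; degenerate boxes give 0."""
    xmin, ymin, xmax, ymax = box
    return max(0.0, xmax - xmin) * max(0.0, ymax - ymin)


def estimate_occlusion(tight_box, loose_box, partly=0.8, largely=0.4):
    """Map the tight/loose area ratio to a KITTI 'occluded' level.

    The thresholds are hypothetical, not values taken from Replicator.
    Returns 0 (fully visible), 1 (partly occluded), 2 (largely occluded)
    or 3 (unknown, used here when the loose box has no area).
    """
    loose_area = box_area(loose_box)
    if loose_area == 0.0:
        return 3
    ratio = box_area(tight_box) / loose_area
    if ratio >= partly:
        return 0
    if ratio >= largely:
        return 1
    return 2
```

A tight box that covers the whole loose box gives level 0, and the level rises as the visible (tight) area shrinks relative to the loose one.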

Thanks! I ended up writing my own Writer which looks like this:

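import io
import json

import numpy as np
from omni.replicator.core import AnnotatorRegistry, BackendDispatch, Writer


class FullKittiWriter(Writer):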
    def __init__(self, output_dir, rotation_y, camera_position):
        self._output_dir = output_dir
        self._rotation_y = rotation_y * (np.pi / 180) # convert to rad
        if(self._rotation_y > np.pi):
            self._rotation_y -= 2*np.pi
        self._camera_pos = camera_position
        self._pallet_dimensions = [1.210432, 1.00245652, .20864567] # length, width, height in m
        # I think this is right as of now
        self._world_to_camT = np.array([
            [1, 0, 0, camera_position[0]],
            [0, 1, 0, camera_position[1]],
            [0, 0, 1, camera_position[2]],
            [0, 0, 0, 1],
        ])
        self._backend = BackendDispatch({"paths": {"out_dir": output_dir}})
        self._frame_id = 0
        self._image_output_format = "png"
        self.annotators = [
            AnnotatorRegistry.get_annotator("rgb"), # RGB image
            AnnotatorRegistry.get_annotator("bounding_box_2d_tight"), # 2D bounding box
            AnnotatorRegistry.get_annotator("bounding_box_3d"), # 3D bounding box
            AnnotatorRegistry.get_annotator("CameraParams"), # camera parameters for projection matrix
        ]

    def calculate_theta_ray(self, img_width, box_2d, proj_matrix):
        """Return the horizontal angle of the ray through the 2D box center.

        Args:
            img_width (int): width of input image
            box_2d (list[tuple(int, int)]): extents of bounding box in [(xmin, ymin), (xmax, ymax)]
            proj_matrix: camera projection matrix
        """
        # This formula treats proj_matrix[0][0] as the focal length in pixels
        fovx = 2 * np.arctan(img_width / (2*proj_matrix[0][0]))
        center = (box_2d[0][0] + box_2d[1][0]) / 2
        dx = center - (img_width / 2)

        mult = 1
        if dx < 0:
            mult = -1
        dx = abs(dx)
        angle = np.arctan((2*dx*np.tan(fovx/2)) / img_width)
        angle *= mult
        return angle
    def write(self, data):
        bbox3D_data = data["bounding_box_3d"]["data"][0]
        xmin = bbox3D_data[1]
        ymin = bbox3D_data[2]
        zmin = bbox3D_data[3]
        xmax = bbox3D_data[4]
        ymax = bbox3D_data[5]
        zmax = bbox3D_data[6]
        local_to_world_xform = bbox3D_data[7].reshape(4,4).swapaxes(0,1)
        box_center_local = np.array([0, 0, 0, 1])
        box_center_world = local_to_world_xform@box_center_local # store this to compute box location in camera coords
        # Get dimensions
        length = xmax - xmin
        width = ymax - ymin
        height = zmax - zmin
        dimensions = [float(length/100), float(width/100), float(height/100)]

        img_width = data['CameraParams']['renderProductResolution'][0]
        xmin2d = data['bounding_box_2d_tight']['data'][0][1]
        xmax2d = data['bounding_box_2d_tight']['data'][0][3]
        ymin2d = data['bounding_box_2d_tight']['data'][0][2]
        ymax2d = data['bounding_box_2d_tight']['data'][0][4]
        box_2d = [(xmin2d, ymin2d), (xmax2d, ymax2d)]
        proj_matrix = data['CameraParams']['cameraProjection'].reshape(4,4).swapaxes(0,1)
        # print(proj_matrix)
        # return
        viewXform = data['CameraParams']['cameraViewTransform']
        world_to_cam = viewXform.reshape(4,4).swapaxes(0,1)
        box_center_camera_frame_cm = world_to_cam@box_center_world
        # Drop the homogeneous coordinate before converting cm to m
        box_center_camera_frame_m = box_center_camera_frame_cm[:3] / 100
        # print(proj_matrix)
        # return
        # Calculate ray angle and local orientation from paper description
        theta_r = self.calculate_theta_ray(img_width=img_width, box_2d=box_2d, proj_matrix=proj_matrix.reshape((4,4)) )
        theta_l = self._rotation_y - theta_r

        # Reform box for writing
        # [left top right bottom]
        bbox_data = [int(xmin2d), int(ymin2d), int(xmax2d), int(ymax2d)]
        # try:
        #     assert -np.pi <= theta_l <= np.pi
        # except AssertionError:
        #     print(f"Rotation_y = {self._rotation_y}")
        #     print(f"theta_r = {theta_r}")
        #     print( f"Expected alpha to be in [-pi, pi], but got alpha = {theta_l}")
        while theta_l > np.pi:
            theta_l -= 2*np.pi
        while theta_l < -np.pi:
            theta_l += 2*np.pi
        kitti_data = {
            "type": "Pallet",
            "truncated": 0.0, # assume not truncated ever
            "occluded": 0, #assume fully visible always
            "alpha": theta_l,
            "bbox": bbox_data,
            "dimensions": dimensions,
            "location": box_center_camera_frame_m.tolist(),
            "rotation_y": self._rotation_y,
            "camera_projection_mat": proj_matrix.tolist(),
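            "world_to_cam_xform": world_to_cam.tolist(),
        }
        kitti_file_path = f"rot180_{self._frame_id}.json"
        buf = io.BytesIO()
        # Serialize the label record so the blob is not written empty
        buf.write(json.dumps(kitti_data).encode())
        self._backend.write_blob(kitti_file_path, buf.getvalue())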
        image_file_path = f"rot180_{self._frame_id}.{self._image_output_format}"
        self._backend.write_image(image_file_path, data['rgb'])
        self._frame_id += 1
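
A note on the ray angle: algebraically, calculate_theta_ray above reduces to arctan(dx / proj_matrix[0][0]), which is the pinhole ray angle only when that matrix entry is the focal length in pixels. The standalone sketch below shows the same geometry under a plain pinhole model with the principal point at the image centre. The helper names are hypothetical, and whether Replicator's cameraProjection is a pixel-space or a normalized (OpenGL-style) matrix is an assumption worth checking against your own data.

```python
import math


def focal_length_px(img_width, fov_x):
    """Pinhole focal length in pixels for a horizontal field of view in radians."""
    return img_width / (2.0 * math.tan(fov_x / 2.0))


def focal_length_px_from_ndc(img_width, p00):
    """Focal length in pixels, assuming p00 is the [0][0] entry of a
    normalized OpenGL-style projection matrix, i.e. p00 = 1 / tan(fov_x / 2)."""
    return p00 * img_width / 2.0


def theta_ray(u_center, img_width, fx):
    """Signed angle between the optical axis and the ray through column u_center.

    Assumes the principal point sits at the image centre; positive angles
    point to the right of the optical axis.
    """
    return math.atan2(u_center - img_width / 2.0, fx)
```

If the matrix turns out to be normalized, converting its [0][0] entry with focal_length_px_from_ndc before computing the angle keeps the result in the expected range.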

There are definitely some hard-coded values here that work for my case but are not general. Hopefully this can spur some development on the NVIDIA side!
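
The writer above stores each label as a JSON record rather than a KITTI text file. For tools that expect the standard 15-field KITTI label line (type, truncated, occluded, alpha, 2D bbox, dimensions as height/width/length, location, rotation_y), a conversion could look like the sketch below. The function name is hypothetical. It assumes the record keys used in the writer, that "dimensions" is stored as [length, width, height], and that "location" is already expressed in the KITTI camera frame (x right, y down, z forward); any axis conversion from the simulator's camera frame is left out.

```python
def kitti_label_line(record):
    """Format one label record as a 15-field KITTI object label line."""
    # Assumption: dimensions are stored as [length, width, height];
    # KITTI expects them in the order height, width, length.
    length, width, height = record["dimensions"]
    # Keep only x, y, z in case a homogeneous coordinate is present.
    x, y, z = record["location"][:3]
    left, top, right, bottom = record["bbox"]
    numbers = [left, top, right, bottom, height, width, length, x, y, z,
               record["rotation_y"]]
    fields = [
        record["type"],
        f"{record['truncated']:.2f}",
        str(int(record["occluded"])),
        f"{record['alpha']:.2f}",
    ]
    fields += [f"{value:.2f}" for value in numbers]
    return " ".join(fields)


example = {
    "type": "Pallet",
    "truncated": 0.0,
    "occluded": 0,
    "alpha": -1.5,
    "bbox": [100, 120, 300, 260],
    "dimensions": [1.2, 1.0, 0.2],
    "location": [0.5, 1.0, 4.0],
    "rotation_y": -1.57,
}
print(kitti_label_line(example))
```

Writing one such line per object into a .txt file named after the frame gives the layout most KITTI loaders read.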


Nice! I thought the exclusion of alpha had to do with the backend. Have you observed alpha explicitly working?
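
One way to sanity-check alpha in written labels is the standard KITTI relation between the observation angle and the global yaw: alpha = rotation_y - arctan2(x, z), where (x, z) is the object location in the KITTI camera frame. The sketch below recomputes alpha from a label's rotation_y and location so it can be compared with the stored value. The function names are hypothetical, and it assumes the location is already in a frame with x to the right and z forward.

```python
import math


def wrap_to_pi(angle):
    """Wrap an angle in radians into the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def alpha_from_rotation_y(rotation_y, location):
    """Observation angle alpha from global yaw and camera-frame location.

    Assumes location = (x, y, z) with x to the right and z forward.
    """
    x, _, z = location[:3]
    return wrap_to_pi(rotation_y - math.atan2(x, z))
```

For an object straight ahead of the camera (x = 0) the two angles coincide, which makes a centred test object a convenient first check.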

@ckisailus Great job making your own writer! You are correct that some data was omitted from the default KITTI writer. I’ll make a ticket and we’ll address it in a future update. Thanks for bringing this up.

Thanks, @pcallender! Looking forward to using Replicator more, it’s been great so far.