Maintain class IDs between separate generations

Currently I have 120 YAML scenarios which contain these classes: person, person_fallen, scooter, palletjack, forklift. I run the Isaac container against them, and in each result my labels JSON for tight bboxes has different IDs for person etc. This is terrible for running model trainings because we have to unravel this puzzle each time and the scripts are getting unnecessarily complicated. We should be able to tell what the IDs are, or they should be AT LEAST alphabetical.

@pcallender @hclever @jiehanw @dennis.lynch

@pcallender this is a very serious problem for us

Hi @Turowicz. Unfortunately we don’t really have a great way to get consistent semantic IDs across runs. Work has been proposed to maintain consistent semantic IDs based on some hashing of the semantic labels.
You can work around this by creating a custom writer and providing a custom mapping such as:

custom_ids = {
    "person": 1,
    "person_fallen": 2,
    "scooter": 3,
    "palletjack": 4,
    "forklift": 5
}

Inside your writer, you can then get the bbox idToLabels dict and modify it using your custom mapping:

id_to_labels = bbox_2d_anno_data["idToLabels"]

For more info, you can always look at kitti.py, which serves as a good example of doing a custom mapping.
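A minimal sketch of how the mapping above could be turned into an old-ID to new-ID lookup. The helper name `build_id_remap` is made up for illustration, and the shape of the `idToLabels` entries (string keys, values that are either plain label strings or dicts with a "class" key) is an assumption; check what your annotator actually returns.

```python
def build_id_remap(id_to_labels, custom_ids):
    """Return {old_id: new_id} for every label that appears in custom_ids.

    Hypothetical helper, not part of Replicator.
    """
    remap = {}
    for old_id, label in id_to_labels.items():
        # Assumption: a label is either a plain string or a dict like {"class": "person"}.
        name = label.get("class") if isinstance(label, dict) else label
        if name in custom_ids:
            remap[int(old_id)] = custom_ids[name]
    return remap


custom_ids = {
    "person": 1,
    "person_fallen": 2,
    "scooter": 3,
    "palletjack": 4,
    "forklift": 5,
}

# Made-up example of what a single run might report.
id_to_labels = {
    "0": {"class": "forklift"},
    "1": {"class": "person"},
    "2": {"class": "scooter"},
}

remap = build_id_remap(id_to_labels, custom_ids)
```

Labels that are missing from `custom_ids` are simply left out of the lookup, so the caller can decide whether to keep or drop them.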

I use the BasicWriter instead of KittiWriter. I know it may be asking for a lot but perhaps you can post this code change for me?

So in basicwriter.py, under the function _write_bounding_box_data, you can access the id_to_labels data, which is a dictionary mapping from id to labels.

You can prepare your own custom mapping from id to labels like this:

custom_ids = {
    1: "person",
    2: "person_fallen",
    3: "scooter",
    4: "palletjack",
    5: "forklift"
}

And do something like

for custom_id, custom_label in custom_ids.items():
    if custom_id in id_to_labels and id_to_labels[custom_id] != custom_label:
        # this ID is currently taken by a different label, so move that label away first
        new_id = ... # assign a new id for the to be replaced label
        id_to_labels[new_id] = id_to_labels[custom_id]

        id_to_labels[custom_id] = custom_label

        # process bbox data: boxes that used the old ID now point at new_id
        for data in bbox_data:
            if data.semanticId == custom_id:
                data.semanticId = new_id

There might be some changes you have to make in order to run, but the idea is here.
Let me know if you encounter some problem!
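As an alternative to juggling temporary IDs, the remap can be applied in a single pass when the old-to-new lookup is known up front. The following is only a sketch: it assumes the bbox data is a NumPy structured array with a `semanticId` field, and `remap_semantic_ids` is a hypothetical helper, not a Replicator function.

```python
import numpy as np


def remap_semantic_ids(bbox_data, remap):
    """Return a copy of bbox_data with semanticId values replaced via remap.

    Masks are computed on the original array, so swapping two IDs
    (for example 0 -> 1 and 1 -> 0) does not cause collisions.
    """
    out = bbox_data.copy()
    old_ids = bbox_data["semanticId"]
    for old_id, new_id in remap.items():
        out["semanticId"][old_ids == old_id] = new_id
    return out


# Made-up bbox records; real annotator output has more fields.
dtype = [("semanticId", np.uint32), ("x_min", np.int32)]
boxes = np.array([(0, 10), (1, 20), (0, 30)], dtype=dtype)

result = remap_semantic_ids(boxes, {0: 1, 1: 0})
```

IDs that do not appear in `remap` are left unchanged, and the input array is not modified.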

Thanks, I will give this a shot. Any idea how to register a custom writer (.py) while using the YAML workflow?

I think currently no. You are supposed to do

WriterRegistry.register:
  writer: <writer_name>

But because the writer argument needs to be a writer class, we currently can’t define a writer class in YAML.

Is basicwriter.py editable in the Isaac container so I can just modify it myself?

Testing my change soon:

diff --git a/isaac-sim/extscache/omni.replicator.core-1.10.20+105.1.lx64.r.cp310/omni/replicator/core/scripts/writers_default/basicwriter.py b/.devcontainer/basicwriter.py
index 8e30163..9d941a3 100644
--- a/isaac-sim/extscache/omni.replicator.core-1.10.20+105.1.lx64.r.cp310/omni/replicator/core/scripts/writers_default/basicwriter.py
+++ b/.devcontainer/basicwriter.py
@@ -8,6 +8,7 @@ license agreement from NVIDIA CORPORATION is strictly prohibited.
 """
 
 from typing import List
+from typing import Dict
 
 import numpy as np
 from omni.syntheticdata.scripts.SyntheticData import SyntheticData
@@ -150,6 +151,7 @@ class BasicWriter(Writer):
         skeleton_data: bool = False,
         frame_padding: int = 4,
         semantic_filter_predicate: str = None,
+        custom_ids: Dict[int, str] = None,
     ):
         self._output_dir = output_dir
         if s3_bucket:
@@ -169,6 +171,7 @@ class BasicWriter(Writer):
         self._output_data_format = {}
         self.annotators = []
         self.version = __version__
+        self.custom_ids = custom_ids
         self._frame_padding = frame_padding
         self._telemetry = Schema_omni_replicator_extinfo_1_0()
 
     def write(self, data: dict):
@@ -313,6 +317,32 @@ class BasicWriter(Writer):
             self._frame_id = 0
             self._sequence_id = sequence_id
 
+        if self.custom_ids is not None:
+            bbox_data = data[annotator]["data"]
+            id_to_labels = data[annotator]["info"]["idToLabels"]
+
+            replaced_id_sequence = 10000
+
+            for custom_id, custom_label in self.custom_ids.items():
+                if custom_id in id_to_labels and id_to_labels[custom_id] != custom_label:
+                    replaced_id = replaced_id_sequence
+                    id_to_labels[replaced_id] = id_to_labels[custom_id]
+                    id_to_labels[custom_id] = custom_label
+
+                    for bbox in bbox_data:
+                        if bbox.semanticId == custom_id:
+                            bbox.semanticId = replaced_id
+
+                    replaced_id_sequence += 1
+
+            for replaced_id in range(10000, replaced_id_sequence):
+                for custom_id, custom_label in self.custom_ids.items():
+                    if id_to_labels[custom_id] == id_to_labels[replaced_id]:
+                        for bbox in bbox_data:
+                            if bbox.semanticId == replaced_id:
+                                bbox.semanticId = custom_id
+                                del id_to_labels[replaced_id]
+
         for annotator in data.keys():
             annotator_split = annotator.split("-")
             render_product_path = ""

The code above still needs work. There are different data structures to handle for each of the annotators that I need (bbox tight, bbox loose, instance segmentation).
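For the image-like annotators, a lookup-table approach may be simpler than looping over records. This is a rough sketch that assumes the segmentation output is a 2D integer array of IDs; whether those are instance or semantic IDs depends on the annotator, and `remap_id_mask` is a made-up helper that only shows the lookup mechanics.

```python
import numpy as np


def remap_id_mask(mask, remap):
    """Return a new mask where every ID found in remap is replaced.

    IDs that are not in remap keep their original value.
    """
    ids, inverse = np.unique(mask, return_inverse=True)
    new_ids = np.array(
        [remap.get(int(i), int(i)) for i in ids], dtype=mask.dtype
    )
    # reshape covers NumPy versions that return a flattened inverse
    return new_ids[inverse].reshape(mask.shape)


# Made-up 2x2 mask for illustration.
mask = np.array([[0, 2], [2, 1]], dtype=np.uint32)
remapped = remap_id_mask(mask, {0: 5, 1: 1, 2: 3})
```

Because the lookup is built from the unique IDs only, the cost is dominated by `np.unique` rather than by a Python loop over pixels.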