TensorRT inference in Isaac SDK gives inaccurate detections with an ONNX file

Hi,

I trained a model in PyTorch on the COCO dataset, using its 91 classes plus one additional class. I then converted the PyTorch model to an ONNX file and ran inference with detectNet from jetson-inference, where it detects objects accurately. When I use the same ONNX file with the TensorRT inference component in Isaac SDK, the detections are not accurate and I get a lot of false positives: even when there is nothing in the camera's view, it reports that it detected something. Could you please take a look at the JSON file below for the TensorRT inference and suggest how to improve the confidence and detection accuracy? Any answers, or questions for me, would be great.
{
  "modules": [
    "detect_net",
    "ml",
    "perception",
    "sight",
    "viewers"
  ],
  "graph": {
    "nodes": [
      {
        "name": "subgraph",
        "components": [
          {
            "name": "message_ledger",
            "type": "isaac::alice::MessageLedger"
          },
          {
            "name": "interface",
            "type": "isaac::alice::Subgraph"
          }
        ]
      },
      {
        "name": "tensor_encoder",
        "components": [
          {
            "name": "message_ledger",
            "type": "isaac::alice::MessageLedger"
          },
          {
            "name": "isaac.ml.ColorCameraEncoderCuda",
            "type": "isaac::ml::ColorCameraEncoderCuda"
          }
        ]
      },
      {
        "name": "tensor_r_t_inference",
        "components": [
          {
            "name": "message_ledger",
            "type": "isaac::alice::MessageLedger"
          },
          {
            "name": "isaac.ml.TensorRTInference",
            "type": "isaac::ml::TensorRTInference"
          }
        ]
      },
      {
        "name": "detection_decoder",
        "components": [
          {
            "name": "message_ledger",
            "type": "isaac::alice::MessageLedger"
          },
          {
            "name": "isaac.detect_net.DetectNetDecoder",
            "type": "isaac::detect_net::DetectNetDecoder"
          }
        ]
      },
      {
        "name": "detection_viewer",
        "components": [
          {
            "name": "isaac.alice.MessageLedger",
            "type": "isaac::alice::MessageLedger"
          },
          {
            "name": "isaac.viewers.DetectionsViewer",
            "type": "isaac::viewers::DetectionsViewer"
          }
        ]
      },
      {
        "name": "color_camera_visualizer",
        "components": [
          {
            "name": "message_ledger",
            "type": "isaac::alice::MessageLedger"
          },
          {
            "name": "isaac.viewers.ImageViewer",
            "type": "isaac::viewers::ImageViewer"
          }
        ]
      },
      {
        "name": "sight_widgets",
        "components": [
          {
            "type": "isaac::sight::SightWidget",
            "name": "Detections"
          }
        ]
      }
    ],
    "edges": [
      {
        "source": "subgraph/interface/image",
        "target": "tensor_encoder/isaac.ml.ColorCameraEncoderCuda/rgb_image"
      },
      {
        "source": "tensor_encoder/isaac.ml.ColorCameraEncoderCuda/tensor",
        "target": "tensor_r_t_inference/isaac.ml.TensorRTInference/image"
      },
      {
        "source": "tensor_r_t_inference/isaac.ml.TensorRTInference/bounding_boxes_tensor",
        "target": "detection_decoder/isaac.detect_net.DetectNetDecoder/bounding_boxes_tensor"
      },
      {
        "source": "tensor_r_t_inference/isaac.ml.TensorRTInference/confidence_tensor",
        "target": "detection_decoder/isaac.detect_net.DetectNetDecoder/confidence_tensor"
      },
      {
        "source": "detection_decoder/isaac.detect_net.DetectNetDecoder/detections",
        "target": "detection_viewer/isaac.viewers.DetectionsViewer/detections"
      },
      {
        "source": "subgraph/interface/image",
        "target": "color_camera_visualizer/isaac.viewers.ImageViewer/image"
      },
      {
        "source": "detection_decoder/isaac.detect_net.DetectNetDecoder/detections",
        "target": "subgraph/interface/detections"
      }
    ]
  },
  "config": {
    "color_camera_visualizer": {
      "isaac.viewers.ImageViewer": {
        "camera_name": "camera"
      }
    },
    "tensor_encoder": {
      "isaac.ml.ColorCameraEncoderCuda": {
        "rows": 300,
        "cols": 300,
        "pixel_normalization_mode": "Unit",
        "tensor_index_order": "201"
      }
    },
    "tensor_r_t_inference": {
      "isaac.ml.TensorRTInference": {
        "model_file_path": "/home/xavier/object_detection/coco_train/ssd-mobilenet.onnx",
        "engine_file_path": "/home/xavier/object_detection/coco_train/ssd-mobilenet.plan",
        "max_workspace_size": 67108864,
        "max_batch_size": 32,
        "inference_mode": "float16",
        "force_engine_update": false,
        "input_tensor_info": [
          {
            "operation_name": "input_0",
            "channel": "image",
            "dims": [
              3,
              300,
              300
            ],
            "uff_input_order": "channels_last"
          }
        ],
        "output_tensor_info": [
          {
            "operation_name": "boxes",
            "channel": "bounding_boxes_tensor",
            "dims": [
              375,
              32,
              1
            ]
          },
          {
            "operation_name": "scores",
            "channel": "confidence_tensor",
            "dims": [
              92,
              3000,
              1
            ]
          }
        ]
      }
    },
    "detection_decoder": {
      "isaac.detect_net.DetectNetDecoder": {
        "labels": [
          "BACKGROUND",
          "person",
          "bicycle",
          "car",
          "motorcycle",
          "airplane",
          "bus",
          "train",
          "truck",
          "boat",
          "traffic light",
          "fire hydrant",
          "street sign",
          "stop sign",
          "parking meter",
          "bench",
          "bird",
          "cat",
          "dog",
          "horse",
          "sheep",
          "cow",
          "elephant",
          "bear",
          "zebra",
          "giraffe",
          "hat",
          "backpack",
          "umbrella",
          "shoe",
          "eye glasses",
          "handbag",
          "tie",
          "suitcase",
          "frisbee",
          "skis",
          "snowboard",
          "sports ball",
          "kite",
          "baseball bat",
          "baseball glove",
          "skateboard",
          "surfboard",
          "tennis racket",
          "bottle",
          "plate",
          "wine glass",
          "cup",
          "fork",
          "knife",
          "spoon",
          "bowl",
          "banana",
          "apple",
          "sandwich",
          "orange",
          "broccoli",
          "carrot",
          "hot dog",
          "pizza",
          "donut",
          "cake",
          "chair",
          "couch",
          "potted plant",
          "bed",
          "mirror",
          "dining table",
          "window",
          "desk",
          "toilet",
          "door",
          "tv",
          "laptop",
          "mouse",
          "remote",
          "keyboard",
          "cell phone",
          "microwave",
          "oven",
          "toaster",
          "sink",
          "refrigerator",
          "blender",
          "book",
          "clock",
          "vase",
          "scissors",
          "teddy bear",
          "hair drier",
          "toothbrush",
          "hair brush"
        ],
        "non_maximum_suppression_threshold": 0.4,
        "confidence_threshold": 0.9,
        "output_scale": [720, 1280]
      }
    },
    "sight_widgets": {
      "Detections": {
        "type": "2d",
        "channels": [
          { "name": "$(fullname color_camera_visualizer/isaac.viewers.ImageViewer/image)" },
          { "name": "$(fullname detection_viewer/isaac.viewers.DetectionsViewer/detections)" }
        ]
      }
    }
  }
}

The DeepStream version of the DetectNet decoder uses DBSCAN to filter out spurious bounding boxes in post-processing after inference. The Isaac SDK version uses non-maximum suppression for this filtering instead, which may not yield the same results. You'll also want to check that the filtering configuration matches as closely as possible, using the parameters on DetectNetDecoder in Isaac SDK (in the config above, non_maximum_suppression_threshold and confidence_threshold).
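
To make the effect of those two parameters concrete, here is a minimal sketch of generic greedy non-maximum suppression in plain Python. It is not the Isaac SDK implementation: the function names, the (x_min, y_min, x_max, y_max) box layout, and the exact comparison operators are assumptions for illustration. Only the two threshold values are taken from the config above.

```python
from typing import List, Tuple

# Assumed box layout for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(
    boxes: List[Box],
    scores: List[float],
    confidence_threshold: float = 0.9,  # value from the config above
    nms_threshold: float = 0.4,         # value from the config above
) -> List[int]:
    """Return indices of kept boxes, highest score first."""
    # Step 1: drop every box below the confidence threshold.
    candidates = [i for i, s in enumerate(scores) if s >= confidence_threshold]
    # Step 2: visit the remaining boxes from most to least confident.
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in candidates:
        # Step 3: keep a box only if it does not overlap an already kept
        # box by more than the NMS threshold.
        if all(iou(boxes[i], boxes[j]) <= nms_threshold for j in kept):
            kept.append(i)
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60), (0, 0, 5, 5)]
scores = [0.95, 0.92, 0.99, 0.5]
kept = non_max_suppression(boxes, scores)
print(kept)
```

In this sketch, raising confidence_threshold removes low-scoring boxes before any overlap test, while lowering nms_threshold merges overlapping boxes more aggressively. Neither step groups boxes by density the way DBSCAN does, so the two decoders can disagree on the same raw network output.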
