TensorRT inference is inaccurate and produces false positives with an ONNX file converted from a PyTorch model trained on the COCO dataset

Hi,

I trained a model in PyTorch on the COCO dataset (91 classes, plus an additional class). I then converted the PyTorch model to an ONNX file and ran inference with DetectNet, which detects objects accurately. However, when I use the same ONNX file with the TensorRT inference codelet in the Isaac SDK, the results are inaccurate and I get many false positives, i.e. detections are reported even when there is nothing in the camera feed. Could you please take a look at the JSON file below for the TensorRT inference and suggest how to improve the detection accuracy and confidence? Any answers or questions for me would be appreciated.
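For reference, the preprocessing implied by the tensor_encoder config below can be sketched in pure Python. This is a hypothetical stand-in for isaac.ml.ColorCameraEncoderCuda, not the actual codelet: resize to rows x cols, normalize pixels to [0, 1] ("Unit" mode), and reorder HWC to CHW ("201" index order). The function name and nearest-neighbor resize are my own simplifications.

```python
def encode_image(image_hwc, rows=300, cols=300):
    """image_hwc: nested list [H][W][3] of uint8 pixel values.

    Returns a CHW nested list of floats in [0, 1].
    Resizing is nearest-neighbor for simplicity.
    """
    src_h, src_w = len(image_hwc), len(image_hwc[0])
    # Nearest-neighbor resize to rows x cols
    resized = [
        [image_hwc[r * src_h // rows][c * src_w // cols] for c in range(cols)]
        for r in range(rows)
    ]
    # "Unit" normalization: map [0, 255] -> [0, 1]
    # "201" index order: output[channel][row][col] = input[row][col][channel]
    return [
        [[resized[r][c][ch] / 255.0 for c in range(cols)] for r in range(rows)]
        for ch in range(3)
    ]
```

If the normalization or channel order here differs from what the network saw during training, accuracy drops sharply, so this is worth double-checking against the training pipeline.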
{
  "modules": [
    "detect_net",
    "ml",
    "perception",
    "sight",
    "viewers"
  ],
  "graph": {
    "nodes": [
      {
        "name": "subgraph",
        "components": [
          {
            "name": "message_ledger",
            "type": "isaac::alice::MessageLedger"
          },
          {
            "name": "interface",
            "type": "isaac::alice::Subgraph"
          }
        ]
      },
      {
        "name": "tensor_encoder",
        "components": [
          {
            "name": "message_ledger",
            "type": "isaac::alice::MessageLedger"
          },
          {
            "name": "isaac.ml.ColorCameraEncoderCuda",
            "type": "isaac::ml::ColorCameraEncoderCuda"
          }
        ]
      },
      {
        "name": "tensor_r_t_inference",
        "components": [
          {
            "name": "message_ledger",
            "type": "isaac::alice::MessageLedger"
          },
          {
            "name": "isaac.ml.TensorRTInference",
            "type": "isaac::ml::TensorRTInference"
          }
        ]
      },
      {
        "name": "detection_decoder",
        "components": [
          {
            "name": "message_ledger",
            "type": "isaac::alice::MessageLedger"
          },
          {
            "name": "isaac.detect_net.DetectNetDecoder",
            "type": "isaac::detect_net::DetectNetDecoder"
          }
        ]
      },
      {
        "name": "detection_viewer",
        "components": [
          {
            "name": "isaac.alice.MessageLedger",
            "type": "isaac::alice::MessageLedger"
          },
          {
            "name": "isaac.viewers.DetectionsViewer",
            "type": "isaac::viewers::DetectionsViewer"
          }
        ]
      },
      {
        "name": "color_camera_visualizer",
        "components": [
          {
            "name": "message_ledger",
            "type": "isaac::alice::MessageLedger"
          },
          {
            "name": "isaac.viewers.ImageViewer",
            "type": "isaac::viewers::ImageViewer"
          }
        ]
      },
      {
        "name": "sight_widgets",
        "components": [
          {
            "type": "isaac::sight::SightWidget",
            "name": "Detections"
          }
        ]
      }
    ],
    "edges": [
      {
        "source": "subgraph/interface/image",
        "target": "tensor_encoder/isaac.ml.ColorCameraEncoderCuda/rgb_image"
      },
      {
        "source": "tensor_encoder/isaac.ml.ColorCameraEncoderCuda/tensor",
        "target": "tensor_r_t_inference/isaac.ml.TensorRTInference/image"
      },
      {
        "source": "tensor_r_t_inference/isaac.ml.TensorRTInference/bounding_boxes_tensor",
        "target": "detection_decoder/isaac.detect_net.DetectNetDecoder/bounding_boxes_tensor"
      },
      {
        "source": "tensor_r_t_inference/isaac.ml.TensorRTInference/confidence_tensor",
        "target": "detection_decoder/isaac.detect_net.DetectNetDecoder/confidence_tensor"
      },
      {
        "source": "detection_decoder/isaac.detect_net.DetectNetDecoder/detections",
        "target": "detection_viewer/isaac.viewers.DetectionsViewer/detections"
      },
      {
        "source": "subgraph/interface/image",
        "target": "color_camera_visualizer/isaac.viewers.ImageViewer/image"
      },
      {
        "source": "detection_decoder/isaac.detect_net.DetectNetDecoder/detections",
        "target": "subgraph/interface/detections"
      }
    ]
  },
  "config": {
    "color_camera_visualizer": {
      "isaac.viewers.ImageViewer": {
        "camera_name": "camera"
      }
    },
    "tensor_encoder": {
      "isaac.ml.ColorCameraEncoderCuda": {
        "rows": 300,
        "cols": 300,
        "pixel_normalization_mode": "Unit",
        "tensor_index_order": "201"
      }
    },
    "tensor_r_t_inference": {
      "isaac.ml.TensorRTInference": {
        "model_file_path": "/home/xavier/object_detection/coco_train/ssd-mobilenet.onnx",
        "engine_file_path": "/home/xavier/object_detection/coco_train/ssd-mobilenet.plan",
        "max_workspace_size": 67108864,
        "max_batch_size": 32,
        "inference_mode": "float16",
        "force_engine_update": false,
        "input_tensor_info": [
          {
            "operation_name": "input_0",
            "channel": "image",
            "dims": [3, 300, 300],
            "uff_input_order": "channels_last"
          }
        ],
        "output_tensor_info": [
          {
            "operation_name": "boxes",
            "channel": "bounding_boxes_tensor",
            "dims": [375, 32, 1]
          },
          {
            "operation_name": "scores",
            "channel": "confidence_tensor",
            "dims": [92, 3000, 1]
          }
        ]
      }
    },
    "detection_decoder": {
      "isaac.detect_net.DetectNetDecoder": {
        "labels": [
          "BACKGROUND",
          "person",
          "bicycle",
          "car",
          "motorcycle",
          "airplane",
          "bus",
          "train",
          "truck",
          "boat",
          "traffic light",
          "fire hydrant",
          "street sign",
          "stop sign",
          "parking meter",
          "bench",
          "bird",
          "cat",
          "dog",
          "horse",
          "sheep",
          "cow",
          "elephant",
          "bear",
          "zebra",
          "giraffe",
          "hat",
          "backpack",
          "umbrella",
          "shoe",
          "eye glasses",
          "handbag",
          "tie",
          "suitcase",
          "frisbee",
          "skis",
          "snowboard",
          "sports ball",
          "kite",
          "baseball bat",
          "baseball glove",
          "skateboard",
          "surfboard",
          "tennis racket",
          "bottle",
          "plate",
          "wine glass",
          "cup",
          "fork",
          "knife",
          "spoon",
          "bowl",
          "banana",
          "apple",
          "sandwich",
          "orange",
          "broccoli",
          "carrot",
          "hot dog",
          "pizza",
          "donut",
          "cake",
          "chair",
          "couch",
          "potted plant",
          "bed",
          "mirror",
          "dining table",
          "window",
          "desk",
          "toilet",
          "door",
          "tv",
          "laptop",
          "mouse",
          "remote",
          "keyboard",
          "cell phone",
          "microwave",
          "oven",
          "toaster",
          "sink",
          "refrigerator",
          "blender",
          "book",
          "clock",
          "vase",
          "scissors",
          "teddy bear",
          "hair drier",
          "toothbrush",
          "hair brush"
        ],
        "non_maximum_suppression_threshold": 0.4,
        "confidence_threshold": 0.9,
        "output_scale": [720, 1280]
      }
    },
    "sight_widgets": {
      "Detections": {
        "type": "2d",
        "channels": [
          { "name": "$(fullname color_camera_visualizer/isaac.viewers.ImageViewer/image)" },
          { "name": "$(fullname detection_viewer/isaac.viewers.DetectionsViewer/detections)" }
        ]
      }
    }
  }
}

Thanks!

The accuracy problem turned out to be caused by isaac.detect_net.DetectNetDecoder and the output tensor dimensions. I was able to write a codelet that reads the TensorProto directly from the Isaac TensorRT inference module, and inference is now accurate. The TensorRT version was 7.1.3.
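For anyone hitting the same issue, the decoding logic such a codelet has to implement can be sketched in pure Python. This is a minimal illustration under assumptions, not the Isaac SDK API: scores[i][k] holds per-box class confidences (boxes x classes, class 0 = BACKGROUND), boxes[i] = [x1, y1, x2, y2] in normalized coordinates, and the thresholds match the decoder config above.

```python
def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def decode(boxes, scores, conf_threshold=0.9, nms_threshold=0.4):
    """Returns [(class_index, confidence, box), ...]; class 0 = BACKGROUND."""
    detections = []
    num_classes = len(scores[0])
    for cls in range(1, num_classes):  # skip BACKGROUND
        # Keep only boxes above the confidence threshold for this class
        candidates = [
            (scores[i][cls], boxes[i])
            for i in range(len(boxes))
            if scores[i][cls] >= conf_threshold
        ]
        candidates.sort(key=lambda x: x[0], reverse=True)
        # Greedy per-class non-maximum suppression
        kept = []
        for conf, box in candidates:
            if all(iou(box, kb) <= nms_threshold for _, kb in kept):
                kept.append((conf, box))
        detections.extend((cls, conf, box) for conf, box in kept)
    return detections
```

The key point is that the decode step must match the actual output tensor layout of the exported model; if the configured dims disagree with what the network emits, the decoder reads garbage and reports spurious high-confidence detections.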