Object detection model accuracy drops when the model is converted to an engine file

I have object detection models trained on DetectNet + ResNet using synthetic data alone. The model is able to detect the object in real-world images as well with high accuracy. However, when I use this model with the pose inference, the .etlt file is converted to a .plan engine file and the accuracy drops dramatically: the object detection module no longer detects the object in almost any of the frames. Is there any way to combat this?

Hi @rkjha1,

The .etlt model is converted to a .plan engine file any time you do inference with TensorRT.
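In an Isaac application this conversion is handled by the isaac.ml.TensorRTInference component, so that is also where the ETLT model and its key are configured. A minimal sketch of such a node is shown below (the node path and the values here are placeholders, not the settings from your app):

  {
    "object_detection.tensor_r_t_inference": {
      "isaac.ml.TensorRTInference": {
        "model_file_path": "external/my_models/resnet18_detector.etlt",
        "etlt_password": "<key used with tlt-export>",
        "force_engine_update": false
      }
    }
  }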

Please share the steps you are using for inference (1) when you are able to see detections with the ETLT model on real data, and (2) when you use the same ETLT model for pose estimation. In particular, how are you configuring the pose estimation inference to use the model?

I run inference on my TLT model using:

tlt-infer detectnet_v2 -e $SPECS_DIR/detectnet_v2_inference_kitti_tlt.txt \
                       -o $USER_EXPERIMENT_DIR/tlt_infer_testing \
                       -i $DATA_DOWNLOAD_DIR/testing/image_2 \
                       -k $KEY

I convert the TLT model to an ETLT model using:

tlt-export detectnet_v2 \
    -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/resnet18_detector_pruned.tlt \
    -o $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt \
    -k $KEY

I run the pose inference using:

bazel run packages/object_pose_estimation/apps/pose_cnn_decoder:detection_pose_estimation_cnn_inference_app -- \
    --mode image \
    --config packages/object_pose_estimation/apps/pose_cnn_decoder/detection_pose_estimation_cnn_inference_fan.config.json \
    --image_directory /home/siminsights/tlt-experiments/unity3d_kitti_dataset_duplo/testing/image_2/010003.png \
    --rows 368 --cols 640 \
    --optical_center_rows 184 --optical_center_cols 320 \
    --focal_length 50 \
    --detection_model external/fan_pose_estimation_models/resnet18_detector.etlt \
    --pose_estimation_model external/fan_pose_estimation_models/pose_cnn_model.uff \
    --etlt_password 1234

The config JSON file is shown below:

{
  "config": {
    "detection_pose_estimation.object_detection.tensor_r_t_inference": {
      "isaac.ml.TensorRTInference": {
        "model_file_path": "external/fan_pose_estimation_models/resnet18_detector_green.etlt",
        "etlt_password": "1234",
        "force_engine_update": false
      }
    },
    "detection_pose_estimation.object_detection.detection_decoder": {
      "isaac.detect_net.DetectNetDecoder": {
        "labels": ["green"],
        "non_maximum_suppression_threshold": 0.3,
        "confidence_threshold": 0.55
      }
    },
    "detection_pose_estimation.object_pose_estimation.pose_encoder": {
      "TensorRTInference": {
        "model_file_path": "external/fan_pose_estimation_models/pose_cnn_model.uff",
        "force_engine_update": false
      }
    },
    "detection_pose_estimation.object_pose_estimation.detection_convertor": {
      "BoundingBoxEncoder": {
        "class_names": ["green"]
      }
    },
    "detection_pose_estimation.viewers": {
      "Detections3Viewer": {
        "frame": "camera",
        "object_T_box_center": [1, 0, 0, 0, 0, 0, 0],
        "box_dimensions": [6.0, 3.0, 1.8],
        "mesh_name": "green"
      }
    },
    "detection_pose_estimation.object_pose_estimation.detection_filter": {
      "FilterDetectionsByLabel": {
        "whitelist_labels": [
          "green"
        ]
      }
    },
    "websight": {
      "WebsightServer": {
        "webroot": "packages/sight/webroot",
        "assetroot": "external/fan_pose_estimation_models",
        "port": 3000,
        "ui_config": {
          "assets": {
            "box": {
              "obj": "DuploBlue-Center.obj",
              "rotation": [
                0.70710678118,
                0.70710678118,
                0,
                0
              ],
              "translation": [
                0,
                0,
                0.0
              ],
              "scale": 1.0
            }
          }
        }
      }
    }
  }
}

Hi @rkjha1,

You can run TensorRT inference on the same images in $DATA_DOWNLOAD_DIR/testing/image_2 by using the detect_net_inference app in Isaac:

bazel run packages/detect_net/apps:detect_net_inference_app -- --mode image --image_directory $DATA_DOWNLOAD_DIR/testing/image_2 ...

This way, you can directly compare the results of tlt-infer with the inference results in Isaac. Keep in mind that the post-processing of the bounding boxes differs between TLT and Isaac (read more about TLT's inference here: https://docs.nvidia.com/metropolis/TLT/tlt-getting-started-guide/#spec_file_infer). You can adjust the post-processing in Isaac (non-maximum suppression threshold and confidence threshold) in the configuration file supplied to detect_net_inference_app, or directly via the --confidence_threshold and --nms_threshold parameters. I would suggest tuning these to achieve better performance.
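For example, a run with looser post-processing could look like the following (the threshold values here are only illustrative starting points, not recommended settings):

bazel run packages/detect_net/apps:detect_net_inference_app -- --mode image \
    --image_directory $DATA_DOWNLOAD_DIR/testing/image_2 \
    --confidence_threshold 0.35 \
    --nms_threshold 0.3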