TLT trained model accuracy worse after deployment

  • Hardware Platform:
    Jetson Nano
  • Deepstream Version:
    5.0.1
  • JetPack Version:
    4.4.1
  • TensorRT Version:
    7.1.3
  • TLT Version:
    1.0.1

I am using TLT to train a detection model (DetectNet_v2 with a ResNet18 backbone) on a custom object class, which is then deployed within a DeepStream app based on the Python examples.

I noticed that when I train the TLT model and deploy it on the nano in the Deepstream app, I see a considerable drop in detection accuracy. Below is what I see and the options I have explored:

  1. TLT model converted to TRT engine file, and then run on the nano in the DS app:
  • The TLT model shows a mAP of 90% after training, but when deployed as a TRT engine file on the nano the performance drops. Samples that were correctly inferred by the TLT model are no longer inferred correctly when I use the converted TRT engine file.
  • What could be causing this? Do I have to change any specific configurations in the Deepstream app / pgie files to ensure the accuracy is maintained from TLT to TRT?
  • I’m currently using a pre-cluster-threshold of 0.2 in my pgie configuration file.
  2. TLT model run directly as an ETLT model on the nano:
  • The ETLT model, run directly on the nano with the same threshold configuration as above, did not make any detections at all. Is there a specific way to implement this?

If neither of these is the right approach, is there something else I should try to ensure accuracy is maintained when running a TLT model in a DS app?


The above does not make sense. Can you attach your DeepStream config file?

Also, how did you run inference in DeepStream? Can you share the full command and the full log?

Sorry, what I meant to say is that when I take the model trained with TLT and deploy it on the Jetson Nano (without conversion to a .trt file), almost no objects are detected in the test video I am using. If I first convert the model to a .trt file, there are detections, but significantly fewer than when the same test video samples are inferred with the model using TLT itself.
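
For context, a minimal sketch of the tlt-converter step for a DetectNet_v2 .etlt model (the key, dimensions, and file names are placeholders, and the output nodes assume the standard DetectNet_v2 heads rather than the exact values used here):

./tlt-converter -k <encryption_key> \
    -d 3,368,640 \
    -o output_cov/Sigmoid,output_bbox/BiasAdd \
    -t fp16 \
    -m 1 \
    -e model_fp16.trt \
    detectnet_v2.etlt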

This is my pgie config setup:

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
labelfile-path=…/…/…/…/samples/models/Primary_Detector/labels.txt
model-engine-file=…/…/…/…/samples/models/Primary_Detector/model_fp16.trt
force-implicit-batch-dim=1
batch-size=1
process-mode=1
model-color-format=0
network-mode=2
num-detected-classes=2
interval=0
gie-unique-id=1
output-blob-names=conv2d_bbox;conv2d_cov/Sigmoid

[class-attrs-all]
pre-cluster-threshold=0.2
eps=0.2
group-threshold=2

(When using the .etlt file, I just changed this to model-engine-file=…/…/…/…/samples/models/Primary_Detector/model.etlt.)

I run inference in DeepStream via a custom Python script based on the “deepstream_python_apps/apps/deepstream_test1/” example code.

Several comments:

You mentioned you are using TLT 1.0.1 with DeepStream 5.0.1. I strongly suggest using TLT 3.0_dp instead.

This is not correct. Please see the DetectNet_v2 — Transfer Learning Toolkit 3.0 documentation. To deploy the .etlt file, in the config file please set

tlt-encoded-model=<Path to DetectNet_v2 TLT model>
tlt-model-key=<Key to decrypt the model>

Hi @Morganh thanks for those comments. I have followed the documentation for using the .etlt file, currently using the following config:

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
labelfile-path=…/…/…/…/samples/models/Primary_Detector/labels.txt

tlt-encoded-model=…/…/…/…/samples/models/Primary_Detector/detectnet_v5.etlt
tlt-model-key=API_KEY
infer-dims=3;368;640
uff-input-blob-name=input_1
output-blob-names=output_cov/Sigmoid;output_bbox/BiasAdd

force-implicit-batch-dim=1
batch-size=1
process-mode=1
model-color-format=0
network-mode=2
num-detected-classes=2
interval=0
gie-unique-id=1

[class-attrs-all]
#pre-cluster-threshold=0.2
threshold=0.2
eps=0.2
group-threshold=1

I can now get detections when using the .etlt model file. The detection accuracy is the same as with the .trt file, though (which is worse than the inferences I get from using TLT). I also noticed that changing the “threshold” configuration above seems to have no effect. Is there another threshold configuration option I’m missing?

I am currently in the process of retraining the model using TLT 3.0.

Sorry for the late reply. For reference, I suggest running the detectnet_v2 samples provided at /opt/nvidia/deepstream/deepstream-5.0/samples/configs/tlt_pretrained_models.

Refer to the README file. Run with $ deepstream-app -c <deepstream_app_config>.
For example, to run PeopleNet (which is based on the detectnet_v2 network):
$ deepstream-app -c deepstream_app_source1_peoplenet.txt
Inside deepstream_app_source1_peoplenet.txt, the nvinfer config it references is config_infer_primary_peoplenet.txt.

BTW, you can set pre-cluster-threshold to a very low value to check if there are more bboxes.
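
For example, a debug setting along these lines (0.05 is just an arbitrary low value for this check, not a recommendation):

[class-attrs-all]
pre-cluster-threshold=0.05
eps=0.2
group-threshold=1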

After training on the same dataset (with the same detectnet_v2/resnet18 network and training configuration) with TLT 3.0, I have found that training accuracy (AP) plateaus at 33%. This is much lower than what I was getting with TLT 1.0 (~90% AP). However, when I deployed this new model on the Jetson Nano and tested it with the same test video I’ve been using previously, the detection performance looks unchanged.

Does this mean the reported AP value from TLT1.0 was not correct?

Are there different methods used to calculate AP between the two versions that could result in this difference in reported AP?

Accuracy during TLT 3.0 training reaches a plateau after only 10-20 epochs. I have a dataset that contains at least 50,000 examples of the one object class I’m training for. Do you have any suggestions for things I could try in order to improve the training accuracy?

Thanks for your help!


For your reference, I have used the following config for training a detectnet model with a dataset of 50,000+ examples of the object class:

random_seed: 42
dataset_config {
  data_sources {
    tfrecords_path: "/workspace/tlt-experiments/detectnet_v2/tfrecords/trainval/*"
    image_directory_path: "/workspace/tlt-experiments/data/training"
  }
  image_extension: "jpg"
  target_class_mapping {
    key: "head"
    value: "head"
  }
  
  validation_fold: 0
}
augmentation_config {
  preprocessing {
    output_image_width: 640
    output_image_height: 368
    min_bbox_width: 1.0
    min_bbox_height: 1.0
    output_image_channel: 3
  }
  spatial_augmentation {
    hflip_probability: 0.5
    zoom_min: 1.0
    zoom_max: 1.0
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    hue_rotation_max: 25.0
    saturation_shift_max: 0.20000000298
    contrast_scale_max: 0.10000000149
    contrast_center: 0.5
  }
}
postprocessing_config {
  target_class_config {
    key: "head"
    value {
      clustering_config {
        clustering_algorithm: DBSCAN
        dbscan_confidence_threshold: 0.9
        coverage_threshold: 0.00499999988824
        dbscan_eps: 0.20000000298
        dbscan_min_samples: 0.0500000007451
        minimum_bounding_box_height: 20
      }
    }
  }
  
}
model_config {
  pretrained_model_file: "/workspace/tlt-experiments/detectnet_v2/pretrained_resnet18/tlt_pretrained_detectnet_v2_vresnet18/resnet18.hdf5"
  num_layers: 18
  use_batch_norm: true
  objective_set {
    bbox {
      scale: 35.0
      offset: 0.5
    }
    cov {
    }
  }
  training_precision {
    backend_floatx: FLOAT32
  }
  arch: "resnet"
}
evaluation_config {
  validation_period_during_training: 10
  first_validation_epoch: 1
  minimum_detection_ground_truth_overlap {
    key: "head"
    value: 0.5
  }
  
  evaluation_box_config {
    key: "head"
    value {
      minimum_height: 20
      maximum_height: 9999
      minimum_width: 10
      maximum_width: 9999
    }
  }
  
  average_precision_mode: INTEGRATE
}
cost_function_config {
  target_classes {
    name: "head"
    class_weight: 1.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  
  enable_autoweighting: true
  max_objective_weight: 0.999899983406
  min_objective_weight: 9.99999974738e-05
}
training_config {
  batch_size_per_gpu: 4
  num_epochs: 120
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 5e-06
      max_learning_rate: 5e-04
      soft_start: 0.10000000149
      annealing: 0.699999988079
    }
  }
  regularizer {
    type: L1
    weight: 3.00000002618e-09
  }
  optimizer {
    adam {
      epsilon: 9.99999993923e-09
      beta1: 0.899999976158
      beta2: 0.999000012875
    }
  }
  cost_scaling {
    initial_exponent: 20.0
    increment: 0.005
    decrement: 1.0
  }
  checkpoint_interval: 10
}
bbox_rasterizer_config {
  target_class_config {
    key: "head"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 0.40000000596
      cov_radius_y: 0.40000000596
      bbox_min_radius: 1.0
    }
  }
  
  deadzone_radius: 0.400000154972
}

Which of the above settings could I look at changing/tweaking to try and improve the accuracy?

The above description does not make sense. Do you have the logs for both trainings?

Also, could you run tlt-infer (in TLT 1.0.1) and tlt detectnet_v2 inference (in TLT 3.0) against some test images?
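
For reference, the TLT 3.0 inference command has roughly this form (the spec file, directories, and key are placeholders; please check the documentation of your exact version for the flags, and use the equivalent tlt-infer detectnet_v2 sub-command on TLT 1.0.1):

tlt detectnet_v2 inference -e <inference_spec.txt> -i <test_image_dir> -o <output_dir> -k <encryption_key>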

Hi @Morganh, I work with @hj_es. We’ll run the tests as you suggested and get back to you with the results.

However, our key question is: why is the AP so low (33%) after training a custom object detection model with TLT 3.0 on 50,000+ samples?

Do we need to change anything in the training config file above? Or is it something else that we are overlooking?

First, please check whether your labels are correct.
Second, please resize your images/labels to 640x368 offline. See https://docs.nvidia.com/metropolis/TLT/tlt-user-guide/text/open_model_architectures.html#detectnet-v2

The train tool does not support training on images of multiple resolutions, or resizing images during training. All of the images must be resized offline to the final training size and the corresponding bounding boxes must be scaled accordingly.
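
As a minimal sketch (not an official TLT tool), offline resizing of the images plus scaling of KITTI-format labels could look something like this; the directory names are placeholders:

import os
from PIL import Image

TARGET_W, TARGET_H = 640, 368
IMG_DIR, LBL_DIR = "images", "labels"                # hypothetical input directories
OUT_IMG, OUT_LBL = "images_640x368", "labels_640x368"
os.makedirs(OUT_IMG, exist_ok=True)
os.makedirs(OUT_LBL, exist_ok=True)

for name in os.listdir(IMG_DIR):
    stem = os.path.splitext(name)[0]
    img = Image.open(os.path.join(IMG_DIR, name))
    sx, sy = TARGET_W / img.width, TARGET_H / img.height
    img.resize((TARGET_W, TARGET_H)).save(os.path.join(OUT_IMG, stem + ".jpg"))

    # KITTI labels: xmin, ymin, xmax, ymax are in columns 4-7 (zero-based)
    with open(os.path.join(LBL_DIR, stem + ".txt")) as f_in, \
         open(os.path.join(OUT_LBL, stem + ".txt"), "w") as f_out:
        for line in f_in:
            parts = line.split()
            for i, s in zip((4, 5, 6, 7), (sx, sy, sx, sy)):
                parts[i] = "%.2f" % (float(parts[i]) * s)
            f_out.write(" ".join(parts) + "\n")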

Then, please check the average width/height of the class “head”. Is it too small? If so, please see the Frequently Asked Questions — Transfer Learning Toolkit 3.0 documentation:

In DetectNet_V2, are there any parameters that can help improve AP (average precision) on training small objects?

The following parameters can help you improve AP on smaller objects (see the illustrative spec snippet after this list):

  • Increase num_layers of resnet
  • class_weight for small objects
  • Increase the coverage_radius_x and coverage_radius_y parameters of the bbox_rasterizer_config section for the small objects class
  • Decrease minimum_detection_ground_truth_overlap
  • Lower minimum_height to cover more small objects for evaluation.
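
Mapped onto the spec posted above, those suggestions correspond roughly to the following fields (the values below are only illustrative, not tuned recommendations):

model_config {
  num_layers: 34              # deeper ResNet backbone; needs matching pretrained weights
}
bbox_rasterizer_config {
  target_class_config {
    key: "head"
    value {
      cov_radius_x: 1.0       # larger coverage radius for the small-object class
      cov_radius_y: 1.0
    }
  }
}
evaluation_config {
  minimum_detection_ground_truth_overlap {
    key: "head"
    value: 0.4                # lower IoU requirement for a match
  }
  evaluation_box_config {
    key: "head"
    value {
      minimum_height: 10      # count smaller boxes during evaluation
    }
  }
}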