TLT trained model accuracy worse after deployment

  • Hardware Platform:
    Jetson Nano
  • Deepstream Version:
    5.0.1
  • JetPack Version:
    4.4.1
  • TensorRT Version:
    7.1.3
  • TLT Version:
    1.0.1

I am using TLT to train a detection model (DetectNet_v2 with a ResNet18 backbone) on a custom object class, which is then deployed within a DeepStream app based on the Python examples.

I noticed that when I train the TLT model and deploy it on the nano in the Deepstream app, I see a considerable drop in detection accuracy. Below is what I see and the options I have explored:

  1. TLT model converted to TRT engine file, and then run on the nano in the DS app:
  • The TLT model shows a mAP of 90% after training, but when deployed as a TRT engine file on the nano the performance drops. Samples that were correctly inferred by the TLT model are no longer inferred correctly when I use the converted TRT engine file.
  • What could be causing this? Do I have to change any specific configurations in the Deepstream app / pgie files to ensure the accuracy is maintained from TLT to TRT?
  • I’m currently using a pre-cluster-threshold of 0.2 in my pgie configuration file.
  2. TLT model run directly as an ETLT model on the nano:
  • The ETLT model, run directly on the nano with the same threshold configuration as above, did not make any detections at all. Is there a specific way to implement this?

If neither of these is the right approach, is there something else I should try to ensure accuracy is maintained when running a TLT model in a DS app?


The above does not make sense. Can you attach your DeepStream config file?

Also, how did you run inference in DeepStream? Can you share the full command and the full log?

Sorry, what I meant to say is that when I take the model trained with TLT and deploy it on the Jetson Nano (without conversion to a .trt file), almost no objects are detected in the test video I am using. If I first convert the model to a .trt file, there are detections, but significantly fewer than when the same test video samples are inferred with the model using TLT itself.
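
For context, a minimal sketch of the tlt-converter step for a DetectNet_v2 .etlt model (the key, dimensions, and file names are placeholders, and the output nodes assume the standard DetectNet_v2 heads rather than the exact values used here):

./tlt-converter -k <encryption_key> \
    -d 3,368,640 \
    -o output_cov/Sigmoid,output_bbox/BiasAdd \
    -t fp16 \
    -m 1 \
    -e model_fp16.trt \
    detectnet_v2.etlt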

This is my pgie config setup:

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
labelfile-path=…/…/…/…/samples/models/Primary_Detector/labels.txt
model-engine-file=…/…/…/…/samples/models/Primary_Detector/model_fp16.trt
force-implicit-batch-dim=1
batch-size=1
process-mode=1
model-color-format=0
network-mode=2
num-detected-classes=2
interval=0
gie-unique-id=1
output-blob-names=conv2d_bbox;conv2d_cov/Sigmoid

[class-attrs-all]
pre-cluster-threshold=0.2
eps=0.2
group-threshold=2

(When using the .etlt file, I just changed this to model-engine-file=…/…/…/…/samples/models/Primary_Detector/model.etlt.)

I run inference in DeepStream via a custom Python script based on the “deepstream_python_apps/apps/deepstream_test1/” example code.

Several comments:

You mentioned you are using TLT 1.0.1 with DeepStream 5.0.1. I strongly suggest using TLT 3.0_dp instead.

This is not correct. Please see the DetectNet_v2 — Transfer Learning Toolkit 3.0 documentation. To deploy the .etlt file, in the config file please set

tlt-encoded-model=<Path to DetectNet_v2 TLT model>
tlt-model-key=<Key to decrypt the model>

Hi @Morganh thanks for those comments. I have followed the documentation for using the .etlt file, currently using the following config:

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
labelfile-path=…/…/…/…/samples/models/Primary_Detector/labels.txt

tlt-encoded-model=…/…/…/…/samples/models/Primary_Detector/detectnet_v5.etlt
tlt-model-key=API_KEY
infer-dims=3;368;640
uff-input-blob-name=input_1
output-blob-names=output_cov/Sigmoid;output_bbox/BiasAdd

force-implicit-batch-dim=1
batch-size=1
process-mode=1
model-color-format=0
network-mode=2
num-detected-classes=2
interval=0
gie-unique-id=1

[class-attrs-all]
#pre-cluster-threshold=0.2
threshold=0.2
eps=0.2
group-threshold=1

I can now get detections when using the .etlt model file. The detection accuracy is the same as with the .trt file, though (which is worse than the inferences I get from using TLT). I also noticed that changing the “threshold” configuration above seems to have no effect. Is there another threshold configuration option I’m missing?

I am currently in the process of retraining the model using TLT 3.0.

Sorry for the late reply. For reference, I suggest running the detectnet_v2 samples provided at /opt/nvidia/deepstream/deepstream-5.0/samples/configs/tlt_pretrained_models.

Refer to the README file. Run with $ deepstream-app -c <deepstream_app_config>.
For example, to run PeopleNet (which is based on the detectnet_v2 network):
$ deepstream-app -c deepstream_app_source1_peoplenet.txt
Inside deepstream_app_source1_peoplenet.txt, the nvinfer config it references is config_infer_primary_peoplenet.txt.

BTW, you can set pre-cluster-threshold to a very low value to check if there are more bboxes.
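
For example, a debug setting along these lines (0.05 is just an arbitrary low value for this check, not a recommendation):

[class-attrs-all]
pre-cluster-threshold=0.05
eps=0.2
group-threshold=1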

After training on the same dataset (with the same detectnet_v2/resnet18 network and training configuration) with TLT 3.0, I have found that training accuracy (AP) plateaus at 33%. This is much lower than what I was getting with TLT 1.0 (~90% AP). However, when I deployed this new model on the Jetson Nano and tested it with the same test video I’ve been using previously, the detection performance looks unchanged.

Does this mean the reported AP value from TLT1.0 was not correct?

Are there different methods used to calculate AP between the two versions that could result in this difference in reported AP?

Accuracy during TLT 3.0 training reaches a plateau after only 10-20 epochs. I have a dataset that contains at least 50,000 examples of the one object class I’m training for. Do you have any suggestions for things I could try in order to improve the training accuracy?

Thanks for your help!


For your reference, I have used the following config for training a detectnet model with a dataset of 50,000+ examples of the object class:

random_seed: 42
dataset_config {
  data_sources {
    tfrecords_path: "/workspace/tlt-experiments/detectnet_v2/tfrecords/trainval/*"
    image_directory_path: "/workspace/tlt-experiments/data/training"
  }
  image_extension: "jpg"
  target_class_mapping {
    key: "head"
    value: "head"
  }
  
  validation_fold: 0
}
augmentation_config {
  preprocessing {
    output_image_width: 640
    output_image_height: 368
    min_bbox_width: 1.0
    min_bbox_height: 1.0
    output_image_channel: 3
  }
  spatial_augmentation {
    hflip_probability: 0.5
    zoom_min: 1.0
    zoom_max: 1.0
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    hue_rotation_max: 25.0
    saturation_shift_max: 0.20000000298
    contrast_scale_max: 0.10000000149
    contrast_center: 0.5
  }
}
postprocessing_config {
  target_class_config {
    key: "head"
    value {
      clustering_config {
        clustering_algorithm: DBSCAN
        dbscan_confidence_threshold: 0.9
        coverage_threshold: 0.00499999988824
        dbscan_eps: 0.20000000298
        dbscan_min_samples: 0.0500000007451
        minimum_bounding_box_height: 20
      }
    }
  }
  
}
model_config {
  pretrained_model_file: "/workspace/tlt-experiments/detectnet_v2/pretrained_resnet18/tlt_pretrained_detectnet_v2_vresnet18/resnet18.hdf5"
  num_layers: 18
  use_batch_norm: true
  objective_set {
    bbox {
      scale: 35.0
      offset: 0.5
    }
    cov {
    }
  }
  training_precision {
    backend_floatx: FLOAT32
  }
  arch: "resnet"
}
evaluation_config {
  validation_period_during_training: 10
  first_validation_epoch: 1
  minimum_detection_ground_truth_overlap {
    key: "head"
    value: 0.5
  }
  
  evaluation_box_config {
    key: "head"
    value {
      minimum_height: 20
      maximum_height: 9999
      minimum_width: 10
      maximum_width: 9999
    }
  }
  
  average_precision_mode: INTEGRATE
}
cost_function_config {
  target_classes {
    name: "head"
    class_weight: 1.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  
  enable_autoweighting: true
  max_objective_weight: 0.999899983406
  min_objective_weight: 9.99999974738e-05
}
training_config {
  batch_size_per_gpu: 4
  num_epochs: 120
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 5e-06
      max_learning_rate: 5e-04
      soft_start: 0.10000000149
      annealing: 0.699999988079
    }
  }
  regularizer {
    type: L1
    weight: 3.00000002618e-09
  }
  optimizer {
    adam {
      epsilon: 9.99999993923e-09
      beta1: 0.899999976158
      beta2: 0.999000012875
    }
  }
  cost_scaling {
    initial_exponent: 20.0
    increment: 0.005
    decrement: 1.0
  }
  checkpoint_interval: 10
}
bbox_rasterizer_config {
  target_class_config {
    key: "head"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 0.40000000596
      cov_radius_y: 0.40000000596
      bbox_min_radius: 1.0
    }
  }
  
  deadzone_radius: 0.400000154972
}

Which of the above settings could I look at changing/tweaking to try and improve the accuracy?

The above description does not make sense. Do you have the logs for both trainings?

Also, could you run tlt-infer (in TLT 1.0.1) and tlt detectnet_v2 inference (in TLT 3.0) against some test images?
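
For reference, the TLT 3.0 inference command has roughly this form (the spec file, directories, and key are placeholders; please check the documentation of your exact version for the flags, and use the equivalent tlt-infer detectnet_v2 sub-command on TLT 1.0.1):

tlt detectnet_v2 inference -e <inference_spec.txt> -i <test_image_dir> -o <output_dir> -k <encryption_key>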

Hi @Morganh, I work with @hj_es. We’ll run the tests as you suggested and get back to you with the results.

However, our key question is: why is the AP so low (33%) after training a custom object detection model with TLT 3.0 on 50,000+ samples?

Do we need to change anything in the training config file above? Or is it something else that we are overlooking?

First, please check whether your labels are correct.
Second, please resize your images/labels to 640x368 offline. See https://docs.nvidia.com/metropolis/TLT/tlt-user-guide/text/open_model_architectures.html#detectnet-v2

The train tool does not support training on images of multiple resolutions, or resizing images during training. All of the images must be resized offline to the final training size and the corresponding bounding boxes must be scaled accordingly.
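
As a minimal sketch (not an official TLT tool), offline resizing of the images plus scaling of KITTI-format labels could look something like this; the directory names are placeholders:

import os
from PIL import Image

TARGET_W, TARGET_H = 640, 368
IMG_DIR, LBL_DIR = "images", "labels"                # hypothetical input directories
OUT_IMG, OUT_LBL = "images_640x368", "labels_640x368"
os.makedirs(OUT_IMG, exist_ok=True)
os.makedirs(OUT_LBL, exist_ok=True)

for name in os.listdir(IMG_DIR):
    stem = os.path.splitext(name)[0]
    img = Image.open(os.path.join(IMG_DIR, name))
    sx, sy = TARGET_W / img.width, TARGET_H / img.height
    img.resize((TARGET_W, TARGET_H)).save(os.path.join(OUT_IMG, stem + ".jpg"))

    # KITTI labels: xmin, ymin, xmax, ymax are in columns 4-7 (zero-based)
    with open(os.path.join(LBL_DIR, stem + ".txt")) as f_in, \
         open(os.path.join(OUT_LBL, stem + ".txt"), "w") as f_out:
        for line in f_in:
            parts = line.split()
            for i, s in zip((4, 5, 6, 7), (sx, sy, sx, sy)):
                parts[i] = "%.2f" % (float(parts[i]) * s)
            f_out.write(" ".join(parts) + "\n")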

Then, please check the average width/height of the class “head”. Is it too small? If so, please see the Frequently Asked Questions — Transfer Learning Toolkit 3.0 documentation:

In DetectNet_V2, are there any parameters that can help improve AP (average precision) on training small objects?

The following parameters can help you improve AP on smaller objects (see the illustrative spec snippet after this list):

  • Increase num_layers of resnet
  • class_weight for small objects
  • Increase the coverage_radius_x and coverage_radius_y parameters of the bbox_rasterizer_config section for the small objects class
  • Decrease minimum_detection_ground_truth_overlap
  • Lower minimum_height to cover more small objects for evaluation.
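
Mapped onto the spec posted above, those suggestions correspond roughly to the following fields (the values below are only illustrative, not tuned recommendations):

model_config {
  num_layers: 34              # deeper ResNet backbone; needs matching pretrained weights
}
bbox_rasterizer_config {
  target_class_config {
    key: "head"
    value {
      cov_radius_x: 1.0       # larger coverage radius for the small-object class
      cov_radius_y: 1.0
    }
  }
}
evaluation_config {
  minimum_detection_ground_truth_overlap {
    key: "head"
    value: 0.4                # lower IoU requirement for a match
  }
  evaluation_box_config {
    key: "head"
    value {
      minimum_height: 10      # count smaller boxes during evaluation
    }
  }
}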