Which detector is used in Primary_Detector_Nano?

I ran the sample deepstream-app on a TX2 using Primary_Detector_Nano, and it is quite fast: >30 fps with 12 streams.
I retrained a DetectNet_v2 model with a ResNet-10 backbone and ran it with deepstream-app, but performance is only <10 fps with 9 streams.
I think the difference is in the detector, because both models use the same ResNet-10 backbone.
So which detector is used in the sample?

Hi,
It is samples/models/Primary_Detector/resnet10.caffemodel, and it is a pruned model.
You could use /usr/src/tensorrt/bin/trtexec to measure inference performance.
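For example, something along these lines (a sketch only: the exact flags vary between TensorRT versions, and the output blob names are taken from the DeepStream sample config, so please verify them against resnet10.prototxt):

# Benchmark the sample Caffe model in FP16 with trtexec
/usr/src/tensorrt/bin/trtexec \
    --deploy=samples/models/Primary_Detector/resnet10.prototxt \
    --model=samples/models/Primary_Detector/resnet10.caffemodel \
    --output=conv2d_bbox --output=conv2d_cov/Sigmoid \
    --batch=1 --fp16

The reported latency/throughput gives you a baseline to compare against your retrained model exported through TLT.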


Hi,
Thanks for the quick reply.
But I still have no idea which type of detector this is. ResNet-10 is just the backbone; I trained a DetectNet_v2 model as the detector, but it is not as fast as the sample model.

I also trained an SSD model with a MobileNet_v2 backbone, but I got loss=NaN and a crash from epoch 1.
I used this config for training:

random_seed: 42
ssd_config {
  aspect_ratios_global: "[1.0, 2.0, 0.5, 3.0, 1.0/3.0]"
  scales: "[0.05, 0.1, 0.25, 0.4, 0.55, 0.7, 0.85]"
  two_boxes_for_ar1: true
  clip_boxes: false
  loss_loc_weight: 0.8
  focal_loss_alpha: 0.25
  focal_loss_gamma: 2.0
  variances: "[0.1, 0.1, 0.2, 0.2]"
  arch: "mobilenet_v2"
  freeze_bn: false
}
training_config {
  batch_size_per_gpu: 1
  num_epochs: 10
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 5e-5
      max_learning_rate: 2e-2
      soft_start: 0.15
      annealing: 0.5
    }
  }
  regularizer {
    type: L1
    weight: 3e-06
  }
}
eval_config {
  validation_period_during_training: 10
  average_precision_mode: SAMPLE
  batch_size: 8
  matching_iou_threshold: 0.5
}
nms_config {
  confidence_threshold: 0.01
  clustering_iou_threshold: 0.6
  top_k: 200
}
augmentation_config {
  preprocessing {
    output_image_width: 480
    output_image_height: 256
    output_image_channel: 3
    crop_right: 480
    crop_bottom: 256
    min_bbox_width: 1.0
    min_bbox_height: 1.0
  }
  spatial_augmentation {
    hflip_probability: 0.5
    vflip_probability: 0.0
    zoom_min: 0.7
    zoom_max: 1.8
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    hue_rotation_max: 25.0
    saturation_shift_max: 0.20000000298
    contrast_scale_max: 0.10000000149
    contrast_center: 0.5
  }
}
dataset_config {
  data_sources: {
    tfrecords_path: "/workspace/src/NVIDIA-train/transfer_learning_face_plate/tlt-experiments/tfrecords/kitti_trainval/kitti_trainval*"
    image_directory_path: "/workspace/src/NVIDIA-train/transfer_learning_face_plate/data_train/training"
  }
  image_extension: "jpg"
  target_class_mapping {
    key: "face"
    value: "face"
  }
  target_class_mapping {
    key: "plate"
    value: "plate"
  }
  validation_fold: 0
}

Any help, please?

@subzeromot

There are many reasons why your network may run slowly.
Is the network pruned or sparse?
Is it running in INT8, FP16, or FP32 mode?
Hardware conditions also affect inference speed.
Is your hardware running in boosted mode or not?
Are there any other processes using GPU resources?
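For the last two points, on a Jetson you can lock the clocks to the maximum profile and watch GPU utilization, for example (assuming the standard JetPack tools; the exact script location differs slightly between JetPack releases):

# Switch the TX2 to its maximum performance profile and lock the clocks
sudo nvpmodel -m 0      # MAXN power mode
sudo jetson_clocks      # fix CPU/GPU/EMC clocks at their maximum

# Watch GR3D (GPU) utilization and check whether other processes compete for it
sudo tegrastats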

Regarding your SSD training question, there are also many reasons why training can fail.
Hyperparameters such as the learning rate, the optimization method (SGD? Adam?), and the batch size all affect the training result.
There may also be flaws in your dataset configuration, such as incorrectly labeled objects.
Preprocessing of the training inputs will also influence your training results.
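On the learning-rate point specifically, and only as a hedged sketch rather than a verified fix: with batch_size_per_gpu: 1, a max_learning_rate of 2e-2 is quite aggressive and is a common cause of NaN losses. One experiment would be a larger batch with a lower peak learning rate, for example:

training_config {
  batch_size_per_gpu: 8            # larger batch if GPU memory allows
  num_epochs: 80
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 5e-6      # example values only, tune for your dataset
      max_learning_rate: 2e-3      # roughly 10x lower peak than the original 2e-2
      soft_start: 0.1
      annealing: 0.6
    }
  }
  regularizer {
    type: L1
    weight: 3e-06
  }
}

The exact values are illustrative; the idea is to lower the peak learning rate and/or increase the batch size and check whether the loss stays finite.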