TLT 3.0 YOLOv4 (ResNet-10) model: "tlt yolo_v4 inference" gives correct bboxes, but DeepStream 5.1 gives wrong results

deepstream5.1
tlt3.0

train.txt
random_seed: 42
yolov4_config {
big_anchor_shape: "[(114.94, 60.67), (159.06, 114.59), (297.59, 176.38)]"
mid_anchor_shape: "[(42.99, 31.91), (79.57, 31.75), (56.80, 56.93)]"
small_anchor_shape: "[(15.60, 13.88), (30.25, 20.25), (20.67, 49.63)]"
box_matching_iou: 0.25
arch: "resnet"
nlayers: 10
arch_conv_blocks: 2
loss_loc_weight: 0.8
loss_neg_obj_weights: 100.0
loss_class_weights: 0.5
label_smoothing: 0.0
big_grid_xy_extend: 0.05
mid_grid_xy_extend: 0.1
small_grid_xy_extend: 0.2
freeze_bn: false
#freeze_blocks: 0
force_relu: false
}
training_config {
batch_size_per_gpu: 12
num_epochs: 80
enable_qat: false
checkpoint_interval: 1
learning_rate {
soft_start_cosine_annealing_schedule {
min_learning_rate: 1e-6
max_learning_rate: 1e-4
soft_start: 0.3
}
}
regularizer {
type: L1
weight: 3e-5
}
optimizer {
adam {
epsilon: 1e-7
beta1: 0.9
beta2: 0.999
amsgrad: false
}
}
#pretrain_model_path: "/workspace/tlt-experiments/yolo_v4/pretrained_resnet10/tlt_pretrained_object_detection_vresnet10/resnet_10.hdf5"
pretrain_model_path: "/workspace/tlt-experiments/yolo_v4/experiment_dir_unpruned/weights/yolov4_resnet10_epoch_014.tlt"
}
eval_config {
average_precision_mode: SAMPLE
batch_size: 8
matching_iou_threshold: 0.5
}
nms_config {
confidence_threshold: 0.001
clustering_iou_threshold: 0.5
top_k: 200
}
augmentation_config {
hue: 0.1
saturation: 1.5
exposure:1.5
vertical_flip:0
horizontal_flip: 0.5
jitter: 0.3
output_width: 640
output_height: 384
randomize_input_shape_period: 0
mosaic_prob: 0.5
mosaic_min_ratio:0.2
}
dataset_config {
data_sources: {
label_directory_path: "/workspace/tlt-experiments/data/training/label_2"
image_directory_path: "/workspace/tlt-experiments/data/training/image_2"
}
include_difficult_in_training: true
target_class_mapping {
key: "person"
value: "person"
}

validation_data_sources: {
label_directory_path: "/workspace/tlt-experiments/data/val/label_2"
image_directory_path: "/workspace/tlt-experiments/data/val/image_2"
}
}

[nvinfer config file]
[property]
gpu-id=0
net-scale-factor=1.0
offsets=103.939;116.779;123.68
model-color-format=1
labelfile-path=/home/satchel/deepstream_models/person1/labels.txt
model-engine-file=/home/satchel/deepstream_models/person1/yolov4_resnet10_epoch_080.etlt_b1_gpu0_fp32.engine
#int8-calib-file=…/…/models/yolov4/cal.bin
tlt-encoded-model=/home/satchel/deepstream_models/person1/yolov4_resnet10_epoch_080.etlt
tlt-model-key=***
infer-dims=3;384;640
maintain-aspect-ratio=1
uff-input-order=0
uff-input-blob-name=Input
batch-size=1
network-mode=0
num-detected-classes=1
interval=0
gie-unique-id=1
is-classifier=0
#network-type=0
cluster-mode=3
output-blob-names=BatchedNMS
parse-bbox-func-name=NvDsInferParseCustomBatchedNMSTLT
custom-lib-path=/home/satchel/deepstream_models/person1/libnvds_infercustomparser_tlt.so

[class-attrs-all]
pre-cluster-threshold=0.3
roi-top-offset=0
roi-bottom-offset=0
detected-min-w=0
detected-min-h=0
detected-max-w=0
detected-max-h=0
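
One quick sanity check when TLT inference and DeepStream disagree is whether the nvinfer input size matches the training size. The sketch below compares infer-dims (channel;height;width) from the nvinfer config with output_width and output_height from the training spec. The function names are hypothetical helpers written for this illustration, not part of TLT or DeepStream.

```python
def parse_infer_dims(value):
    """Parse an nvinfer 'infer-dims' value such as '3;384;640' into (C, H, W)."""
    channels, height, width = (int(part) for part in value.split(";"))
    return channels, height, width


def dims_match_training(infer_dims, output_width, output_height):
    """Return True when the nvinfer input H x W equals the training output size."""
    _, height, width = parse_infer_dims(infer_dims)
    return (width, height) == (output_width, output_height)


# Values copied from the two configs posted above.
print(dims_match_training("3;384;640", output_width=640, output_height=384))
```

For the configs in this post the two sizes agree, so the input resolution does not appear to be the source of the wrong boxes.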

The reference deployment project:
https://github.com/NVIDIA-AI-IOT/deepstream_tlt_apps

export ENABLE_DEBUG=1
Log output:
label/conf/ x/y x/y – 0 6.14712e+32 0.00782681 0.00782668 0.00782092 0.0078125
label/conf/ x/y x/y – 0 5.79216e+32 0.0078125 0.0078125 0.00782681 0.00782681
label/conf/ x/y x/y – 0 4.93015e+32 0.00781612 0.0078125 0.0078125 0.00782681
label/conf/ x/y x/y – 0 4.5752e+32 0.00782681 0.0078125 0.00782681 0.00782681
label/conf/ x/y x/y – 0 3.99207e+32 0.0078125 0.0078125 0.00782037 0.00782681
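
The confidence values in these lines are on the order of 1e+32, which cannot be valid detection scores. A minimal sketch for flagging such lines automatically is shown below. It assumes the last six fields are label, confidence, and two x/y corner pairs (as the "label/conf/ x/y x/y" prefix suggests), and that a valid confidence lies in [0, 1]. The helper names are hypothetical.

```python
def parse_detection(line):
    """Split one debug line into (label, confidence, box).

    Assumes the last six whitespace-separated fields are:
    label, confidence, x1, y1, x2, y2.
    """
    fields = line.split()[-6:]
    label = int(fields[0])
    conf, x1, y1, x2, y2 = (float(field) for field in fields[1:])
    return label, conf, (x1, y1, x2, y2)


def is_plausible_confidence(conf):
    """A detection score is assumed to be a probability in [0, 1]."""
    return 0.0 <= conf <= 1.0


line = "label/conf/ x/y x/y – 0 6.14712e+32 0.00782681 0.00782668 0.00782092 0.0078125"
label, conf, box = parse_detection(line)
if not is_plausible_confidence(conf):
    print(f"label {label}: implausible confidence {conf:g}, box {box}")
```

Values like these suggest the parser is reading the output buffers with a layout different from the one the engine produced, rather than the model itself predicting badly.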

Could someone help me? Thanks very much!

Some questions.

  1. Where did you run inference? I assume you ran it on your host PC, right?
  2. How did you generate /home/satchel/deepstream_models/person1/yolov4_resnet10_epoch_080.etlt_b1_gpu0_fp32.engine ?
  3. Can you comment out the line below and retry?
    model-engine-file=/home/satchel/deepstream_models/person1/yolov4_resnet10_epoch_080.etlt_b1_gpu0_fp32.engine
  4. To narrow this down, can you successfully run the reference yolo_v4 model from the NVIDIA-AI-IOT/deepstream_tao_apps repository on GitHub?

  1. Right, I ran it on the host PC (Ubuntu 18.04) with DeepStream 5.1.
  2. /home/satchel/deepstream_models/person1/yolov4_resnet10_epoch_080.etlt_b1_gpu0_fp32.engine was generated by running the gst pipeline with the line "model-engine-file=/home/satchel/deepstream_models/person1/yolov4_resnet10_epoch_080.etlt_b1_gpu0_fp32.engine" commented out (prefixed with #).
  3. After step 2, I removed the # (uncommented the line) and ran the gst pipeline again.
  4. Not yet. I will try it soon.
    Thanks

I tried deepstream_tlt_apps, but new errors occurred.

(base) satchel@satchel-ubuntu:~/deeplearning/Deploy/deepstream_tlt_apps/apps$ ./ds-tlt -c /home/satchel/deepstream_models/person/pgie_yolov4_tlt_config.txt -i /home/satchel/softwares/deepstream_sdk_v5.1.0_x86_64/opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.h264
Unknown or legacy key specified ‘is-classifier’ for group [property]
Now playing: /home/satchel/deepstream_models/person/pgie_yolov4_tlt_config.txt
WARNING: …/nvdsinfer/nvdsinfer_func_utils.cpp:36 [TRT]: TensorRT was linked against cuDNN 8.1.0 but loaded cuDNN 8.0.5
WARNING: …/nvdsinfer/nvdsinfer_func_utils.cpp:36 [TRT]: TensorRT was linked against cuBLAS/cuBLAS LT 11.3.0 but loaded cuBLAS/cuBLAS LT 11.2.1
WARNING: …/nvdsinfer/nvdsinfer_func_utils.cpp:36 [TRT]: TensorRT was linked against cuDNN 8.1.0 but loaded cuDNN 8.0.5
WARNING: …/nvdsinfer/nvdsinfer_func_utils.cpp:36 [TRT]: TensorRT was linked against cuBLAS/cuBLAS LT 11.3.0 but loaded cuBLAS/cuBLAS LT 11.2.1
0:00:00.990112100 13808 0x56227d5ee070 INFO nvinfer gstnvinfer.cpp:619:gst_nvinfer_logger: NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1702> [UID = 1]: deserialized trt engine from :/home/satchel/deepstream_models/person/yolov4_resnet10_epoch_080.etlt_b1_gpu0_fp32.engine
INFO: …/nvdsinfer/nvdsinfer_model_builder.cpp:685 [Implicit Engine Info]: layers num: 5
0 INPUT kFLOAT Input 3x384x640
1 OUTPUT kINT32 BatchedNMS 0
2 OUTPUT kFLOAT BatchedNMS_1 200x4
3 OUTPUT kFLOAT BatchedNMS_2 200
4 OUTPUT kFLOAT BatchedNMS_3 200

0:00:00.990235266 13808 0x56227d5ee070 INFO nvinfer gstnvinfer.cpp:619:gst_nvinfer_logger: NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:1806> [UID = 1]: Use deserialized engine model: /home/satchel/deepstream_models/person/yolov4_resnet10_epoch_080.etlt_b1_gpu0_fp32.engine
0:00:00.996335490 13808 0x56227d5ee070 INFO nvinfer gstnvinfer_impl.cpp:313:notifyLoadModelStatus: [UID 1]: Load new model:/home/satchel/deepstream_models/person/pgie_yolov4_tlt_config.txt sucessfully
Running…
ERROR: nvdsinfer_context_impl.cpp:1573 Failed to synchronize on cuda copy-coplete-event, cuda err_no:700, err_str:cudaErrorIllegalAddress
0:00:01.248884517 13808 0x56227d5e6f20 WARN nvinfer gstnvinfer.cpp:2021:gst_nvinfer_output_loop: error: Failed to dequeue output from inferencing. NvDsInferContext error: NVDSINFER_CUDA_ERROR
ERROR: …/nvdsinfer/nvdsinfer_func_utils.cpp:33 [TRT]: safeContext.cpp (184) - Cudnn Error in configure: 7 (CUDNN_STATUS_MAPPING_ERROR)
0:00:01.248930787 13808 0x56227d5e6f20 WARN nvinfer gstnvinfer.cpp:616:gst_nvinfer_logger: NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::releaseBatchOutput() <nvdsinfer_context_impl.cpp:1599> [UID = 1]: Tried to release an unknown outputBatchID
ERROR from element primary-nvinference-engine: Failed to dequeue output from inferencing. NvDsInferContext error: NVDSINFER_CUDA_ERROR
Error details: gstnvinfer.cpp(2021): gst_nvinfer_output_loop (): /GstPipeline:ds-custom-pipeline/GstNvInfer:primary-nvinference-engine
Returned, stopping playback
ERROR: …/nvdsinfer/nvdsinfer_func_utils.cpp:33 [TRT]: FAILED_EXECUTION: std::exception
ERROR: nvdsinfer_backend.cpp:287 Failed to enqueue inference batch
Cuda failure: status=700 in CreateTextureObj at line 2902
ERROR: nvdsinfer_context_impl.cpp:1533 Infer context enqueue buffer failed, nvinfer error:NVDSINFER_TENSORRT_ERROR
nvbufsurftransform.cpp:2703: => Transformation Failed -2

0:00:01.249032995 13808 0x56227d5e6d40 WARN nvinfer gstnvinfer.cpp:1225:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
Segmentation fault (core dumped)

tlt-model-key=Y29iMHNhOTkwcmo4c3ViNmNmcXZob3BxM2I6N2EzM2I0NjYtOTFlYy00NzUzLWE3NWYtZGViMjdhMDgxNGZi
yolov4_resnet10_epoch_080.etlt (11.1 MB)
pgie_yolov4_tlt_config.txt (2.3 KB)
The tlt-model-key and tlt model have been uploaded.
Please help. Thanks

Can you rebuild the engine with the modification below in your config?

#model-engine-file=/home/satchel/deepstream_models/person/yolov4_resnet10_epoch_080.etlt_b1_gpu0_fp32.engine
int8-calib-file=…/…/models/yolov4/cal.bin
tlt-encoded-model=/home/satchel/deepstream_models/person/yolov4_resnet10_epoch_080.etlt
tlt-model-key=Y29iMHNhOTkwcmo4c3ViNmNmcXZob3BxM2I6N2EzM2I0NjYtOTFlYy00NzUzLWE3NWYtZGViMjdhMDgxNGZi

I have tried using the nvinfer plugin to generate the engine file, but it still did not work.

May I know the CUDA/cuDNN/TensorRT versions on your host PC?
According to the log above, the error seems to be related to them.
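
The TensorRT warnings in the earlier log already name the libraries whose loaded version differs from the one TensorRT was linked against. A small sketch for extracting them from a saved log is shown below. The function name is hypothetical, and the sample lines are abbreviated copies of the warnings posted above. Whether these mismatches cause the crash is not established by this check alone.

```python
import re

# Matches "TensorRT was linked against <lib> <version> but loaded <lib> <version>".
MISMATCH = re.compile(
    r"TensorRT was linked against (?P<lib>.+?) (?P<linked>\d+(?:\.\d+)+)"
    r" but loaded (?P=lib) (?P<loaded>\d+(?:\.\d+)+)"
)


def find_version_mismatches(log_text):
    """Return unique (library, linked, loaded) tuples in order of first appearance."""
    seen = {}
    for match in MISMATCH.finditer(log_text):
        key = (match["lib"], match["linked"], match["loaded"])
        seen.setdefault(key, None)  # dict keeps insertion order and drops repeats
    return list(seen)


# Abbreviated warning lines from the log in this thread.
sample = """\
WARNING: [TRT]: TensorRT was linked against cuDNN 8.1.0 but loaded cuDNN 8.0.5
WARNING: [TRT]: TensorRT was linked against cuBLAS/cuBLAS LT 11.3.0 but loaded cuBLAS/cuBLAS LT 11.2.1
WARNING: [TRT]: TensorRT was linked against cuDNN 8.1.0 but loaded cuDNN 8.0.5
"""
for lib, linked, loaded in find_version_mismatches(sample):
    print(f"{lib}: linked against {linked}, loaded {loaded}")
```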

I don't have the PC with me right now. I will check tonight. Thanks

I recompiled TensorRT OSS, and it finally worked. Thanks
