YOLOv7 in DeepStream gives fewer outputs than when running with the TensorRT Python API?

I followed the NVIDIA tutorial yolo_deepstream/deepstream_yolo at main · NVIDIA-AI-IOT/yolo_deepstream · GitHub for YOLOv7 and generated an FP32 engine. When I run inference with this engine on one image and save the output in KITTI format, only 16 objects are detected. But when I run inference on the same image with the TensorRT Python API, I get more than 30 detected objects.

I also checked other images and the same thing happens. This may be why the mAP from DeepStream is much lower than the mAP from the TensorRT Python API.
The conditions in both cases are the same.
@mchi Could you please check again? Thanks.

We have done some investigation on DeepStream pipeline mAP measurement. The DeepStream pipeline is quite different from the TensorRT pipeline, especially the image processing before TensorRT inference. The pre-processing in the TensorRT pipeline is "decoding → scaling → format conversion → normalization", while the processing in the DeepStream pipeline is much more complicated. If nvstreammux and nvvideoconvert are not configured properly, they can perform extra scaling and format conversion before the "scaling → format conversion → normalization" inside gst-nvinfer. There are also algorithmic differences between the hardware scaling and the OpenCV scaling for some special resolutions (e.g. some COCO images have an odd width or height, and there is a hardware limitation with such images). We will fix the difference in the future.
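For reference, a minimal Python sketch of that "decoding → scaling → format conversion → normalization" chain as it is typically done on the TensorRT side for YOLOv7 (the 640x640 input size, the 114 padding value, and the OpenCV letterbox-style scaling are assumptions about the usual setup, not something taken from a specific sample):

import cv2
import numpy as np

def preprocess(path, net_w=640, net_h=640):
    img = cv2.imread(path)                        # decoding (BGR, HWC, uint8)
    h, w = img.shape[:2]
    r = min(net_w / w, net_h / h)                 # keep aspect ratio
    new_w, new_h = int(round(w * r)), int(round(h * r))
    resized = cv2.resize(img, (new_w, new_h), interpolation=cv2.INTER_LINEAR)  # bilinear, like scaling-filter=1

    # symmetric padding, matching maintain-aspect-ratio=1 / symmetric-padding=1
    canvas = np.full((net_h, net_w, 3), 114, dtype=np.uint8)
    top = (net_h - new_h) // 2
    left = (net_w - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized

    rgb = cv2.cvtColor(canvas, cv2.COLOR_BGR2RGB)         # format conversion (model-color-format=0)
    blob = rgb.astype(np.float32) / 255.0                 # normalization, net-scale-factor = 1/255
    return np.ascontiguousarray(blob.transpose(2, 0, 1))[None]  # NCHW, batch dimension

# tensor = preprocess("dog.jpg")  # shape (1, 3, 640, 640), ready for the engine

Any extra scaling or format conversion that DeepStream inserts before this chain can shift the results relative to the TensorRT pipeline.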

To improve the DeepStream pipeline mAP, the suggestions are:

  1. Try to reduce the extra scaling or format conversion before gst-nvinfer. The suggested pipeline for a JPEG image is (see the Python sketch after the configuration below):
filesrc->nvjpegdec->new nvstreammux->gst-nvinfer->nvmultistreamtiler->nvosd->sink
  2. Try to set the gst-nvinfer preprocessing parameters according to the TensorRT/OpenCV processing parameters, e.g. the scaling algorithm is bilinear:
[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
#0=RGB, 1=BGR
model-color-format=0
onnx-file=yolov7.onnx
labelfile-path=labels.txt
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=2
num-detected-classes=80
gie-unique-id=1
network-type=0
is-classifier=0
## 1=DBSCAN, 2=NMS, 3= DBSCAN+NMS Hybrid, 4 = None(No clustering)
cluster-mode=2
maintain-aspect-ratio=1
symmetric-padding=1
## Bilinear Interpolation
scaling-filter=1
#parse-bbox-func-name=NvDsInferParseCustomYoloV7
parse-bbox-func-name=NvDsInferParseCustomYoloV7_cuda
#disable-output-host-copy=0
disable-output-host-copy=1
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
#scaling-compute-hw=0
## start from DS6.2
crop-objects-to-roi-boundary=1

[class-attrs-all]
#nms-iou-threshold=0.3
#threshold=0.7
nms-iou-threshold=0.65
pre-cluster-threshold=0.25
topk=300
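As referenced in suggestion 1, here is a rough Python/GStreamer sketch of the suggested JPEG pipeline. The file name, the config path, and enabling the new nvstreammux through USE_NEW_NVSTREAMMUX are assumptions, and the exact element properties may differ per DeepStream version:

#!/usr/bin/env python3
# Rough sketch of filesrc -> nvjpegdec -> new nvstreammux -> gst-nvinfer ->
# nvmultistreamtiler -> nvosd -> sink using the GStreamer Python bindings.
# "dog.jpg" and "config_infer_yolov7.txt" are placeholder paths.
import os
os.environ["USE_NEW_NVSTREAMMUX"] = "yes"  # select the new nvstreammux; normally exported before launching the app

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

pipeline = Gst.parse_launch(
    "nvstreammux name=mux batch-size=1 ! "
    "nvinfer config-file-path=config_infer_yolov7.txt ! "
    "nvmultistreamtiler ! nvdsosd ! fakesink "
    "filesrc location=dog.jpg ! nvjpegdec ! mux.sink_0"
)
pipeline.set_state(Gst.State.PLAYING)

# Block until EOS or an error, then tear the pipeline down.
bus = pipeline.get_bus()
bus.timed_pop_filtered(Gst.CLOCK_TIME_NONE,
                       Gst.MessageType.EOS | Gst.MessageType.ERROR)
pipeline.set_state(Gst.State.NULL)

The point of keeping the pipeline this short is that no nvvideoconvert or second scaling step sits between the decoder and gst-nvinfer, so the only scaling/format conversion is the one gst-nvinfer itself performs with the parameters above.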

Thanks for the quick response.
I used the second config but it did not help. However, I tried YOLOv4 with the repo Update for DeepStream 6.2 · marcoslucianops/DeepStream-Yolo@ab6de54 · GitHub; the preprocessing is the same and the mAP of the engine model is good when running in DeepStream, only a little lower than the mAP of the original model. With YOLOv7 there is a big gap between the original model and the engine model in DeepStream.

Did you compare with the FP32 engine?


Yes, I did.
With the repo GitHub - marcoslucianops/DeepStream-Yolo at ab6de54c4398c1daeb6fc31b06af29f97663f211, the FP32 and INT8 mAP of YOLOv7 is low; the mAP of the FP32 model is about 4% lower than the original model's mAP. The author of this repo also reports the mAP of the FP16 model as 0.476 (original model's mAP = 0.514). Usually the FP16 mAP is about the same as the FP32 mAP.

With the NVIDIA repo yolo_deepstream/deepstream_yolo at main · NVIDIA-AI-IOT/yolo_deepstream · GitHub, I ran inference on COCO val2017 and saved the predictions in KITTI format; the mAP of the FP32 model is about 5% lower than the mAP of the original model. But when I took this FP32 YOLOv7 engine and evaluated it with the TensorRT Python API based on the source code at GitHub - Linaom1214/TensorRT-For-YOLO-Series: tensorrt for yolo series (YOLOv8, YOLOv7, YOLOv6....), nms plugin support, the mAP of the FP32 YOLOv7 model is almost the same as the original model.

Are you comparing yolo_deepstream/tensorrt_yolov7 at main · NVIDIA-AI-IOT/yolo_deepstream (github.com) and Linaom1214/TensorRT-For-YOLO-Series: tensorrt for yolo series (YOLOv8, YOLOv7, YOLOv6…), nms plugin support (github.com) with the same YOLOv7 FP32 engine?


For the DeepStream pipeline accuracy, I have given the improvement suggestions above.

For the FP16 model mAP measurement, please refer to yolo_deepstream/yolov7_qat at main · NVIDIA-AI-IOT/yolo_deepstream (github.com)

Same YOLOv7 FP32 engine.
Actually, I generated the YOLOv7 FP32 engine with the NVIDIA repo yolo_deepstream/tensorrt_yolov7 at main · NVIDIA-AI-IOT/yolo_deepstream · GitHub (by running the DeepStream app). After that I evaluated it in two ways:

  1. Run inference with the generated FP32 engine using the NVIDIA repo (deepstream-app) on COCO val2017, save the predictions to .txt, convert to JSON and evaluate: the mAP is not good.
  2. Run inference with the same generated FP32 engine using GitHub - Linaom1214/TensorRT-For-YOLO-Series: tensorrt for yolo series (YOLOv8, YOLOv7, YOLOv6....), nms plugin support (with some modifications): the mAP is good.
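For reference, this is roughly how the "KITTI .txt → JSON → evaluate" step in case 1 can look with pycocotools. It is only a sketch: the dump directory, the image-id-from-filename mapping, and the assumption that columns 5-8 of each KITTI line are left/top/right/bottom with the confidence in the last column all depend on how deepstream-app was configured.

import glob, json, os
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("annotations/instances_val2017.json")
name_to_cat = {c["name"]: c["id"] for c in coco_gt.loadCats(coco_gt.getCatIds())}

results = []
for path in glob.glob("kitti_dumps/*.txt"):
    # Assumes dumps are named after the COCO image, e.g. 000000000139.txt -> image_id 139.
    image_id = int(os.path.splitext(os.path.basename(path))[0])
    for line in open(path):
        f = line.split()
        # Labels containing spaces (e.g. "traffic light") would need extra handling here.
        label, score = f[0], float(f[-1])
        left, top, right, bottom = map(float, f[4:8])
        results.append({
            "image_id": image_id,
            "category_id": name_to_cat[label],
            "bbox": [left, top, right - left, bottom - top],  # COCO expects x, y, w, h
            "score": score,
        })

with open("ds_results.json", "w") as fp:
    json.dump(results, fp)

coco_dt = coco_gt.loadRes("ds_results.json")
ev = COCOeval(coco_gt, coco_dt, "bbox")
ev.evaluate(); ev.accumulate(); ev.summarize()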

The accuracy benchmark here was also done with the TensorRT Python API. It seems there is a problem with DeepStream when running inference with the model.

What kind of problem? Can you give us the details?

I am not sure.
Here is the mAP of the FP32 YOLOv7 engine (predictions in KITTI format → JSON → evaluate). I used the config you suggested, with the modification score_thresh=0.001.

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.449
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.625
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.485
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.278
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.491
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.606
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.346
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.567
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.612
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.438
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.666
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.761

The mAP of the original model is 0.512.

Have you modified the pipeline too?

And I have mentioned that DeepStream still has some problems with the scaling algorithm in the YOLOv7 in DeepStream gives fewer outputs than when running with the TensorRT Python API? - #3 by Fiona.Chen post. Have you read it?

I have read it.
I used your config

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
#0=RGB, 1=BGR
model-color-format=0
onnx-file=yolov7.onnx
labelfile-path=labels.txt
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=2
num-detected-classes=80
gie-unique-id=1
network-type=0
is-classifier=0
## 1=DBSCAN, 2=NMS, 3= DBSCAN+NMS Hybrid, 4 = None(No clustering)
cluster-mode=2
maintain-aspect-ratio=1
symmetric-padding=1
## Bilinear Interpolation
scaling-filter=1
#parse-bbox-func-name=NvDsInferParseCustomYoloV7
parse-bbox-func-name=NvDsInferParseCustomYoloV7_cuda
#disable-output-host-copy=0
disable-output-host-copy=1
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
#scaling-compute-hw=0
## start from DS6.2
crop-objects-to-roi-boundary=1

[class-attrs-all]
#nms-iou-threshold=0.3
#threshold=0.7
nms-iou-threshold=0.65
pre-cluster-threshold=0.25
topk=300

to run deepstream-app and get the predictions in KITTI format (.txt). I only changed score_thre=0.001 (instead of pre-cluster-threshold=0.25), because with a high score threshold the mAP is worse. I checked it.

> And I have mentioned that DeepStream still has some problems with the scaling algorithm in the YOLOv7 in DeepStream gives fewer outputs than when running with the TensorRT Python API? - #3 by Fiona.Chen post. Have you read it?

Yes, please update it as soon as possible. I am waiting for it.

The configuration file does not resolve the problems I mentioned. I have explained why DeepStream cannot reach the same mAP as the TensorRT sample code. It is not just a configuration issue.

OK, thanks a lot for the support.
