Different results between engine generated by Python and C

• Hardware Platform: GPU (GTX 1050 Laptop)
• DeepStream Version: 5.1
• TensorRT Version: Tested with 7.2.2 and 7.2.3
• CUDA Version: 11.1.1
• NVIDIA GPU Driver Version: Tested with 455.32 and 460.80
• Issue Type: Bugs

With pre-cluster-threshold=0.001, the engine generated by deepstream-app produces more bbox detections than the engine generated by my custom Python app. I'm calculating mAP with my custom Python code, and I get a higher mAP with the engine generated by deepstream-app than with the one generated by Python, even though both use the same nvdsinfer_custom_impl_Yolo lib and the same PGIE config.

Batch-size: 1
Network-mode: FP32

mAP deepstream-app:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.499
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.739
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.551
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.346
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.559
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.613
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.357
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.587
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.631
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.472
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.686
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.756

mAP python:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.422
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.624
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.467
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.280
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.463
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.549
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.355
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.487
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.491
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.341
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.528
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.637
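
To make the gap between the two summaries above easier to read, the printed lines can be parsed and subtracted metric by metric. This is only a minimal sketch: the helper names `parse_summary` and `compare` are hypothetical, and it assumes the text has exactly the line layout shown above.

```python
import re

# Matches one summary line in the layout pasted above (assumed format).
LINE_RE = re.compile(
    r"Average (?:Precision|Recall)\s+\((AP|AR)\)\s+@\[\s*IoU=([\d.:]+)\s*\|"
    r"\s*area=\s*(\w+)\s*\|\s*maxDets=\s*(\d+)\s*\]\s*=\s*(-?[\d.]+)"
)


def parse_summary(text):
    """Map (metric, IoU, area, maxDets) -> value for every summary line."""
    stats = {}
    for match in LINE_RE.finditer(text):
        metric, iou, area, max_dets, value = match.groups()
        stats[(metric, iou, area, int(max_dets))] = float(value)
    return stats


def compare(reference, candidate):
    """Return candidate - reference for every metric present in both runs."""
    return {
        key: round(candidate[key] - reference[key], 3)
        for key in reference
        if key in candidate
    }
```

Called with the deepstream-app summary as `reference` and the Python summary as `candidate`, negative values mark the metrics where the Python-generated engine scores lower.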

Hey, how do you calculate the mAP?
BTW, have you changed any nvinfer properties in your custom Python code?

I wrote code that calculates the COCO mAP by running inference on the val2017 images and converting the results to JSON.
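
For readers who want to reproduce this step, here is a rough sketch of the conversion to the COCO detection-results layout (a list of entries with `image_id`, `category_id`, `bbox` as [x, y, width, height], and `score`). It is not the code used in this thread: `to_coco_result`, `dump_results`, the `category_ids` lookup and the `scale` factor are all hypothetical names. The lookup is there because COCO category ids are not contiguous, so a model's class index usually cannot be written out directly.

```python
import json


def to_coco_result(image_id, class_id, box, score, category_ids, scale=(1.0, 1.0)):
    """Build one COCO detection-result entry.

    box is (left, top, width, height) in pixels of the inference frame;
    scale is the assumed (x, y) factor mapping it back to the original image.
    category_ids maps the model's class index to a COCO category id.
    """
    left, top, width, height = box
    sx, sy = scale
    return {
        "image_id": int(image_id),
        "category_id": int(category_ids[class_id]),
        "bbox": [
            round(left * sx, 2),
            round(top * sy, 2),
            round(width * sx, 2),
            round(height * sy, 2),
        ],
        "score": round(float(score), 5),
    }


def dump_results(results, path):
    """Write the list of entries as a JSON file for later evaluation."""
    with open(path, "w") as handle:
        json.dump(results, handle)
```

A wrong box scale or class-to-category mapping changes the mAP regardless of the engine, so both are worth ruling out before comparing engines.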

I used the files from my repo: GitHub - marcoslucianops/DeepStream-Yolo: NVIDIA DeepStream SDK 5.1 configuration for YOLO models (native/nvdsinfer_custom_impl_Yolo)

My inference element is set up like this:

pgie = Gst.ElementFactory.make("nvinfer", "primary-nvinference-engine")
pgie.set_property('config-file-path', "pgie.txt")

The results also differ with a higher threshold:

pre-cluster-threshold=0.25

deepstream-app generated engine:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.455
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.654
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.511
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.289
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.520
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.578
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.333
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.503
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.513
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.324
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.580
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.655

Python generated engine:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.396
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.564
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.446
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.246
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.441
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.524
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.332
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.438
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.440
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.273
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.486
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.585

Is there an update on this?

Waiting for update

Sorry for the late reply. I think you can dump the image before it is pushed to the TensorRT engine; refer to DeepStream SDK FAQ - #9 by mchi.

I think the problem is in the engine generation process in Python. If I use the engine generated by deepstream-app in my Python code, the results are the same as deepstream-app's.

That's weird; the engines should be the same if your configs are identical.

But in my tests, it isn’t the same
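
One way to rule out a configuration mismatch is to compare the effective nvinfer settings of both runs key by key. The sketch below is only an illustration under stated assumptions: `load_config` and `diff_configs` are hypothetical helpers, and it assumes the config files are simple `[section]` / `key=value` files that Python's `configparser` can read. Note that deepstream-app can also override some nvinfer values from its own app config, which this comparison does not capture.

```python
import configparser


def load_config(text):
    """Flatten an INI-style config into {(section, key): value}."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(text)
    return {
        (section, key): value
        for section in parser.sections()
        for key, value in parser[section].items()
    }


def diff_configs(text_a, text_b):
    """Return {(section, key): (value_a, value_b)} for every differing key."""
    a, b = load_config(text_a), load_config(text_b)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }
```

An empty result means the two files are equivalent as far as this parser can tell; a key that exists in only one file shows up with `None` on the other side.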

You can run trtexec --loadEngine=<file> --verbose to check whether there is any difference between the two engines.

I got the same output for both the deepstream-app engine and the Python engine:

&&&& RUNNING TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --loadEngine=python_yolov4-tiny.engine --verbose
[06/16/2021-09:51:03] [I] === Model Options ===
[06/16/2021-09:51:03] [I] Format: *
[06/16/2021-09:51:03] [I] Model: 
[06/16/2021-09:51:03] [I] Output:
[06/16/2021-09:51:03] [I] === Build Options ===
[06/16/2021-09:51:03] [I] Max batch: 1
[06/16/2021-09:51:03] [I] Workspace: 16 MiB
[06/16/2021-09:51:03] [I] minTiming: 1
[06/16/2021-09:51:03] [I] avgTiming: 8
[06/16/2021-09:51:03] [I] Precision: FP32
[06/16/2021-09:51:03] [I] Calibration: 
[06/16/2021-09:51:03] [I] Refit: Disabled
[06/16/2021-09:51:03] [I] Safe mode: Disabled
[06/16/2021-09:51:03] [I] Save engine: 
[06/16/2021-09:51:03] [I] Load engine: model_b1_gpu0_fp32.engine
[06/16/2021-09:51:03] [I] Builder Cache: Enabled
[06/16/2021-09:51:03] [I] NVTX verbosity: 0
[06/16/2021-09:51:03] [I] Tactic sources: Using default tactic sources
[06/16/2021-09:51:03] [I] Input(s)s format: fp32:CHW
[06/16/2021-09:51:03] [I] Output(s)s format: fp32:CHW
[06/16/2021-09:51:03] [I] Input build shapes: model
[06/16/2021-09:51:03] [I] Input calibration shapes: model
[06/16/2021-09:51:03] [I] === System Options ===
[06/16/2021-09:51:03] [I] Device: 0
[06/16/2021-09:51:03] [I] DLACore: 
[06/16/2021-09:51:03] [I] Plugins:
[06/16/2021-09:51:03] [I] === Inference Options ===
[06/16/2021-09:51:03] [I] Batch: 1
[06/16/2021-09:51:03] [I] Input inference shapes: model
[06/16/2021-09:51:03] [I] Iterations: 10
[06/16/2021-09:51:03] [I] Duration: 3s (+ 200ms warm up)
[06/16/2021-09:51:03] [I] Sleep time: 0ms
[06/16/2021-09:51:03] [I] Streams: 1
[06/16/2021-09:51:03] [I] ExposeDMA: Disabled
[06/16/2021-09:51:03] [I] Data transfers: Enabled
[06/16/2021-09:51:03] [I] Spin-wait: Disabled
[06/16/2021-09:51:03] [I] Multithreading: Disabled
[06/16/2021-09:51:03] [I] CUDA Graph: Disabled
[06/16/2021-09:51:03] [I] Separate profiling: Disabled
[06/16/2021-09:51:03] [I] Skip inference: Disabled
[06/16/2021-09:51:03] [I] Inputs:
[06/16/2021-09:51:03] [I] === Reporting Options ===
[06/16/2021-09:51:03] [I] Verbose: Enabled
[06/16/2021-09:51:03] [I] Averages: 10 inferences
[06/16/2021-09:51:03] [I] Percentile: 99
[06/16/2021-09:51:03] [I] Dump refittable layers:Disabled
[06/16/2021-09:51:03] [I] Dump output: Disabled
[06/16/2021-09:51:03] [I] Profile: Disabled
[06/16/2021-09:51:03] [I] Export timing to JSON file: 
[06/16/2021-09:51:03] [I] Export output to JSON file: 
[06/16/2021-09:51:03] [I] Export profile to JSON file: 
[06/16/2021-09:51:03] [I] 
[06/16/2021-09:51:03] [I] === Device Information ===
[06/16/2021-09:51:03] [I] Selected Device: GeForce GTX 1050
[06/16/2021-09:51:03] [I] Compute Capability: 6.1
[06/16/2021-09:51:03] [I] SMs: 5
[06/16/2021-09:51:03] [I] Compute Clock Rate: 1.493 GHz
[06/16/2021-09:51:03] [I] Device Global Memory: 4040 MiB
[06/16/2021-09:51:03] [I] Shared Memory per SM: 96 KiB
[06/16/2021-09:51:03] [I] Memory Bus Width: 128 bits (ECC disabled)
[06/16/2021-09:51:03] [I] Memory Clock Rate: 3.504 GHz
[06/16/2021-09:51:03] [I] 
[06/16/2021-09:51:03] [V] [TRT] Registered plugin creator - ::GridAnchor_TRT version 1
[06/16/2021-09:51:03] [V] [TRT] Registered plugin creator - ::NMS_TRT version 1
[06/16/2021-09:51:03] [V] [TRT] Registered plugin creator - ::Reorg_TRT version 1
[06/16/2021-09:51:03] [V] [TRT] Registered plugin creator - ::Region_TRT version 1
[06/16/2021-09:51:03] [V] [TRT] Registered plugin creator - ::Clip_TRT version 1
[06/16/2021-09:51:03] [V] [TRT] Registered plugin creator - ::LReLU_TRT version 1
[06/16/2021-09:51:03] [V] [TRT] Registered plugin creator - ::PriorBox_TRT version 1
[06/16/2021-09:51:03] [V] [TRT] Registered plugin creator - ::Normalize_TRT version 1
[06/16/2021-09:51:03] [V] [TRT] Registered plugin creator - ::RPROI_TRT version 1
[06/16/2021-09:51:03] [V] [TRT] Registered plugin creator - ::BatchedNMS_TRT version 1
[06/16/2021-09:51:03] [V] [TRT] Registered plugin creator - ::BatchedNMSDynamic_TRT version 1
[06/16/2021-09:51:03] [V] [TRT] Registered plugin creator - ::FlattenConcat_TRT version 1
[06/16/2021-09:51:03] [V] [TRT] Registered plugin creator - ::CropAndResize version 1
[06/16/2021-09:51:03] [V] [TRT] Registered plugin creator - ::DetectionLayer_TRT version 1
[06/16/2021-09:51:03] [V] [TRT] Registered plugin creator - ::Proposal version 1
[06/16/2021-09:51:03] [V] [TRT] Registered plugin creator - ::ProposalLayer_TRT version 1
[06/16/2021-09:51:03] [V] [TRT] Registered plugin creator - ::PyramidROIAlign_TRT version 1
[06/16/2021-09:51:03] [V] [TRT] Registered plugin creator - ::ResizeNearest_TRT version 1
[06/16/2021-09:51:03] [V] [TRT] Registered plugin creator - ::Split version 1
[06/16/2021-09:51:03] [V] [TRT] Registered plugin creator - ::SpecialSlice_TRT version 1
[06/16/2021-09:51:03] [V] [TRT] Registered plugin creator - ::InstanceNormalization_TRT version 1
[06/16/2021-09:51:03] [E] [TRT] INVALID_ARGUMENT: getPluginCreator could not find plugin YoloLayer_TRT version 1
[06/16/2021-09:51:03] [E] [TRT] safeDeserializationUtils.cpp (322) - Serialization Error in load: 0 (Cannot deserialize plugin since corresponding IPluginCreator not found in Plugin Registry)
[06/16/2021-09:51:03] [E] [TRT] INVALID_STATE: std::exception
[06/16/2021-09:51:03] [E] [TRT] INVALID_CONFIG: Deserialize the cuda engine failed.
[06/16/2021-09:51:03] [E] Engine creation failed
[06/16/2021-09:51:03] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --loadEngine=model_b1_gpu0_fp32.engine --verbose
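
The log above stops at deserialization because YoloLayer_TRT is not in the plugin registry, so it says nothing yet about whether the two engines differ. trtexec has a --plugins option for loading a plugin library, so one possible next step is to rerun it with the custom YOLO library. The sketch below only builds the command line. The helper `trtexec_command` is hypothetical, and the library path is an assumption that must be replaced with wherever the nvdsinfer_custom_impl_Yolo library was actually built.

```python
import shlex


def trtexec_command(engine_path, plugin_lib,
                    trtexec="/usr/src/tensorrt/bin/trtexec"):
    """Build a trtexec invocation that loads a plugin library first."""
    return [
        trtexec,
        f"--loadEngine={engine_path}",
        f"--plugins={plugin_lib}",
        "--verbose",
    ]


# Assumed library location; adjust to the real build output.
cmd = trtexec_command(
    "model_b1_gpu0_fp32.engine",
    "nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so",
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd)` once per engine, which keeps the two verbose logs directly comparable.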

I tested with YOLOv3 and YOLOv3-Tiny, and this difference doesn't occur. I'm using my files (github.com/marcoslucianops/Deepstream-Yolo). I created the functions to optimize the official YOLO, support new models like YOLOv4, and others.

The YOLOv4 engine files from deepstream-app and the Python code are available at:
https://drive.google.com/drive/folders/16-VVPkw_aKRsB3FBSkDJU7THhwRlIl6g?usp=sharing

deepstream-app
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.499
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.739
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.551
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.346
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.559
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.613
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.357
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.587
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.631
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.472
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.686
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.756

python
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.422
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.624
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.467
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.280
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.463
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.549
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.355
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.487
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.491
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.341
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.528
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.637