Engine failed to match config params, trying rebuild

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) - Jetson Xavier NX
• DeepStream Version - 6.3
• JetPack Version (valid for Jetson only) - 5.1.2
• TensorRT Version - 8.5.2
• NVIDIA GPU Driver Version (valid for GPU only)
• Issue Type (questions, new requirements, bugs) - Question
• How to reproduce the issue? (This is for bugs. Include which sample app is used, the configuration file contents, the command line used, and other details for reproducing.)
• Requirement details (This is for new requirements. Include the module name - for which plugin or for which sample application - and the function description.) - deepstream-test5-app

Hi,

1- I followed this repo: GitHub - WongKinYiu/yolov7 (Implementation of paper - YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors) and trained a custom yolov7-tiny model.

  • Train command: python train.py --workers 8 --device 0 --batch-size 16 --data data/custom.yaml --img 640 480 --cfg cfg/training/yolov7-tiny.yaml --weights 'yolov7-tiny.pt' --name yolov7-tiny-custom --hyp data/hyp.scratch.tiny.yaml

2- I reparameterized the custom model as recommended by the repository.

3- I followed this repo: yolo_deepstream/yolov7_qat at main · NVIDIA-AI-IOT/yolo_deepstream · GitHub

I converted the QAT model to an INT8 engine with the following commands:

  • sudo /usr/src/tensorrt/bin/trtexec --onnx=qat.onnx --int8 --saveEngine=yolov7_tiny_qat_3.engine --workspace=1024000
  • sudo /usr/src/tensorrt/bin/trtexec --onnx=qat.onnx --int8 --fp16 --saveEngine=yolov7_tiny_qat_2.engine --workspace=1024000 --minShapes=images:4x3x640x640 --optShapes=images:4x3x640x640 --maxShapes=images:4x3x640x640
  • sudo /usr/src/tensorrt/bin/trtexec --onnx=qat.onnx --int8 --saveEngine=yolov7_tiny_qat.engine --workspace=1024000 --minShapes=images:4x3x640x640 --optShapes=images:4x3x640x640 --maxShapes=images:4x3x640x640
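For comparison, if the DeepStream config requests batch-size 16, the engine's optimization profile has to cover batch 16 as well. A sketch of such a build command (assuming the ONNX input is named `images` and has a dynamic batch dimension; the output file name `yolov7_tiny_qat_b16.engine` is illustrative):

```shell
# Hypothetical: build an INT8 engine whose profile covers batch 16,
# so it matches a DeepStream config with batch-size=16.
sudo /usr/src/tensorrt/bin/trtexec \
  --onnx=qat.onnx --int8 \
  --minShapes=images:1x3x640x640 \
  --optShapes=images:16x3x640x640 \
  --maxShapes=images:16x3x640x640 \
  --saveEngine=yolov7_tiny_qat_b16.engine
```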

The engine file was created successfully every time.
When I checked the engine with sudo /usr/src/tensorrt/bin/trtexec --loadEngine=/opt/nvidia/deepstream/deepstream-6.3/sources/objectDetector_Yolo/yolov7/yolov7_tiny_qat.engine --plugins=./libmyplugins.so, I got this:

[07/05/2024-14:58:30] [I] === Performance summary ===
[07/05/2024-14:58:30] [I] Throughput: 34.7856 qps
[07/05/2024-14:58:30] [I] Latency: min = 24.8691 ms, max = 42.3534 ms, mean = 28.7271 ms, median = 25.0518 ms, percentile(90%) = 39.1527 ms, percentile(95%) = 39.1918 ms, percentile(99%) = 39.2491 ms
[07/05/2024-14:58:30] [I] Enqueue Time: min = 3.00171 ms, max = 5.73328 ms, mean = 3.9051 ms, median = 3.82019 ms, percentile(90%) = 4.84863 ms, percentile(95%) = 5.13629 ms, percentile(99%) = 5.36453 ms
[07/05/2024-14:58:30] [I] H2D Latency: min = 0.703857 ms, max = 1.16614 ms, mean = 0.820026 ms, median = 0.720337 ms, percentile(90%) = 1.16443 ms, percentile(95%) = 1.16507 ms, percentile(99%) = 1.16565 ms
[07/05/2024-14:58:30] [I] GPU Compute Time: min = 24.0393 ms, max = 40.9948 ms, mean = 27.7691 ms, median = 24.2098 ms, percentile(90%) = 37.7932 ms, percentile(95%) = 37.8299 ms, percentile(99%) = 37.8911 ms
[07/05/2024-14:58:30] [I] D2H Latency: min = 0.11731 ms, max = 0.19989 ms, mean = 0.137928 ms, median = 0.123291 ms, percentile(90%) = 0.195007 ms, percentile(95%) = 0.196533 ms, percentile(99%) = 0.198792 ms
[07/05/2024-14:58:30] [I] Total Host Walltime: 3.07598 s
[07/05/2024-14:58:30] [I] Total GPU Compute Time: 2.97129 s
[07/05/2024-14:58:30] [W] * GPU compute time is unstable, with coefficient of variance = 19.9549%.
[07/05/2024-14:58:30] [W] If not already in use, locking GPU clock frequency or adding --useSpinWait may improve the stability.
[07/05/2024-14:58:30] [I] Explanations of the performance metrics are printed in the verbose logs.
[07/05/2024-14:58:30] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8502] # /usr/src/tensorrt/bin/trtexec --loadEngine=/opt/nvidia/deepstream/deepstream-6.3/sources/objectDetector_Yolo/yolov7/yolov7_tiny_qat.engine --plugins=./libmyplugins.so
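Regarding the clock-stability warning in the log above: on Jetson this is usually addressed by locking the clocks before benchmarking. A minimal sketch using the standard JetPack tools (power mode 0 is assumed to be the max-performance mode on this board):

```shell
# Pin clocks to their maxima before running trtexec, so GPU compute
# time does not fluctuate with DVFS.
sudo nvpmodel -m 0      # select the max-performance power mode
sudo jetson_clocks      # lock CPU/GPU/EMC clocks at that mode's maxima
```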

4- When I test the created engine file with the deepstream-test5-app application, I get the following error:

Unknown or legacy key specified 'is-classifier' for group [property]
Unknown or legacy key specified 'disable-output-host-copy' for group [property]
gstnvtracker: Loading low-level lib at /opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so
[NvMultiObjectTracker] Initialized
0:00:08.013987301 8575 0xaaaafff84580 INFO nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1988> [UID = 1]: deserialized trt engine from :/opt/nvidia/deepstream/deepstream-6.3/sources/objectDetector_Yolo/yolov7/yolov7_tiny_qat_3.engine
WARNING: [TRT]: The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
INFO: [Implicit Engine Info]: layers num: 2
0 INPUT kFLOAT images 3x640x640
1 OUTPUT kFLOAT outputs 25200x7

0:00:08.093343861 8575 0xaaaafff84580 WARN nvinfer gstnvinfer.cpp:679:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::checkBackendParams() <nvdsinfer_context_impl.cpp:1920> [UID = 1]: Backend has maxBatchSize 1 whereas 16 has been requested
0:00:08.093459190 8575 0xaaaafff84580 WARN nvinfer gstnvinfer.cpp:679:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2097> [UID = 1]: deserialized backend context :/opt/nvidia/deepstream/deepstream-6.3/sources/objectDetector_Yolo/yolov7/yolov7_tiny_qat_3.engine failed to match config params, trying rebuild
0:00:08.122447330 8575 0xaaaafff84580 INFO nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:2002> [UID = 1]: Trying to create engine from model files
ERROR: failed to build network since there is no model file matched.
ERROR: failed to build network.
0:00:09.446987646 8575 0xaaaafff84580 ERROR nvinfer gstnvinfer.cpp:676:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:2022> [UID = 1]: build engine file failed
0:00:09.520248994 8575 0xaaaafff84580 ERROR nvinfer gstnvinfer.cpp:676:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2108> [UID = 1]: build backend context failed
0:00:09.520404612 8575 0xaaaafff84580 ERROR nvinfer gstnvinfer.cpp:676:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::initialize() <nvdsinfer_context_impl.cpp:1282> [UID = 1]: generate backend failed, check config file settings
0:00:09.520527044 8575 0xaaaafff84580 WARN nvinfer gstnvinfer.cpp:898:gst_nvinfer_start:<primary_gie> error: Failed to create NvDsInferContext instance
0:00:09.520577637 8575 0xaaaafff84580 WARN nvinfer gstnvinfer.cpp:898:gst_nvinfer_start:<primary_gie> error: Config file path: /opt/nvidia/deepstream/deepstream-6.3/sources/objectDetector_Yolo/yolov7/config_infer_primary_yoloV7_tiny.txt, NvDsInfer Error: NVDSINFER_CONFIG_FAILED
[NvMultiObjectTracker] De-initialized
** ERROR: main:1534: Failed to set pipeline to PAUSED
Quitting
ERROR from primary_gie: Failed to create NvDsInferContext instance
Debug info: /dvs/git/dirty/git-master_linux/deepstream/sdk/src/gst-plugins/gst-nvinfer/gstnvinfer.cpp(898): gst_nvinfer_start (): /GstPipeline:pipeline/GstBin:primary_gie_bin/GstNvInfer:primary_gie:
Config file path: /opt/nvidia/deepstream/deepstream-6.3/sources/objectDetector_Yolo/yolov7/config_infer_primary_yoloV7_tiny.txt, NvDsInfer Error: NVDSINFER_CONFIG_FAILED
App run failed

(.pt → .onnx → FP16 (built by DeepStream) works without any problems.)

config_infer_primary_yoloV7_tiny.txt (4.1 KB)

I can't understand why the INT8 model doesn't work.

Thanks for the help.

Could you set the batch-size=16 in your config file?

Thanks for the help.

Setting the batch-size parameter to 16 in the config_infer_primary_yoloV7_tiny.txt file did not work. However, it reminded me that the primary-gie/batch-size parameter in the deepstream_app_config_yoloV7_tiny.txt file is set to 16. When I removed that parameter, it turned out to be the one that did not match the engine. The INT8 model is now working.
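For reference, the batch-size keys in the two files must agree with the engine's maximum batch. A minimal sketch of the relevant fragments (key names from the DeepStream samples; values illustrative for a batch-1 engine):

```ini
; deepstream_app_config_yoloV7_tiny.txt
[primary-gie]
batch-size=1          ; overrides the nvinfer config; must match the engine

; config_infer_primary_yoloV7_tiny.txt
[property]
batch-size=1
model-engine-file=yolov7_tiny_qat_3.engine
```

If both keys are present, the [primary-gie] value wins, which is why removing it (or setting it to the engine's batch) resolves the "failed to match config params, trying rebuild" path.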

At this point, my other question is:
.pt → .onnx → deepstream → fp16 = 6.48 FPS with 25 video sources
.pt → qat.onnx → trtexec → int8 = 3.94 FPS with 25 video sources

While all other parameters are exactly the same, is the FPS difference due to the conversion method? Is it normal to see an FPS difference between an engine built by trtexec and one built by DeepStream?
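One way to compare the two numbers fairly is to look at aggregate inferences per second rather than per-stream FPS. A small sketch using the figures above (the trtexec qps is the batch-1 throughput reported later in this thread):

```python
# Convert per-stream FPS (averaged over 25 sources) into aggregate
# inferences/second, and compare with trtexec's batch-1 throughput.
STREAMS = 25

fp16_per_stream = 6.48   # DeepStream-built FP16 engine, per stream
int8_per_stream = 3.94   # trtexec-built INT8 engine, per stream

fp16_aggregate = STREAMS * fp16_per_stream   # ~162 inferences/s
int8_aggregate = STREAMS * int8_per_stream   # ~98.5 inferences/s

# trtexec reported ~114.96 qps for the INT8 engine at batch 1.
# A batch-1 engine forces nvinfer to process the 25 streams one
# frame at a time, so the pipeline cannot exceed that batch-1 rate,
# while the FP16 engine built by DeepStream can batch frames.
trtexec_batch1_qps = 114.961

print(fp16_aggregate, int8_aggregate, trtexec_batch1_qps)
```

This suggests the gap comes from the effective batch size of each engine rather than from INT8 vs FP16 precision as such.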

Can you describe in detail how you got these 2 values of FPS?

I measured using the perf-measurement-interval-sec parameter.

deepstream_app_config_yoloV7_tiny.txt (7.4 KB)

fp16 →
config_infer_primary_yoloV7_tiny.txt (4.2 KB)

int8 →
config_infer_primary_yoloV7_tiny.txt (4.1 KB)

I tested each model with deepstream-test5-app.

What’s your whole command to test the trtexec perf?

I created the INT8 engine with the following command:

sudo /usr/src/tensorrt/bin/trtexec --onnx=qat.onnx --int8 --saveEngine=yolov7_tiny_qat_3.engine --workspace=1024000

I then tested the engine by running:

sudo /opt/nvidia/deepstream/deepstream-6.3/sources/apps/sample_apps/deepstream-test5/deepstream-test5-app -c /opt/nvidia/deepstream/deepstream-6.3/sources/objectDetector_Yolo/yolov7-tiny/deepstream_app_config_yoloV7_tiny.txt

I followed outputs like the following from the terminal (example):

PERF: 19.45 (19.00) 19.45 (19.04) 19.45 (19.08) 19.40 (19.04) 19.47 (19.10) 19.47 (18.98) 19.47 (19.06) 19.47 (19.04) 19.47 (18.99) 19.41 (19.10) 19.41 (19.06) 19.43 (19.07) 19.43 (18.97) 19.43 (19.08) 19.43 (19.07) 19.40 (19.08) 19.38 (19.10) 19.44 (19.06) 19.44 (18.99) 19.44 (19.01) 19.44 (19.05) 19.44 (19.04) 19.39 (19.10) 19.45 (19.07) 19.45 (19.10)

and got the results I mentioned above.

I know how DeepStream got the FPS. I mean, what is your whole command to test the trtexec perf (the 2nd pipeline you attached)?

I assume this is what you mean:

sudo /usr/src/tensorrt/bin/trtexec --loadEngine=/opt/nvidia/deepstream/deepstream-6.3/sources/objectDetector_Yolo/yolov7-tiny/yolov7_tiny_qat_3.engine

Output:

&&&& RUNNING TensorRT.trtexec [TensorRT v8502] # /usr/src/tensorrt/bin/trtexec --loadEngine=/opt/nvidia/deepstream/deepstream-6.3/sources/objectDetector_Yolo/yolov7-tiny/yolov7_tiny_qat_3.engine
[07/08/2024-13:41:01] [I] === Model Options ===
[07/08/2024-13:41:01] [I] Format: *
[07/08/2024-13:41:01] [I] Model:
[07/08/2024-13:41:01] [I] Output:
[07/08/2024-13:41:01] [I] === Build Options ===
[07/08/2024-13:41:01] [I] Max batch: 1
[07/08/2024-13:41:01] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[07/08/2024-13:41:01] [I] minTiming: 1
[07/08/2024-13:41:01] [I] avgTiming: 8
[07/08/2024-13:41:01] [I] Precision: FP32
[07/08/2024-13:41:01] [I] LayerPrecisions:
[07/08/2024-13:41:01] [I] Calibration:
[07/08/2024-13:41:01] [I] Refit: Disabled
[07/08/2024-13:41:01] [I] Sparsity: Disabled
[07/08/2024-13:41:01] [I] Safe mode: Disabled
[07/08/2024-13:41:01] [I] DirectIO mode: Disabled
[07/08/2024-13:41:01] [I] Restricted mode: Disabled
[07/08/2024-13:41:01] [I] Build only: Disabled
[07/08/2024-13:41:01] [I] Save engine:
[07/08/2024-13:41:01] [I] Load engine: /opt/nvidia/deepstream/deepstream-6.3/sources/objectDetector_Yolo/yolov7-tiny/yolov7_tiny_qat_3.engine
[07/08/2024-13:41:01] [I] Profiling verbosity: 0
[07/08/2024-13:41:01] [I] Tactic sources: Using default tactic sources
[07/08/2024-13:41:01] [I] timingCacheMode: local
[07/08/2024-13:41:01] [I] timingCacheFile:
[07/08/2024-13:41:01] [I] Heuristic: Disabled
[07/08/2024-13:41:01] [I] Preview Features: Use default preview flags.
[07/08/2024-13:41:01] [I] Input(s)s format: fp32:CHW
[07/08/2024-13:41:01] [I] Output(s)s format: fp32:CHW
[07/08/2024-13:41:01] [I] Input build shapes: model
[07/08/2024-13:41:01] [I] Input calibration shapes: model
[07/08/2024-13:41:01] [I] === System Options ===
[07/08/2024-13:41:01] [I] Device: 0
[07/08/2024-13:41:01] [I] DLACore:
[07/08/2024-13:41:01] [I] Plugins:
[07/08/2024-13:41:01] [I] === Inference Options ===
[07/08/2024-13:41:01] [I] Batch: 1
[07/08/2024-13:41:01] [I] Input inference shapes: model
[07/08/2024-13:41:01] [I] Iterations: 10
[07/08/2024-13:41:01] [I] Duration: 3s (+ 200ms warm up)
[07/08/2024-13:41:01] [I] Sleep time: 0ms
[07/08/2024-13:41:01] [I] Idle time: 0ms
[07/08/2024-13:41:01] [I] Streams: 1
[07/08/2024-13:41:01] [I] ExposeDMA: Disabled
[07/08/2024-13:41:01] [I] Data transfers: Enabled
[07/08/2024-13:41:01] [I] Spin-wait: Disabled
[07/08/2024-13:41:01] [I] Multithreading: Disabled
[07/08/2024-13:41:01] [I] CUDA Graph: Disabled
[07/08/2024-13:41:01] [I] Separate profiling: Disabled
[07/08/2024-13:41:01] [I] Time Deserialize: Disabled
[07/08/2024-13:41:01] [I] Time Refit: Disabled
[07/08/2024-13:41:01] [I] NVTX verbosity: 0
[07/08/2024-13:41:01] [I] Persistent Cache Ratio: 0
[07/08/2024-13:41:01] [I] Inputs:
[07/08/2024-13:41:01] [I] === Reporting Options ===
[07/08/2024-13:41:01] [I] Verbose: Disabled
[07/08/2024-13:41:01] [I] Averages: 10 inferences
[07/08/2024-13:41:01] [I] Percentiles: 90,95,99
[07/08/2024-13:41:01] [I] Dump refittable layers:Disabled
[07/08/2024-13:41:01] [I] Dump output: Disabled
[07/08/2024-13:41:01] [I] Profile: Disabled
[07/08/2024-13:41:01] [I] Export timing to JSON file:
[07/08/2024-13:41:01] [I] Export output to JSON file:
[07/08/2024-13:41:01] [I] Export profile to JSON file:
[07/08/2024-13:41:01] [I]
[07/08/2024-13:41:01] [I] === Device Information ===
[07/08/2024-13:41:01] [I] Selected Device: Xavier
[07/08/2024-13:41:01] [I] Compute Capability: 7.2
[07/08/2024-13:41:01] [I] SMs: 6
[07/08/2024-13:41:01] [I] Compute Clock Rate: 1.109 GHz
[07/08/2024-13:41:01] [I] Device Global Memory: 6845 MiB
[07/08/2024-13:41:01] [I] Shared Memory per SM: 96 KiB
[07/08/2024-13:41:01] [I] Memory Bus Width: 256 bits (ECC disabled)
[07/08/2024-13:41:01] [I] Memory Clock Rate: 1.109 GHz
[07/08/2024-13:41:01] [I]
[07/08/2024-13:41:01] [I] TensorRT version: 8.5.2
[07/08/2024-13:41:01] [I] Engine loaded in 0.209995 sec.
[07/08/2024-13:41:03] [I] [TRT] Loaded engine size: 7 MiB
[07/08/2024-13:41:03] [W] [TRT] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[07/08/2024-13:41:06] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +343, GPU +269, now: CPU 595, GPU 3536 (MiB)
[07/08/2024-13:41:06] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +8, now: CPU 0, GPU 8 (MiB)
[07/08/2024-13:41:06] [I] Engine deserialized in 4.57182 sec.
[07/08/2024-13:41:06] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 595, GPU 3536 (MiB)
[07/08/2024-13:41:06] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +137, now: CPU 0, GPU 145 (MiB)
[07/08/2024-13:41:06] [I] Setting persistentCacheLimit to 0 bytes.
[07/08/2024-13:41:06] [I] Using random values for input images
[07/08/2024-13:41:06] [I] Created input binding for images with dimensions 1x3x640x640
[07/08/2024-13:41:06] [I] Using random values for output outputs
[07/08/2024-13:41:06] [I] Created output binding for outputs with dimensions 1x25200x7
[07/08/2024-13:41:06] [I] Starting inference
[07/08/2024-13:41:09] [I] Warmup completed 1 queries over 200 ms
[07/08/2024-13:41:09] [I] Timing trace has 332 queries over 2.88795 s
[07/08/2024-13:41:09] [I]
[07/08/2024-13:41:09] [I] === Trace details ===
[07/08/2024-13:41:09] [I] Trace averages of 10 runs:
[07/08/2024-13:41:09] [I] Average on 10 runs - GPU latency: 18.9359 ms - Host latency: 19.4975 ms (enqueue 3.61808 ms)
[07/08/2024-13:41:09] [I] Average on 10 runs - GPU latency: 11.1465 ms - Host latency: 11.4503 ms (enqueue 3.27811 ms)
[07/08/2024-13:41:09] [I] Average on 10 runs - GPU latency: 10.708 ms - Host latency: 11.0055 ms (enqueue 3.2691 ms)
[07/08/2024-13:41:09] [I] Average on 10 runs - GPU latency: 7.82712 ms - Host latency: 8.04069 ms (enqueue 3.02569 ms)
[07/08/2024-13:41:09] [I] Average on 10 runs - GPU latency: 8.08832 ms - Host latency: 8.30468 ms (enqueue 3.19858 ms)
[07/08/2024-13:41:09] [I] Average on 10 runs - GPU latency: 7.82518 ms - Host latency: 8.03963 ms (enqueue 2.97252 ms)
[07/08/2024-13:41:09] [I] Average on 10 runs - GPU latency: 7.82941 ms - Host latency: 8.04319 ms (enqueue 2.8594 ms)
[07/08/2024-13:41:09] [I] Average on 10 runs - GPU latency: 7.82692 ms - Host latency: 8.04103 ms (enqueue 3.00575 ms)
[07/08/2024-13:41:09] [I] Average on 10 runs - GPU latency: 7.83351 ms - Host latency: 8.04733 ms (enqueue 3.2326 ms)
[07/08/2024-13:41:09] [I] Average on 10 runs - GPU latency: 7.82939 ms - Host latency: 8.04318 ms (enqueue 2.96793 ms)
[07/08/2024-13:41:09] [I] Average on 10 runs - GPU latency: 7.82729 ms - Host latency: 8.04154 ms (enqueue 3.36674 ms)
[07/08/2024-13:41:09] [I] Average on 10 runs - GPU latency: 7.81897 ms - Host latency: 8.03446 ms (enqueue 3.22677 ms)
[07/08/2024-13:41:09] [I] Average on 10 runs - GPU latency: 8.05098 ms - Host latency: 8.26477 ms (enqueue 3.12705 ms)
[07/08/2024-13:41:09] [I] Average on 10 runs - GPU latency: 7.8183 ms - Host latency: 8.03151 ms (enqueue 3.09554 ms)
[07/08/2024-13:41:09] [I] Average on 10 runs - GPU latency: 7.81986 ms - Host latency: 8.03369 ms (enqueue 3.08525 ms)
[07/08/2024-13:41:09] [I] Average on 10 runs - GPU latency: 7.82544 ms - Host latency: 8.03925 ms (enqueue 3.03726 ms)
[07/08/2024-13:41:09] [I] Average on 10 runs - GPU latency: 7.81415 ms - Host latency: 8.02954 ms (enqueue 3.51119 ms)
[07/08/2024-13:41:09] [I] Average on 10 runs - GPU latency: 7.83386 ms - Host latency: 8.0481 ms (enqueue 3.04595 ms)
[07/08/2024-13:41:09] [I] Average on 10 runs - GPU latency: 7.82877 ms - Host latency: 8.04205 ms (enqueue 3.0391 ms)
[07/08/2024-13:41:09] [I] Average on 10 runs - GPU latency: 8.08694 ms - Host latency: 8.30098 ms (enqueue 3.11271 ms)
[07/08/2024-13:41:09] [I] Average on 10 runs - GPU latency: 7.83005 ms - Host latency: 8.04502 ms (enqueue 2.90305 ms)
[07/08/2024-13:41:09] [I] Average on 10 runs - GPU latency: 7.82705 ms - Host latency: 8.0407 ms (enqueue 3.16528 ms)
[07/08/2024-13:41:09] [I] Average on 10 runs - GPU latency: 7.84851 ms - Host latency: 8.06831 ms (enqueue 3.1854 ms)
[07/08/2024-13:41:09] [I] Average on 10 runs - GPU latency: 8.36785 ms - Host latency: 8.59512 ms (enqueue 3.91667 ms)
[07/08/2024-13:41:09] [I] Average on 10 runs - GPU latency: 9.11221 ms - Host latency: 9.34561 ms (enqueue 4.39216 ms)
[07/08/2024-13:41:09] [I] Average on 10 runs - GPU latency: 7.9311 ms - Host latency: 8.14849 ms (enqueue 3.20911 ms)
[07/08/2024-13:41:09] [I] Average on 10 runs - GPU latency: 8.04946 ms - Host latency: 8.27036 ms (enqueue 3.18606 ms)
[07/08/2024-13:41:09] [I] Average on 10 runs - GPU latency: 8.04055 ms - Host latency: 8.26262 ms (enqueue 3.25076 ms)
[07/08/2024-13:41:09] [I] Average on 10 runs - GPU latency: 8.04089 ms - Host latency: 8.26099 ms (enqueue 3.24097 ms)
[07/08/2024-13:41:09] [I] Average on 10 runs - GPU latency: 7.82808 ms - Host latency: 8.04282 ms (enqueue 3.14546 ms)
[07/08/2024-13:41:09] [I] Average on 10 runs - GPU latency: 7.82832 ms - Host latency: 8.04263 ms (enqueue 2.95879 ms)
[07/08/2024-13:41:09] [I] Average on 10 runs - GPU latency: 7.82993 ms - Host latency: 8.04583 ms (enqueue 3.20991 ms)
[07/08/2024-13:41:09] [I] Average on 10 runs - GPU latency: 7.83213 ms - Host latency: 8.04702 ms (enqueue 2.998 ms)
[07/08/2024-13:41:09] [I]
[07/08/2024-13:41:09] [I] === Performance summary ===
[07/08/2024-13:41:09] [I] Throughput: 114.961 qps
[07/08/2024-13:41:09] [I] Latency: min = 7.96069 ms, max = 23.632 ms, mean = 8.68096 ms, median = 8.04974 ms, percentile(90%) = 9.97339 ms, percentile(95%) = 11.4469 ms, percentile(99%) = 23.5274 ms
[07/08/2024-13:41:09] [I] Enqueue Time: min = 2.78937 ms, max = 8.35522 ms, mean = 3.2051 ms, median = 3.0647 ms, percentile(90%) = 3.54047 ms, percentile(95%) = 3.7323 ms, percentile(99%) = 5.2605 ms
[07/08/2024-13:41:09] [I] H2D Latency: min = 0.177856 ms, max = 0.580231 ms, mean = 0.195686 ms, median = 0.180664 ms, percentile(90%) = 0.227844 ms, percentile(95%) = 0.256378 ms, percentile(99%) = 0.578369 ms
[07/08/2024-13:41:09] [I] GPU Compute Time: min = 7.74707 ms, max = 22.9504 ms, mean = 8.44914 ms, median = 7.83508 ms, percentile(90%) = 9.75623 ms, percentile(95%) = 11.1423 ms, percentile(99%) = 22.8456 ms
[07/08/2024-13:41:09] [I] D2H Latency: min = 0.0314941 ms, max = 0.103943 ms, mean = 0.0361373 ms, median = 0.0339355 ms, percentile(90%) = 0.0368652 ms, percentile(95%) = 0.0470581 ms, percentile(99%) = 0.100922 ms
[07/08/2024-13:41:09] [I] Total Host Walltime: 2.88795 s
[07/08/2024-13:41:09] [I] Total GPU Compute Time: 2.80511 s
[07/08/2024-13:41:09] [W] * GPU compute time is unstable, with coefficient of variance = 26.4652%.
[07/08/2024-13:41:09] [W] If not already in use, locking GPU clock frequency or adding --useSpinWait may improve the stability.
[07/08/2024-13:41:09] [I] Explanations of the performance metrics are printed in the verbose logs.
[07/08/2024-13:41:09] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8502] # /usr/src/tensorrt/bin/trtexec --loadEngine=/opt/nvidia/deepstream/deepstream-6.3/sources/objectDetector_Yolo/yolov7-tiny/yolov7_tiny_qat_3.engine

Judging from the trtexec command, you did not configure the batch size and so on.
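To make the trtexec measurement comparable to the DeepStream run, the benchmark would need to use the same batch size. A sketch, assuming an engine built with a profile covering batch 16 (the file name `yolov7_tiny_qat_b16.engine` is hypothetical):

```shell
# Hypothetical: benchmark at the batch size DeepStream actually uses,
# instead of the default batch 1.
sudo /usr/src/tensorrt/bin/trtexec \
  --loadEngine=yolov7_tiny_qat_b16.engine \
  --shapes=images:16x3x640x640 \
  --useSpinWait
```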