YOLOv6 slow inference speed on the NVIDIA Jetson Xavier NX board

Hi Jetson community,

I modified the YOLOv6 code so it can use my Intel RealSense camera as an input source, just like an image or a video. However, inference is very slow, approximately 1-2 FPS. YOLOv5 and YOLOv7 give an average of 20 FPS without TensorRT optimization. What could be the reason for that?

Thanks in advance for your reply.

Best regards,
Shakhizat

Hi,

Which framework are you using?
In general, it’s recommended to deploy the DNN model with TensorRT for better performance.

Thanks.


Hi AastaLL,

Thanks for your reply. I managed to fix it. The issue was in the device parameter, which used only the CPU to run inference.
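For anyone hitting the same symptom: the detection scripts in these repos take a `--device` argument, and leaving it set to `cpu` keeps inference entirely off the GPU. Below is a minimal sketch of that fallback logic; the `select_device` helper and its signature are hypothetical illustrations, not the actual YOLOv6 code, and `cuda_available` stands in for `torch.cuda.is_available()`.

```python
def select_device(device: str, cuda_available: bool) -> str:
    """Map a --device argument to a torch-style device string.

    device: "cpu" or a GPU index such as "0" (hypothetical convention).
    cuda_available: result of torch.cuda.is_available() in real code.
    Falls back to CPU when CUDA is not available.
    """
    device = device.strip().lower()
    if device in ("", "cpu"):
        return "cpu"
    return f"cuda:{device}" if cuda_available else "cpu"


# Requesting GPU 0 uses CUDA only when it is actually available.
print(select_device("0", cuda_available=True))    # cuda:0
print(select_device("0", cuda_available=False))   # cpu
print(select_device("cpu", cuda_available=True))  # cpu
```

If the selected device silently resolves to `cpu` on a Jetson, a 10x frame-rate drop like the one above is exactly what you would expect.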

Next, I encountered another issue. I cannot upgrade TensorRT using the commands below:

$ echo "deb https://repo.download.nvidia.com/jetson/common r32.7 main" | sudo tee -a /etc/apt/sources.list.d/nvidia-l4t-apt-source.list
$ echo "deb https://repo.download.nvidia.com/jetson/t194 r32.7 main" | sudo tee -a /etc/apt/sources.list.d/nvidia-l4t-apt-source.list
$ sudo apt update
$ sudo apt install nvidia-tensorrt

Hi AastaLLL,

Do you know why the error message below appears when I try to convert my ONNX model to a TensorRT engine?

&&&& RUNNING TensorRT.trtexec [TensorRT v8001] # /usr/src/tensorrt/bin/trtexec --onnx=/home/jetson/yolov7/best-yolov7nano.onnx --saveEngine=/home/jetson/yolov7/engine_yolov7
[08/11/2022-18:37:30] [I] === Model Options ===
[08/11/2022-18:37:30] [I] Format: ONNX
[08/11/2022-18:37:30] [I] Model: /home/jetson/yolov7/best-yolov7nano.onnx
[08/11/2022-18:37:30] [I] Output:
[08/11/2022-18:37:30] [I] === Build Options ===
[08/11/2022-18:37:30] [I] Max batch: explicit
[08/11/2022-18:37:30] [I] Workspace: 16 MiB
[08/11/2022-18:37:30] [I] minTiming: 1
[08/11/2022-18:37:30] [I] avgTiming: 8
[08/11/2022-18:37:30] [I] Precision: FP32
[08/11/2022-18:37:30] [I] Calibration: 
[08/11/2022-18:37:30] [I] Refit: Disabled
[08/11/2022-18:37:30] [I] Sparsity: Disabled
[08/11/2022-18:37:30] [I] Safe mode: Disabled
[08/11/2022-18:37:30] [I] Restricted mode: Disabled
[08/11/2022-18:37:30] [I] Save engine: /home/jetson/yolov7/engine_yolov7
[08/11/2022-18:37:30] [I] Load engine: 
[08/11/2022-18:37:30] [I] NVTX verbosity: 0
[08/11/2022-18:37:30] [I] Tactic sources: Using default tactic sources
[08/11/2022-18:37:30] [I] timingCacheMode: local
[08/11/2022-18:37:30] [I] timingCacheFile: 
[08/11/2022-18:37:30] [I] Input(s)s format: fp32:CHW
[08/11/2022-18:37:30] [I] Output(s)s format: fp32:CHW
[08/11/2022-18:37:30] [I] Input build shapes: model
[08/11/2022-18:37:30] [I] Input calibration shapes: model
[08/11/2022-18:37:30] [I] === System Options ===
[08/11/2022-18:37:30] [I] Device: 0
[08/11/2022-18:37:30] [I] DLACore: 
[08/11/2022-18:37:30] [I] Plugins:
[08/11/2022-18:37:30] [I] === Inference Options ===
[08/11/2022-18:37:30] [I] Batch: Explicit
[08/11/2022-18:37:30] [I] Input inference shapes: model
[08/11/2022-18:37:30] [I] Iterations: 10
[08/11/2022-18:37:30] [I] Duration: 3s (+ 200ms warm up)
[08/11/2022-18:37:30] [I] Sleep time: 0ms
[08/11/2022-18:37:30] [I] Streams: 1
[08/11/2022-18:37:30] [I] ExposeDMA: Disabled
[08/11/2022-18:37:30] [I] Data transfers: Enabled
[08/11/2022-18:37:30] [I] Spin-wait: Disabled
[08/11/2022-18:37:30] [I] Multithreading: Disabled
[08/11/2022-18:37:30] [I] CUDA Graph: Disabled
[08/11/2022-18:37:30] [I] Separate profiling: Disabled
[08/11/2022-18:37:30] [I] Time Deserialize: Disabled
[08/11/2022-18:37:30] [I] Time Refit: Disabled
[08/11/2022-18:37:30] [I] Skip inference: Disabled
[08/11/2022-18:37:30] [I] Inputs:
[08/11/2022-18:37:30] [I] === Reporting Options ===
[08/11/2022-18:37:30] [I] Verbose: Disabled
[08/11/2022-18:37:30] [I] Averages: 10 inferences
[08/11/2022-18:37:30] [I] Percentile: 99
[08/11/2022-18:37:31] [I] Dump refittable layers:Disabled
[08/11/2022-18:37:31] [I] Dump output: Disabled
[08/11/2022-18:37:31] [I] Profile: Disabled
[08/11/2022-18:37:31] [I] Export timing to JSON file: 
[08/11/2022-18:37:31] [I] Export output to JSON file: 
[08/11/2022-18:37:31] [I] Export profile to JSON file: 
[08/11/2022-18:37:31] [I] 
[08/11/2022-18:37:31] [I] === Device Information ===
[08/11/2022-18:37:31] [I] Selected Device: Xavier
[08/11/2022-18:37:31] [I] Compute Capability: 7.2
[08/11/2022-18:37:31] [I] SMs: 6
[08/11/2022-18:37:31] [I] Compute Clock Rate: 1.109 GHz
[08/11/2022-18:37:31] [I] Device Global Memory: 7765 MiB
[08/11/2022-18:37:31] [I] Shared Memory per SM: 96 KiB
[08/11/2022-18:37:31] [I] Memory Bus Width: 256 bits (ECC disabled)
[08/11/2022-18:37:31] [I] Memory Clock Rate: 1.109 GHz
[08/11/2022-18:37:31] [I] 
[08/11/2022-18:37:31] [I] TensorRT version: 8001
[08/11/2022-18:37:32] [I] [TRT] [MemUsageChange] Init CUDA: CPU +354, GPU +0, now: CPU 372, GPU 5390 (MiB)
[08/11/2022-18:37:32] [I] Start parsing network model
[08/11/2022-18:37:32] [I] [TRT] ----------------------------------------------------------------
[08/11/2022-18:37:32] [I] [TRT] Input filename:   /home/jetson/yolov7/best-yolov7nano.onnx
[08/11/2022-18:37:32] [I] [TRT] ONNX IR version:  0.0.6
[08/11/2022-18:37:32] [I] [TRT] Opset version:    12
[08/11/2022-18:37:32] [I] [TRT] Producer name:    pytorch
[08/11/2022-18:37:32] [I] [TRT] Producer version: 1.8
[08/11/2022-18:37:32] [I] [TRT] Domain:           
[08/11/2022-18:37:32] [I] [TRT] Model version:    0
[08/11/2022-18:37:32] [I] [TRT] Doc string:       
[08/11/2022-18:37:32] [I] [TRT] ----------------------------------------------------------------
[08/11/2022-18:37:32] [W] [TRT] onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[08/11/2022-18:37:32] [W] [TRT] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[08/11/2022-18:37:32] [I] [TRT] No importer registered for op: ScatterND. Attempting to import as plugin.
[08/11/2022-18:37:32] [I] [TRT] Searching for plugin: ScatterND, plugin_version: 1, plugin_namespace: 
[08/11/2022-18:37:32] [I] [TRT] Successfully created plugin: ScatterND
[08/11/2022-18:37:32] [E] Error[9]: [graphShapeAnalyzer.cpp::throwIfError::1306] Error Code 9: Internal Error (Mul_286: broadcast dimensions must be conformable
)
[08/11/2022-18:37:32] [E] [TRT] ModelImporter.cpp:720: While parsing node number 286 [Mul -> "423"]:
[08/11/2022-18:37:32] [E] [TRT] ModelImporter.cpp:721: --- Begin node ---
[08/11/2022-18:37:32] [E] [TRT] ModelImporter.cpp:722: input: "420"
input: "1101"
output: "423"
name: "Mul_286"
op_type: "Mul"

[08/11/2022-18:37:32] [E] [TRT] ModelImporter.cpp:723: --- End node ---
[08/11/2022-18:37:32] [E] [TRT] ModelImporter.cpp:726: ERROR: ModelImporter.cpp:179 In function parseGraph:
[6] Invalid Node - Mul_286
[graphShapeAnalyzer.cpp::throwIfError::1306] Error Code 9: Internal Error (Mul_286: broadcast dimensions must be conformable
)
[08/11/2022-18:37:32] [E] Failed to parse onnx file
[08/11/2022-18:37:32] [I] Finish parsing network model
[08/11/2022-18:37:32] [E] Parsing model failed
[08/11/2022-18:37:32] [E] Engine creation failed
[08/11/2022-18:37:32] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8001] # /usr/src/tensorrt/bin/trtexec --onnx=/home/jetson/yolov7/best-yolov7nano.onnx --saveEngine=/home/jetson/yolov7/engine_yolov7
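For reference, "broadcast dimensions must be conformable" means the two inputs of `Mul_286` have shapes that cannot be broadcast together under ONNX/NumPy rules: dimensions are compared from the trailing axis, and each pair must either match or contain a 1. A small stdlib sketch of that rule, with purely illustrative shapes (the actual tensor shapes in the failing model are not visible in the log):

```python
def broadcastable(shape_a, shape_b):
    """Check ONNX/NumPy-style broadcast compatibility of two shapes.

    Dimensions are compared right-to-left; each pair must be equal
    or one of them must be 1. Missing leading dims count as 1.
    """
    for a, b in zip(reversed(shape_a), reversed(shape_b)):
        if a != b and a != 1 and b != 1:
            return False
    return True


# Illustrative shapes only -- not taken from the failing model.
print(broadcastable((1, 3, 80, 80), (1, 3, 1, 1)))  # True
print(broadcastable((1, 3, 85), (1, 84)))           # False: 85 vs 84
```

A common workaround for this class of parser error is to re-export the ONNX model with a static input shape, or to run it through onnx-simplifier before trtexec, which often folds away the shape-dependent subgraph that produces the non-conformable operands.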

Hi,

Would you mind testing your model on a newer TensorRT?
For example, TensorRT 8.2 in JetPack 4.6.2 or TensorRT 8.4 in JetPack 5?

Thanks.

Hi AastaLLL,

Thanks for your reply. I really appreciate it.

Time after time, problem after problem: currently I am experiencing yet another problem when I try to upgrade from JetPack 4.6 to JetPack 4.6.2 via the Debian packages method. After issuing sudo apt update, I get:

Err:15 https://repo.download.nvidia.com/jetson/common r32.7.1 Release
  404  Not Found [IP: 96.16.49.227 443]
Err:16 https://repo.download.nvidia.com/jetson/t194 r32.7.1 Release
  404  Not Found [IP: 96.16.49.227 443]
Reading package lists... Done
E: The repository 'https://repo.download.nvidia.com/jetson/common r32.7.1 Release' does not have a Release file.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
E: The repository 'https://repo.download.nvidia.com/jetson/t194 r32.7.1 Release' does not have a Release file.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.

Hi AastaLLL, I somehow managed to update my JetPack, but it created other issues with the rootfs, since I am booting my Jetson from an SSD. So I am going to wait until the production release of JetPack 5.0 comes out. I guess next week.

The JetPack 5.0.2 GA release has been published; please open a new topic if the issue persists. Thanks.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.