YOLOv6 slow inference speed on the NVIDIA Jetson Xavier NX board

Hi Jetson community,

I modified the YOLOv6 code so it can use my Intel RealSense camera as an input source, just like an image or a video. However, inference is very slow, approximately 1-2 FPS. YOLOv5 and YOLOv7 give an average of 20 FPS without TensorRT optimization. What could be the reason for that?

Thanks in advance for your reply.

Best regards,
Shakhizat

Hi,

Which framework are you using?
In general, it’s recommended to deploy the DNN model with TensorRT for better performance.

Thanks.


Hi AastaLL,

Thanks for your reply. I managed to fix it. The issue was in the device parameter, which used only the CPU to run inference.
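For anyone hitting the same symptom: the detection scripts in these repos take a `--device` argument, and leaving it set to `cpu` keeps inference entirely off the GPU. Below is a minimal sketch of that fallback logic; the `select_device` helper and its signature are hypothetical illustrations, not the actual YOLOv6 code, and `cuda_available` stands in for `torch.cuda.is_available()`.

```python
def select_device(device: str, cuda_available: bool) -> str:
    """Map a --device argument to a torch-style device string.

    device: "cpu" or a GPU index such as "0" (hypothetical convention).
    cuda_available: result of torch.cuda.is_available() in real code.
    Falls back to CPU when CUDA is not available.
    """
    device = device.strip().lower()
    if device in ("", "cpu"):
        return "cpu"
    return f"cuda:{device}" if cuda_available else "cpu"


# Requesting GPU 0 uses CUDA only when it is actually available.
print(select_device("0", cuda_available=True))    # cuda:0
print(select_device("0", cuda_available=False))   # cpu
print(select_device("cpu", cuda_available=True))  # cpu
```

If the selected device silently resolves to `cpu` on a Jetson, a 10x frame-rate drop like the one above is exactly what you would expect.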

Next, I encountered another issue. I cannot upgrade TensorRT using the commands below:

$ echo "deb https://repo.download.nvidia.com/jetson/common r32.7 main" | sudo tee -a /etc/apt/sources.list.d/nvidia-l4t-apt-source.list
$ echo "deb https://repo.download.nvidia.com/jetson/t194 r32.7 main" | sudo tee -a /etc/apt/sources.list.d/nvidia-l4t-apt-source.list
$ sudo apt update
$ sudo apt install nvidia-tensorrt

Hi AastaLLL,

Do you know why the error message below appears when I try to convert my ONNX model to a TensorRT engine?

&&&& RUNNING TensorRT.trtexec [TensorRT v8001] # /usr/src/tensorrt/bin/trtexec --onnx=/home/jetson/yolov7/best-yolov7nano.onnx --saveEngine=/home/jetson/yolov7/engine_yolov7
[08/11/2022-18:37:30] [I] === Model Options ===
[08/11/2022-18:37:30] [I] Format: ONNX
[08/11/2022-18:37:30] [I] Model: /home/jetson/yolov7/best-yolov7nano.onnx
[08/11/2022-18:37:30] [I] Output:
[08/11/2022-18:37:30] [I] === Build Options ===
[08/11/2022-18:37:30] [I] Max batch: explicit
[08/11/2022-18:37:30] [I] Workspace: 16 MiB
[08/11/2022-18:37:30] [I] minTiming: 1
[08/11/2022-18:37:30] [I] avgTiming: 8
[08/11/2022-18:37:30] [I] Precision: FP32
[08/11/2022-18:37:30] [I] Calibration: 
[08/11/2022-18:37:30] [I] Refit: Disabled
[08/11/2022-18:37:30] [I] Sparsity: Disabled
[08/11/2022-18:37:30] [I] Safe mode: Disabled
[08/11/2022-18:37:30] [I] Restricted mode: Disabled
[08/11/2022-18:37:30] [I] Save engine: /home/jetson/yolov7/engine_yolov7
[08/11/2022-18:37:30] [I] Load engine: 
[08/11/2022-18:37:30] [I] NVTX verbosity: 0
[08/11/2022-18:37:30] [I] Tactic sources: Using default tactic sources
[08/11/2022-18:37:30] [I] timingCacheMode: local
[08/11/2022-18:37:30] [I] timingCacheFile: 
[08/11/2022-18:37:30] [I] Input(s)s format: fp32:CHW
[08/11/2022-18:37:30] [I] Output(s)s format: fp32:CHW
[08/11/2022-18:37:30] [I] Input build shapes: model
[08/11/2022-18:37:30] [I] Input calibration shapes: model
[08/11/2022-18:37:30] [I] === System Options ===
[08/11/2022-18:37:30] [I] Device: 0
[08/11/2022-18:37:30] [I] DLACore: 
[08/11/2022-18:37:30] [I] Plugins:
[08/11/2022-18:37:30] [I] === Inference Options ===
[08/11/2022-18:37:30] [I] Batch: Explicit
[08/11/2022-18:37:30] [I] Input inference shapes: model
[08/11/2022-18:37:30] [I] Iterations: 10
[08/11/2022-18:37:30] [I] Duration: 3s (+ 200ms warm up)
[08/11/2022-18:37:30] [I] Sleep time: 0ms
[08/11/2022-18:37:30] [I] Streams: 1
[08/11/2022-18:37:30] [I] ExposeDMA: Disabled
[08/11/2022-18:37:30] [I] Data transfers: Enabled
[08/11/2022-18:37:30] [I] Spin-wait: Disabled
[08/11/2022-18:37:30] [I] Multithreading: Disabled
[08/11/2022-18:37:30] [I] CUDA Graph: Disabled
[08/11/2022-18:37:30] [I] Separate profiling: Disabled
[08/11/2022-18:37:30] [I] Time Deserialize: Disabled
[08/11/2022-18:37:30] [I] Time Refit: Disabled
[08/11/2022-18:37:30] [I] Skip inference: Disabled
[08/11/2022-18:37:30] [I] Inputs:
[08/11/2022-18:37:30] [I] === Reporting Options ===
[08/11/2022-18:37:30] [I] Verbose: Disabled
[08/11/2022-18:37:30] [I] Averages: 10 inferences
[08/11/2022-18:37:30] [I] Percentile: 99
[08/11/2022-18:37:31] [I] Dump refittable layers:Disabled
[08/11/2022-18:37:31] [I] Dump output: Disabled
[08/11/2022-18:37:31] [I] Profile: Disabled
[08/11/2022-18:37:31] [I] Export timing to JSON file: 
[08/11/2022-18:37:31] [I] Export output to JSON file: 
[08/11/2022-18:37:31] [I] Export profile to JSON file: 
[08/11/2022-18:37:31] [I] 
[08/11/2022-18:37:31] [I] === Device Information ===
[08/11/2022-18:37:31] [I] Selected Device: Xavier
[08/11/2022-18:37:31] [I] Compute Capability: 7.2
[08/11/2022-18:37:31] [I] SMs: 6
[08/11/2022-18:37:31] [I] Compute Clock Rate: 1.109 GHz
[08/11/2022-18:37:31] [I] Device Global Memory: 7765 MiB
[08/11/2022-18:37:31] [I] Shared Memory per SM: 96 KiB
[08/11/2022-18:37:31] [I] Memory Bus Width: 256 bits (ECC disabled)
[08/11/2022-18:37:31] [I] Memory Clock Rate: 1.109 GHz
[08/11/2022-18:37:31] [I] 
[08/11/2022-18:37:31] [I] TensorRT version: 8001
[08/11/2022-18:37:32] [I] [TRT] [MemUsageChange] Init CUDA: CPU +354, GPU +0, now: CPU 372, GPU 5390 (MiB)
[08/11/2022-18:37:32] [I] Start parsing network model
[08/11/2022-18:37:32] [I] [TRT] ----------------------------------------------------------------
[08/11/2022-18:37:32] [I] [TRT] Input filename:   /home/jetson/yolov7/best-yolov7nano.onnx
[08/11/2022-18:37:32] [I] [TRT] ONNX IR version:  0.0.6
[08/11/2022-18:37:32] [I] [TRT] Opset version:    12
[08/11/2022-18:37:32] [I] [TRT] Producer name:    pytorch
[08/11/2022-18:37:32] [I] [TRT] Producer version: 1.8
[08/11/2022-18:37:32] [I] [TRT] Domain:           
[08/11/2022-18:37:32] [I] [TRT] Model version:    0
[08/11/2022-18:37:32] [I] [TRT] Doc string:       
[08/11/2022-18:37:32] [I] [TRT] ----------------------------------------------------------------
[08/11/2022-18:37:32] [W] [TRT] onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[08/11/2022-18:37:32] [W] [TRT] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[08/11/2022-18:37:32] [I] [TRT] No importer registered for op: ScatterND. Attempting to import as plugin.
[08/11/2022-18:37:32] [I] [TRT] Searching for plugin: ScatterND, plugin_version: 1, plugin_namespace: 
[08/11/2022-18:37:32] [I] [TRT] Successfully created plugin: ScatterND
[08/11/2022-18:37:32] [E] Error[9]: [graphShapeAnalyzer.cpp::throwIfError::1306] Error Code 9: Internal Error (Mul_286: broadcast dimensions must be conformable
)
[08/11/2022-18:37:32] [E] [TRT] ModelImporter.cpp:720: While parsing node number 286 [Mul -> "423"]:
[08/11/2022-18:37:32] [E] [TRT] ModelImporter.cpp:721: --- Begin node ---
[08/11/2022-18:37:32] [E] [TRT] ModelImporter.cpp:722: input: "420"
input: "1101"
output: "423"
name: "Mul_286"
op_type: "Mul"

[08/11/2022-18:37:32] [E] [TRT] ModelImporter.cpp:723: --- End node ---
[08/11/2022-18:37:32] [E] [TRT] ModelImporter.cpp:726: ERROR: ModelImporter.cpp:179 In function parseGraph:
[6] Invalid Node - Mul_286
[graphShapeAnalyzer.cpp::throwIfError::1306] Error Code 9: Internal Error (Mul_286: broadcast dimensions must be conformable
)
[08/11/2022-18:37:32] [E] Failed to parse onnx file
[08/11/2022-18:37:32] [I] Finish parsing network model
[08/11/2022-18:37:32] [E] Parsing model failed
[08/11/2022-18:37:32] [E] Engine creation failed
[08/11/2022-18:37:32] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8001] # /usr/src/tensorrt/bin/trtexec --onnx=/home/jetson/yolov7/best-yolov7nano.onnx --saveEngine=/home/jetson/yolov7/engine_yolov7
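For reference, "broadcast dimensions must be conformable" means the two inputs of `Mul_286` have shapes that cannot be broadcast together under ONNX/NumPy rules: dimensions are compared from the trailing axis, and each pair must either match or contain a 1. A small stdlib sketch of that rule, with purely illustrative shapes (the actual tensor shapes in the failing model are not visible in the log):

```python
def broadcastable(shape_a, shape_b):
    """Check ONNX/NumPy-style broadcast compatibility of two shapes.

    Dimensions are compared right-to-left; each pair must be equal
    or one of them must be 1. Missing leading dims count as 1.
    """
    for a, b in zip(reversed(shape_a), reversed(shape_b)):
        if a != b and a != 1 and b != 1:
            return False
    return True


# Illustrative shapes only -- not taken from the failing model.
print(broadcastable((1, 3, 80, 80), (1, 3, 1, 1)))  # True
print(broadcastable((1, 3, 85), (1, 84)))           # False: 85 vs 84
```

A common workaround for this class of parser error is to re-export the ONNX model with a static input shape, or to run it through onnx-simplifier before trtexec, which often folds away the shape-dependent subgraph that produces the non-conformable operands.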

Hi,

Would you mind testing your model on a newer TensorRT?
For example, TensorRT 8.2 in JetPack 4.6.2 or TensorRT 8.4 in JetPack 5?

Thanks.

Hi AastaLLL,

Thanks for your reply. I really appreciate it.

Time after time, problem after problem: currently I am experiencing yet another problem when I try to upgrade from JetPack 4.6 to JetPack 4.6.2 via the Debian packages method. After issuing sudo apt update, I get:

Err:15 https://repo.download.nvidia.com/jetson/common r32.7.1 Release
  404  Not Found [IP: 96.16.49.227 443]
Err:16 https://repo.download.nvidia.com/jetson/t194 r32.7.1 Release
  404  Not Found [IP: 96.16.49.227 443]
Reading package lists... Done
E: The repository 'https://repo.download.nvidia.com/jetson/common r32.7.1 Release' does not have a Release file.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
E: The repository 'https://repo.download.nvidia.com/jetson/t194 r32.7.1 Release' does not have a Release file.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.

Hi AastaLLL, I somehow managed to update my JetPack, but it created other issues with the rootfs, since I am booting my Jetson from an SSD. So I am going to wait until the production release of JetPack 5.0 comes out. I guess next week.

The JetPack 5.0.2 GA release has been published; please open a new topic if the issue persists. Thanks.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.