Jetson AGX TensorRT inference latency very high after upgrading the kernel from 32.7.2 to 32.7.4

My device is a Jetson AGX with JetPack 4.6 and TensorRT 8.2.
My question is: why does TensorRT run with very high latency after I upgraded my kernel from 32.7.2 to 32.7.4?

Before I upgraded the kernel (32.7.2), I ran the TensorRT inference C++ demo app and got the profiler output below.

[2024-11-25 03:50:59.619] [info] Trt: detection inference BatchCount: 15455
[2024-11-25 03:50:59.619] [info] |_ Preprocess: 0 ms/batch
[2024-11-25 03:50:59.619] [info] |_ CopyInput: 0 ms/batch
[2024-11-25 03:50:59.619] [info] |_ SetInferInputDims: 0 ms/batch
[2024-11-25 03:50:59.619] [info] |_ SetOptimization: 0 ms/batch
[2024-11-25 03:50:59.619] [info] |_ SetBindingDimensions: 0 ms/batch
[2024-11-25 03:50:59.619] [info] |_ Enqueue: 9 ms/batch
[2024-11-25 03:50:59.619] [info] |_ CopyOutput: 0 ms/batch
[2024-11-25 03:50:59.619] [info] |_ Postprocess: 3 ms/batch

After I upgraded the kernel from 32.7.2 to 32.7.4, the same app with the same input had much higher latency.
The TensorRT API call “context->setBindingDimensions(0, inferInputDims);” is very slow.
Why is context->setBindingDimensions(0, inferInputDims); running so slowly? What kind of resources does it take up? What resources does it require to operate?

[2024-11-26 11:30:49.650] [info] Trt: detection inference BatchCount: 13916
[2024-11-26 11:30:49.650] [info] |_ Preprocess: 0 ms/batch
[2024-11-26 11:30:49.650] [info] |_ CopyInput: 3 ms/batch
[2024-11-26 11:30:49.650] [info] |_ SetInferInputDims: 0 ms/batch
[2024-11-26 11:30:49.650] [info] |_ **SetOptimization: 87 ms/batch**
[2024-11-26 11:30:49.650] [info] |_ SetBindingDimensions: 0 ms/batch
[2024-11-26 11:30:49.650] [info] |_ Enqueue: 21 ms/batch
[2024-11-26 11:30:49.650] [info] |_ CopyOutput: 0 ms/batch
[2024-11-26 11:30:49.650] [info] |_ Postprocess: 4 ms/batch
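
For context, a minimal sketch of what a per-batch dynamic-shape TensorRT 8.2 loop covering these profiler stages typically looks like. The function and variable names (inferOneBatch, bindings, etc.) are placeholders, not the actual demo code:

```cpp
#include <NvInfer.h>
#include <cuda_runtime_api.h>

// Hypothetical sketch of one profiled batch; the stage names in the comments
// match the profiler labels above, variable names are placeholders.
void inferOneBatch(nvinfer1::IExecutionContext* context,
                   cudaStream_t stream,
                   void** bindings,                    // device input/output buffers
                   const nvinfer1::Dims& inferInputDims)
{
    // "SetOptimization": select optimization profile 0 for this context.
    // This is the stage that jumped to ~87 ms/batch on r32.7.4.
    context->setOptimizationProfileAsync(0, stream);

    // "SetBindingDimensions": set the actual input shape for binding 0.
    context->setBindingDimensions(0, inferInputDims);

    // "Enqueue": launch inference asynchronously on the stream.
    context->enqueueV2(bindings, stream, nullptr);

    // Wait so the "CopyOutput"/"Postprocess" stages see finished results.
    cudaStreamSynchronize(stream);
}
```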

Hi,

Which device do you use?
AGX Orin doesn’t support r32 BSP.

Thanks.

Jetson AGX Xavier

Hi,

Is there any difference in the source you execute?

Both r32.7.2 and r32.7.4 are using TensorRT 8.2.1.
There is no difference in the TensorRT library.

Thanks.

I run the same demo with the same inputs on the same TensorRT version,
but the kernel version is not the same; I am curious whether the kernel upgrade introduced some bug that affects TensorRT inference.

Why is context->setBindingDimensions(0, inferInputDims); running so slowly? What kind of resources does it take up? What resources does it require to operate?

Hi, any comments on this?

Hi,

The function just sets a parameter and should be fast.
Based on the results you provided, we don’t see an obvious latency degradation in SetBindingDimensions.

Could you double-check it?

r32.7.2

[2024-11-25 03:50:59.619] [info] |_ SetBindingDimensions: 0 ms/batch

r32.7.4

[2024-11-26 11:30:49.650] [info] |_ SetBindingDimensions: 0 ms/batch
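
One way to double-check is to time the two calls separately on the host side. A minimal sketch (the function name and arguments are placeholders, not your demo code):

```cpp
#include <chrono>
#include <iostream>
#include <NvInfer.h>
#include <cuda_runtime_api.h>

// Measure host-side wall time of the profile selection and the shape call
// for a single batch (hypothetical names).
void timeShapeSetup(nvinfer1::IExecutionContext* context,
                    cudaStream_t stream,
                    const nvinfer1::Dims& inferInputDims)
{
    auto us = [](auto a, auto b) {
        return std::chrono::duration_cast<std::chrono::microseconds>(b - a).count();
    };

    auto t0 = std::chrono::steady_clock::now();
    context->setOptimizationProfileAsync(0, stream);
    auto t1 = std::chrono::steady_clock::now();
    context->setBindingDimensions(0, inferInputDims);
    auto t2 = std::chrono::steady_clock::now();

    std::cout << "setOptimizationProfileAsync: " << us(t0, t1) << " us, "
              << "setBindingDimensions: " << us(t1, t2) << " us" << std::endl;
}
```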

Thanks.

Thanks for the reply.

Sorry, I meant that “context->setOptimizationProfileAsync(0, (*cudaStreamsArray)[threadIndex]);” is much slower on 32.7.4 than on 32.7.2.

Hi,

The API is related to the optimization profile.
Based on the documentation, the function may take time if some resource allocation is required:
https://developer.nvidia.com/docs/drive/drive-os/archives/6.0.4/tensorrt/api-reference/docs/classnvinfer1_1_1_i_execution_context.html#a74c361a3d93e70a3164988df7d60a4cc

This function will trigger layer resource updates on the next call of enqueueV2/executeV2, possibly resulting in performance bottlenecks.

Do you change the input size across different batches?
If not, this function should only be required during the initialization phase.
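
If the input shape is fixed, a minimal sketch of that pattern (select the profile and set the binding dimensions once at setup, keep only enqueueV2 in the per-batch path; all names here are hypothetical):

```cpp
#include <NvInfer.h>
#include <cuda_runtime_api.h>

// Hypothetical sketch: do the profile/shape setup once at init time
// instead of before every batch, when the input size never changes.
void initContext(nvinfer1::IExecutionContext* context,
                 cudaStream_t stream,
                 const nvinfer1::Dims& fixedInputDims)
{
    // Done once: associates the context with optimization profile 0.
    context->setOptimizationProfileAsync(0, stream);
    // Done once: binding dimensions only need to be reset when they change.
    context->setBindingDimensions(0, fixedInputDims);
    cudaStreamSynchronize(stream);
}

void inferBatch(nvinfer1::IExecutionContext* context,
                cudaStream_t stream,
                void** bindings)  // device buffers already filled with the input
{
    // Per batch: only enqueue; no profile/shape calls in the hot path.
    context->enqueueV2(bindings, stream, nullptr);
    cudaStreamSynchronize(stream);
}
```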

Thanks.