Can I limit computational resource consumption at the TensorRT engine building stage?

Description

Can I limit the computational resource consumption of each engine (SM count, grid sizes, block sizes, memory limits, etc.) at the TensorRT engine building stage? My motivation is to run multiple TensorRT engines in parallel on multiple streams, so that the kernels in each stream can be scheduled to execute concurrently.
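For clarity, here is a minimal sketch of the multi-stream setup I have in mind, using the Python API. The engine paths are placeholders and the device bindings are assumed to be allocated elsewhere:

```python
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit  # creates a CUDA context on the default device

logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)

def load_engine(path):
    with open(path, "rb") as f:
        return runtime.deserialize_cuda_engine(f.read())

# "a.engine" and "b.engine" are placeholder paths to prebuilt engines.
engines = [load_engine(p) for p in ("a.engine", "b.engine")]
contexts = [e.create_execution_context() for e in engines]
streams = [cuda.Stream() for _ in engines]

# bindings_a / bindings_b: lists of device pointers prepared elsewhere.
# for ctx, stream, bindings in zip(contexts, streams, (bindings_a, bindings_b)):
#     ctx.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
# for stream in streams:
#     stream.synchronize()
```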

Environment

TensorRT Version: 8.0
GPU Type: RTX2080Ti, Jetson AGX Orin
CUDA Version: 11.x
Operating System + Version: Ubuntu 18, Ubuntu 20

Hi,

This looks like a Jetson issue. Please refer to the samples below in case they are useful.

For any further assistance, we will move this post to the Jetson-related forum.

Thanks!

Hi,
This is not only a Jetson issue; it is a general TensorRT question.
I want to limit the resource consumption of a TensorRT engine when building it, on both server and edge GPUs.
Could you please help? Thank you.

Hi,

TensorRT selects grid and block sizes internally to maximize GPU throughput for inference, so these cannot be controlled directly. Memory usage can be restricted by setting the builder's workspace size limit. Please refer to the TensorRT documentation for more details.
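As an illustration, a minimal sketch with the Python API (the "model.onnx" path and the 1 GiB value are only placeholders):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:
    parser.parse(f.read())

config = builder.create_builder_config()
# TensorRT 8.0: cap the scratch memory the builder may use for tactic selection.
config.max_workspace_size = 1 << 30  # 1 GiB
# On newer releases (8.4+) the equivalent call is:
# config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)

serialized_engine = builder.build_serialized_network(network, config)
```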

TensorRT also lets you specify one or more optimization profiles, each defining a minimum/optimum/maximum range for dynamic input shapes, including the batch dimension. This can be used to bound the batch size, which in turn affects GPU memory usage and computational load.
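Continuing the sketch above, an optimization profile that bounds the batch dimension might look like this; the input name "input" and the (N, 3, 224, 224) shape are assumptions about the model:

```python
# Assumes the network input "input" has a dynamic batch dimension, i.e. (-1, 3, 224, 224).
profile = builder.create_optimization_profile()
profile.set_shape("input",
                  (1, 3, 224, 224),   # min
                  (4, 3, 224, 224),   # opt: kernels are tuned for this shape
                  (8, 3, 224, 224))   # max: the engine will not accept larger batches
config.add_optimization_profile(profile)
```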

You can also control which GPU is used for inference by setting the CUDA device before creating the execution context.
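For example, one way to do this in Python (device index 1 is just an example):

```python
import os

# Option 1: restrict device visibility before CUDA initializes
# (must run before any CUDA library is loaded or used).
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

# Option 2: with pycuda, create the CUDA context explicitly on a chosen device,
# then deserialize the engine and create the execution context inside it.
# import pycuda.driver as cuda
# cuda.init()
# ctx = cuda.Device(1).make_context()
# ...  # runtime.deserialize_cuda_engine(...), engine.create_execution_context(), inference
# ctx.pop()
```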

For deploying multiple engines, we recommend the Triton Inference Server, which manages GPU resources internally and gives the best performance.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html

Thank you.