Does tensorRT inference app eat cuda resources?

Description

A clear and concise description of the bug or issue.

Environment

TensorRT Version:
8.4.0
GPU Type:
Ampere Arch
Nvidia Driver Version:
470.82.01
CUDA Version:
11.4
CUDNN Version:
N/A
Operating System + Version:
Ubuntu 20.04 x86_64
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)

Steps To Reproduce

Please include:

  • Exact steps/commands to build your repro
  • Exact steps/commands to run your repro
  • Full traceback of errors encountered

When running an inference program with TensorRT, does the TensorRT library consume FP32/INT32 hardware resources?
Or does it only consume Tensor Core resources? (as the picture above shows)
Thank you very much!

Is there something you’re trying to achieve? Or is it just curiosity?

I believe TensorRT will use whatever resources it can to maximize the inference speed of your layers, so it really depends on the model. If certain precisions are explicitly enabled or disabled at build time, something more concrete could be said.
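To illustrate the point about enabling/disabling precisions: TensorRT's builder config lets you opt layers into lower-precision kernels, which on Ampere largely means Tensor Core kernels. A minimal sketch with the TensorRT 8.x Python API, assuming `tensorrt` is installed and an ONNX model is available ("model.onnx" is a placeholder name):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:
    parser.parse(f.read())

config = builder.create_builder_config()
# Allow FP16 kernels, which typically map to Tensor Cores on Ampere.
config.set_flag(trt.BuilderFlag.FP16)
# Without this flag, TensorRT is restricted to FP32 kernels, which run
# on the regular CUDA cores; with it, the builder times both kinds of
# kernels per layer and picks the faster one, so a single engine may
# end up using both unit types.
engine_bytes = builder.build_serialized_network(network, config)
```

So which hardware units an engine touches is decided layer by layer during the build, based on the precisions you allow.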

Maybe you’re hoping to share resources with another task? My impression is that you’re better off assuming TensorRT will try to maximize your GPU usage while it’s running, leaving no room for another task, but I won’t say that for certain…

Thanks, I just want to understand what happens inside TensorRT.
For example:
If I run 4 TensorRT models, do the CUDA cores and Tensor Cores run them serially or in parallel?
Or does the TensorRT library divide the hardware resources evenly among the models so that they run in parallel?

When you build a TensorRT engine, it is optimized to maximize inference speed, which means it will probably use as many resources as possible. So if you try to run 4 models in parallel, they will effectively run in serial (unless each model is small enough that it can’t saturate the GPU on its own). Don’t expect resources to be reassigned at runtime; as far as I know, it’s all fixed when the engine is built.
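For concreteness, the usual way to attempt concurrent inference is to give each engine its own execution context and CUDA stream, then enqueue all of them before synchronizing. A hedged sketch with TensorRT 8.x plus pycuda ("a.plan" and "b.plan" are placeholder engine files); even on separate streams, the GPU scheduler only overlaps the models where one of them leaves SMs idle:

```python
import numpy as np
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit  # creates a CUDA context

logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)

jobs = []
for plan in ("a.plan", "b.plan"):
    with open(plan, "rb") as f:
        engine = runtime.deserialize_cuda_engine(f.read())
    context = engine.create_execution_context()
    stream = cuda.Stream()  # one stream per model
    # Allocate a device buffer for each input/output binding.
    bindings = []
    for i in range(engine.num_bindings):
        size = trt.volume(engine.get_binding_shape(i)) * np.dtype(
            trt.nptype(engine.get_binding_dtype(i))).itemsize
        bindings.append(int(cuda.mem_alloc(size)))
    jobs.append((context, stream, bindings))

# Enqueue all inferences without synchronizing in between; whether they
# actually overlap depends on how fully each engine occupies the GPU.
for context, stream, bindings in jobs:
    context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
for _, stream, _ in jobs:
    stream.synchronize()
```

This setup lets the hardware overlap the models when it can; it does not partition the GPU, so two engines that each want the whole GPU will still run back to back.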


Got it, thank you very much!