Does tensorRT inference app eat cuda resources?

Description

A clear and concise description of the bug or issue.

Environment

TensorRT Version:
8.4.0
GPU Type:
Ampere Arch
Nvidia Driver Version:
470.82.01
CUDA Version:
11.4
CUDNN Version:
N/A
Operating System + Version:
Ubuntu 20.04 x86_64
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)

Steps To Reproduce

Please include:

  • Exact steps/commands to build your repro
  • Exact steps/commands to run your repro
  • Full traceback of errors encountered

When running an inference program with TensorRT, does the TensorRT library consume FP32/INT32 hardware resources?
Or does it only consume Tensor Core resources? (as the picture above shows)
Thank you very much!

Is there something you’re trying to achieve? Or is it just curiosity?

I believe TensorRT will use whatever resources it can to maximize the inference speed of your layers, so it really depends on the model. If certain precisions are explicitly enabled or disabled at build time, something more concrete could be said.
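To illustrate the point about enabling/disabling precisions: TensorRT's builder config lets you opt layers into lower-precision kernels, which on Ampere largely means Tensor Core kernels. A minimal sketch with the TensorRT 8.x Python API, assuming `tensorrt` is installed and an ONNX model is available ("model.onnx" is a placeholder name):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:
    parser.parse(f.read())

config = builder.create_builder_config()
# Allow FP16 kernels, which typically map to Tensor Cores on Ampere.
config.set_flag(trt.BuilderFlag.FP16)
# Without this flag, TensorRT is restricted to FP32 kernels, which run
# on the regular CUDA cores; with it, the builder times both kinds of
# kernels per layer and picks the faster one, so a single engine may
# end up using both unit types.
engine_bytes = builder.build_serialized_network(network, config)
```

So which hardware units an engine touches is decided layer by layer during the build, based on the precisions you allow.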

Maybe you’re hoping to share resources with another task? My impression is that you’re better off assuming TensorRT will try to maximize your GPU usage while it’s running, leaving no room for another task, but I won’t say that for certain…

Thanks, I just want to understand what happens inside TensorRT.
For example:
If I run 4 TensorRT models, do the CUDA cores and Tensor Cores run them serially or in parallel?
Or does the TensorRT library divide the hardware resources evenly among the models so that they run in parallel?

When you build a TensorRT engine, it is optimized to maximize inference speed, which means it will probably use as many resources as possible. So if you try to run 4 models in parallel, they will effectively run in serial (unless each model is small enough that it can’t saturate the GPU on its own). Don’t expect resources to be reassigned at runtime; as far as I know, it’s all fixed when the engine is built.
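For concreteness, the usual way to attempt concurrent inference is to give each engine its own execution context and CUDA stream, then enqueue all of them before synchronizing. A hedged sketch with TensorRT 8.x plus pycuda ("a.plan" and "b.plan" are placeholder engine files); even on separate streams, the GPU scheduler only overlaps the models where one of them leaves SMs idle:

```python
import numpy as np
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit  # creates a CUDA context

logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)

jobs = []
for plan in ("a.plan", "b.plan"):
    with open(plan, "rb") as f:
        engine = runtime.deserialize_cuda_engine(f.read())
    context = engine.create_execution_context()
    stream = cuda.Stream()  # one stream per model
    # Allocate a device buffer for each input/output binding.
    bindings = []
    for i in range(engine.num_bindings):
        size = trt.volume(engine.get_binding_shape(i)) * np.dtype(
            trt.nptype(engine.get_binding_dtype(i))).itemsize
        bindings.append(int(cuda.mem_alloc(size)))
    jobs.append((context, stream, bindings))

# Enqueue all inferences without synchronizing in between; whether they
# actually overlap depends on how fully each engine occupies the GPU.
for context, stream, bindings in jobs:
    context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
for _, stream, _ in jobs:
    stream.synchronize()
```

This setup lets the hardware overlap the models when it can; it does not partition the GPU, so two engines that each want the whole GPU will still run back to back.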


Got it, thank you very much!