Nvidia Driver Version:
Operating System + Version: Ubuntu 20.04 x86_64
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):
When running an inference program with TensorRT, does the TensorRT library consume FP32/INT32 hardware resources (CUDA cores), or does it only use Tensor Core resources (as shown in the picture above)?
Thank you very much!
Is there something you’re trying to achieve? Or is it just curiosity?
I believe TensorRT will use whatever resources it can to maximize the inference speed of your layers. So it really depends. Maybe if certain precisions are enabled/disabled, something more concrete could be said.
Maybe you’re hoping to share resources with another task? My impression is you’re better off assuming TensorRT is going to try to maximize your GPU usage while it’s running and there’s no room for another task, but I won’t say that for certain…
Thanks, I just want to understand what happens inside TensorRT.
If I run 4 TensorRT models, do the CUDA cores and Tensor Cores run them serially or in parallel?
Or does the TensorRT library assign hardware resources evenly to each model so that they run in parallel?
When you build the TensorRT engine, it is optimized to maximize inference speed, which means it will probably use as many resources as possible. So if you try to run 4 models in parallel, they will effectively run in serial (unless your models are small and can't saturate all resources). Don't expect resources to be partitioned differently at runtime; as far as I know, it's all fixed when the engine is built.
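To make the "run several models concurrently" setup concrete, here is a rough sketch using TensorRT's Python API (TensorRT 8.x-era calls such as `execute_async_v2`) with one execution context and one CUDA stream per engine. The engine file names, FP32 assumption, and buffer sizing are illustrative only; this requires a GPU with TensorRT and PyCUDA installed, and whether the engines actually overlap on the GPU depends on how much of the SM/Tensor Core capacity each one saturates:

```python
# Hypothetical sketch: enqueue several prebuilt TensorRT engines on
# separate CUDA streams. The GPU scheduler decides how much truly
# runs in parallel.
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit  # creates a CUDA context on import

ENGINE_PATHS = ["model_a.plan", "model_b.plan"]  # hypothetical files

logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)

contexts, streams, bindings = [], [], []
for path in ENGINE_PATHS:
    with open(path, "rb") as f:
        engine = runtime.deserialize_cuda_engine(f.read())
    ctx = engine.create_execution_context()
    # Allocate one device buffer per binding (assumes FP32 tensors).
    bufs = []
    for i in range(engine.num_bindings):
        size = trt.volume(engine.get_binding_shape(i)) * 4
        bufs.append(cuda.mem_alloc(size))
    contexts.append(ctx)
    streams.append(cuda.Stream())
    bindings.append([int(b) for b in bufs])

# Enqueue all engines without synchronizing in between.
for ctx, stream, binds in zip(contexts, streams, bindings):
    ctx.execute_async_v2(bindings=binds, stream_handle=stream.handle)
for stream in streams:
    stream.synchronize()
```

Even with separate streams, if each engine's kernels occupy most of the GPU, the runs will serialize in practice, which matches the behavior described above.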
Got it, thank you very much!