TensorRT Inference Consuming Large Amount of System Resources

My project runs inference with a TensorRT neural network, and its resource utilization seems unexpectedly large. Checking GPU utilization with nvidia-smi and CPU utilization with top, both are considerably higher than I expected.
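
For reference, utilization was checked roughly like this (the PID is a placeholder):

nvidia-smi          # overall GPU utilization and per-process GPU memory
top -p <pid>        # CPU utilization of the inference process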

The models were created/trained in PyTorch, exported to the ONNX format, and converted to TensorRT engines using trtexec.
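
For context, the export/conversion path looks roughly like the sketch below; the stand-in model, input shape, and file names are placeholders rather than the actual project code:

import torch
import torchvision

# Stand-in for the real trained network (placeholder)
model = torchvision.models.resnet18().eval()
dummy_input = torch.randn(1, 3, 224, 224)

# Export to ONNX
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=13,
)

# The ONNX file is then converted to a TensorRT engine offline, e.g.:
#   trtexec --onnx=model.onnx --saveEngine=model.engine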

The process that runs the model also launches CUDA kernels for other processing. A side-by-side comparison of the process with and without model inference shows the following utilization:

With model inference,
top shows CPU utilization of ~44% for the process, and nvidia-smi shows:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:61:00.0 Off |                    0 |
| N/A   36C    P0    84W / 300W |   9324MiB / 32768MiB |     52%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

Without model inference,
top shows a CPU utilization of ~5% and nvidia-smi shows:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:61:00.0 Off |                    0 |
| N/A   34C    P0    53W / 300W |   9312MiB / 32768MiB |      4%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

Not included here is a profile captured with nsys (a representative command line is sketched after the questions below). The profile appeared to contain CUDA kernel calls made by TensorRT. On that note, I have a few specific questions related to profiling TensorRT models:

  1. Why does nsys label the calls made by TensorRT as CUDA kernel calls? My understanding of TensorRT is that all execution happens on Tensor Cores, not CUDA cores.

  2. Does nvidia-smi show combined CUDA core + Tensor Core utilization %?

  3. Is there any tool that can show utilization broken down by core type (Tensor Core versus CUDA core)?

  4. Most concerning is the very high CPU utilization. Is this normal for TensorRT models?
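
For reference, the profile mentioned above was collected with Nsight Systems along these lines (application and report names are placeholders):

nsys profile --trace=cuda,nvtx,osrt -o trt_inference_report ./my_inference_app
nsys stats trt_inference_report.nsys-rep    # summarize CUDA API calls and kernels (the report is .qdrep on older nsys versions)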

Environment

**TensorRT Version**: 8.2.3
**GPU Type**: Tesla V100
**Nvidia Driver Version**: 510.47.03
**CUDA Version**: 11.6
**Operating System + Version**: CentOS 7
**Python Version (if applicable)**: 3.6
**PyTorch Version (if applicable)**: 1.10.1+cu113
**Baremetal or Container (if container which image + tag)**: Baremetal

Hi,
Could you please share the ONNX model and the script, if not shared already, so that we can assist you better?
In the meantime, you can try a few things:

  1. Validate your model with the snippet below.

check_model.py

import onnx

filename = "your_model.onnx"  # placeholder: path to your ONNX model
model = onnx.load(filename)
onnx.checker.check_model(model)  # raises an exception if the model is invalid
  2. Try running your model with the trtexec command.
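
For example, something along these lines (file names are placeholders):

trtexec --onnx=model.onnx --saveEngine=model.engine --verbose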

In case you are still facing the issue, please share the trtexec --verbose log for further debugging.
Thanks!