TensorRT Inference Consuming Large Amount of System Resources

My project runs inference with a TensorRT neural network, and its resource utilization seems unexpectedly large. Checking GPU utilization with nvidia-smi and CPU utilization with top, both are considerably higher than I expected.
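
For reference, utilization was checked roughly like this (the PID is a placeholder):

nvidia-smi          # overall GPU utilization and per-process GPU memory
top -p <pid>        # CPU utilization of the inference process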

The models were created/trained in PyTorch, exported to the ONNX format, and converted to TensorRT engines using trtexec.
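
For context, the export/conversion path looks roughly like the sketch below; the stand-in model, input shape, and file names are placeholders rather than the actual project code:

import torch
import torchvision

# Stand-in for the real trained network (placeholder)
model = torchvision.models.resnet18().eval()
dummy_input = torch.randn(1, 3, 224, 224)

# Export to ONNX
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=13,
)

# The ONNX file is then converted to a TensorRT engine offline, e.g.:
#   trtexec --onnx=model.onnx --saveEngine=model.engine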

The process that runs the model also launches CUDA kernels for other processing. A side-by-side comparison of the process with and without model inference shows the following utilization:

With model inference,
top shows CPU utilization of ~44% for the process, and nvidia-smi shows:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:61:00.0 Off |                    0 |
| N/A   36C    P0    84W / 300W |   9324MiB / 32768MiB |     52%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

Without model inference,
top shows a CPU utilization of ~5% and nvidia-smi shows:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:61:00.0 Off |                    0 |
| N/A   34C    P0    53W / 300W |   9312MiB / 32768MiB |      4%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

Not included here is a profile captured with nsys (a representative command line is sketched after the questions below). The profile appeared to contain CUDA kernel calls made by TensorRT. On that note, I have a few specific questions related to profiling TensorRT models:

  1. Why does nsys label the calls made by TensorRT as CUDA kernel calls? My understanding of TensorRT is that all execution happens on Tensor Cores, not CUDA cores.

  2. Does nvidia-smi show combined CUDA core + Tensor Core utilization %?

  3. Is there any tool that can show utilization broken down by core type (Tensor Core versus CUDA core)?

  4. Most concerning is the very high CPU utilization. Is this normal for TensorRT models?
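
For reference, the profile mentioned above was collected with Nsight Systems along these lines (application and report names are placeholders):

nsys profile --trace=cuda,nvtx,osrt -o trt_inference_report ./my_inference_app
nsys stats trt_inference_report.nsys-rep    # summarize CUDA API calls and kernels (the report is .qdrep on older nsys versions)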

Environment

**TensorRT Version**: 8.2.3
**GPU Type**: Tesla V100
**Nvidia Driver Version**: 510.47.03
**CUDA Version**: 11.6
**Operating System + Version**: CentOS 7
**Python Version (if applicable)**: 3.6
**PyTorch Version (if applicable)**: 1.10.1+cu113
**Baremetal or Container (if container which image + tag)**: Baremetal

Hi,
Could you please share the ONNX model and the script, if not shared already, so that we can assist you better?
In the meantime, you can try a few things:

  1. Validate your model with the snippet below.

check_model.py

import onnx

filename = "your_model.onnx"  # placeholder: path to your ONNX model
model = onnx.load(filename)
onnx.checker.check_model(model)  # raises an exception if the model is invalid
  2. Try running your model with the trtexec command.
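
For example, something along these lines (file names are placeholders):

trtexec --onnx=model.onnx --saveEngine=model.engine --verbose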

In case you are still facing the issue, please share the trtexec --verbose log for further debugging.
Thanks!