I am currently running inference performance tests with TensorRT, and I have also applied MPS (Multi-Process Service) for further optimization.
Before converting the model with TensorRT, enabling MPS improved FPS by about 20-30%. After converting with TensorRT, however, enabling MPS yields almost the same performance as without it.
I am curious whether there are any similar cases or known reasons for this.
I also wonder whether nvidia-smi is a reliable way to measure GPU utilization. Additionally, I would like to know if NVIDIA provides any command to obtain information about the CUDA cores or Tensor Cores on a GPU.
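For reference, `nvidia-smi -q` and the deviceQuery CUDA sample report the number of multiprocessors (SMs) and the compute capability, and the total CUDA core count can then be derived from a per-SM table. A minimal sketch, assuming the cores-per-SM values from the deviceQuery sample source (the table below covers only a few recent architectures):

```python
# Derive total CUDA core count from SM count and compute capability,
# as reported by the deviceQuery CUDA sample or `nvidia-smi -q`.
# The cores-per-SM mapping is assumed from the deviceQuery sample source
# and covers only a few architectures.
CORES_PER_SM = {
    (7, 0): 64,   # Volta
    (7, 5): 64,   # Turing
    (8, 0): 64,   # Ampere GA100
    (8, 6): 128,  # Ampere GA10x
    (8, 9): 128,  # Ada Lovelace
}

def cuda_cores(sm_count: int, major: int, minor: int) -> int:
    """Total CUDA cores = SM count * FP32 cores per SM for this architecture."""
    return sm_count * CORES_PER_SM[(major, minor)]

# Example: an RTX 3090 reports 82 SMs at compute capability 8.6.
print(cuda_cores(82, 8, 6))  # 10496
```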
Hi,
Could you try running your model with the trtexec command and share the "--verbose" log in case the issue persists?
You can refer to the link below for the list of supported operators; if an operator is not supported, you will need to create a custom plugin to support that operation.
Also, please share your model and script (if not shared already) so that we can help you better.
Meanwhile, for some common errors and queries, please refer to the link below:
What I am curious about is whether it is correct to treat "GPU utilization" and the utilization reported by nvidia-smi as the same thing. I have doubts because even when the GPU's CUDA cores are not fully utilized, nvidia-smi still reports 100% utilization.
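On the utilization question: per the nvidia-smi documentation, `utilization.gpu` reports the percentage of time over the sampling period during which at least one kernel was executing, not how many CUDA cores were busy, so even a small kernel occupying a single SM can show 100%. For per-kernel occupancy you would need a profiler such as Nsight Compute. A minimal sketch of reading that counter (the parsing assumes the CSV format produced by `nvidia-smi --query-gpu`; the sample string is hard-coded for illustration):

```python
import subprocess

# Real nvidia-smi query flags; run on a machine with an NVIDIA driver.
QUERY = ["nvidia-smi",
         "--query-gpu=utilization.gpu,utilization.memory",
         "--format=csv,noheader,nounits"]

def parse_utilization(csv_text: str) -> list[tuple[int, int]]:
    """Parse `nvidia-smi --query-gpu` CSV output into
    (gpu_util %, memory_util %) tuples, one row per GPU."""
    rows = []
    for line in csv_text.strip().splitlines():
        gpu, mem = (int(field.strip()) for field in line.split(","))
        rows.append((gpu, mem))
    return rows

# Hypothetical output for a 2-GPU machine, hard-coded for illustration:
sample = "100, 37\n4, 1\n"
print(parse_utilization(sample))  # [(100, 37), (4, 1)]

# On a real machine you would instead run:
#   out = subprocess.run(QUERY, capture_output=True, text=True).stdout
#   print(parse_utilization(out))
```

Note that 100% here only means a kernel was resident during each sample window, which is consistent with the behavior you observed.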