TensorRT takes much more CPU RAM on RTX3070

Description

When I run the MNIST sample on an RTX1060ti, it takes only 789.4 MB of CPU memory, but on an RTX3070 it takes 2406.2 MB of CPU memory.

Environment

TensorRT Version: 7.2.3.4
GPU Type: RTX1060TI, RTX3070
Nvidia Driver Version:
CUDA Version: 11.1
Operating System + Version: Windows 10

Relevant Files

Hi,
Please refer to the link below for the sample guide.
https://docs.nvidia.com/deeplearning/tensorrt/sample-support-guide/index.html
Refer to the installation steps at the link below in case you missed anything.
https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html
However, the suggested approach is to use the TRT NGC containers to avoid any system-dependency issues.
https://ngc.nvidia.com/catalog/containers/nvidia:tensorrt

To run the Python samples, make sure the TRT Python packages are installed when using the NGC container:
/opt/tensorrt/python/python_setup.sh

If you are trying to run a custom model, please share your model and script with us so that we can assist you better.
Thanks!

Hi,

1. I followed the instructions for installing TensorRT from a zip package on Windows 10 (Installation Guide :: NVIDIA Deep Learning TensorRT Documentation).
2. I followed "Running C++ Samples on Windows" and the README.md in sampleMNIST. I can run the executable both directly and through Visual Studio (Sample Support Guide :: NVIDIA Deep Learning TensorRT Documentation).
3. I am not familiar with TRT NGC, but I think using NGC would make deploying a TRT C++ program on Windows more complicated, and I found that NGC does not support Microsoft Windows (Frequently Asked Questions · NVIDIA/nvidia-docker Wiki · GitHub).

I’m just trying to figure out why the same C++ program occupies more CPU RAM when running on an RTX 30-series GPU than on an RTX 10-series GPU. Is there any way to reduce the very large CPU RAM usage on RTX 30-series GPUs on Windows?

Thanks

Could you please confirm whether you are facing the same issue on the latest TRT version, 8.2 EA?

I still face the same issue when testing on the latest TRT version, 8.2 EA. I also tested on an RTX3060 and got almost the same problem. I also changed CUDA 11.1 to CUDA 11.4, and nothing improved. Below is the program's verbose output.




The attachment is the program I compiled with VS2017 from \TensorRT-8.2.0.6\samples\sampleMNIST

Hi,

We have developed more kernels for Ampere GPUs. Some of the memory is consumed by cuDNN and other libraries such as cuBLAS. Newer GPUs also need more memory.
Based on the above screenshots, it looks like cuBLAS and cuDNN are consuming a lot of CPU memory.

Moving this post to the cuBLAS tag to get better help on memory management.

Hi spolisetty:
Thanks a lot. The verbose output does indeed show that cuBLAS, cuDNN and CUDA initialization take much more CPU memory. The much higher CPU memory usage on 30-series GPUs really causes some trouble when deploying our program. I will keep looking for better solutions.
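
For anyone who wants to compare the two GPUs stage by stage, here is a rough Python sketch that sums the CPU memory increase per initialization stage from a saved verbose log. It assumes the log contains lines shaped like `[MemUsageChange] <stage>: CPU +N, GPU +M, ...`; that line format, the helper name `cpu_delta_by_stage`, and the sample lines are assumptions made up for illustration, not the actual output from this thread.

```python
import re
from collections import defaultdict

# Assumed line shape (an assumption, adjust to what your log really prints):
#   [MemUsageChange] Init cuDNN: CPU +700, GPU +12, now: CPU 1520, GPU 822 (MiB)
PATTERN = re.compile(
    r"\[MemUsageChange\]\s*(?P<stage>[^:]+):\s*"
    r"CPU\s*(?P<cpu>[+-]\d+),\s*GPU\s*(?P<gpu>[+-]\d+)"
)


def cpu_delta_by_stage(log_text):
    """Sum the reported CPU memory change (MiB) for each stage name."""
    totals = defaultdict(int)
    for line in log_text.splitlines():
        match = PATTERN.search(line)
        if match:  # lines that do not match the assumed shape are skipped
            totals[match.group("stage").strip()] += int(match.group("cpu"))
    return dict(totals)


# Made-up sample lines for illustration only, NOT the poster's log.
SAMPLE_LOG = """\
[I] [MemUsageChange] Init CUDA: CPU +300, GPU +0, now: CPU 320, GPU 800 (MiB)
[V] some unrelated verbose line
[I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +500, GPU +10, now: CPU 820, GPU 810 (MiB)
[I] [MemUsageChange] Init cuDNN: CPU +700, GPU +12, now: CPU 1520, GPU 822 (MiB)
[I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1520, GPU 830 (MiB)
"""

if __name__ == "__main__":
    # Print stages from largest to smallest CPU increase.
    totals = cpu_delta_by_stage(SAMPLE_LOG)
    for stage, mib in sorted(totals.items(), key=lambda kv: -kv[1]):
        print(f"{stage}: +{mib} MiB CPU")
```

Running the same script on the logs from both machines would show which initialization stage accounts for the difference, provided the log lines really match the assumed shape.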
