I’m trying to figure out why the same C++ program uses more CPU RAM when running on an RTX 30-series GPU than on an RTX 10-series one. Is there any way to reduce this very large CPU RAM usage on RTX 30 cards on Windows?
I still face the same issue when testing on the latest TRT version, 8.2 EA. I also tested on an RTX 3060 and hit almost the same problem, and switching from CUDA 11.1 to CUDA 11.4 did not improve anything. Below is the program's verbose output.
We have developed more kernels for Ampere GPUs. Some of the memory is consumed by cuDNN and other libraries such as cuBLAS, so newer GPUs do need more memory.
Based on the above screenshots, it looks like cuBLAS and cuDNN are consuming a lot of CPU memory.
Hi spolisetty,
Thanks a lot. The verbose output does indeed show that cuBLAS, cuDNN, and CUDA initialization take much more CPU memory. The higher CPU memory footprint on 30-series GPUs causes real trouble when deploying our program. I will keep looking for a better solution.
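One mitigation worth trying on TRT 8.x is to disable the cuBLAS/cuDNN tactic sources at engine-build time, so TensorRT does not load and initialize those libraries at all (at the possible cost of losing some tactics and a bit of performance). A minimal sketch, assuming you already have an `nvinfer1::IBuilderConfig*` from the usual builder setup; the helper name `disableExternalTacticSources` is mine, not part of the API:

```cpp
#include <cstdint>
#include "NvInfer.h"

// Sketch: clear the cuBLAS, cuBLASLt, and cuDNN bits from the builder's
// tactic-source mask. With these sources disabled, the builder should not
// pull in those libraries, which is where much of the extra host memory
// in the verbose log appears to go.
void disableExternalTacticSources(nvinfer1::IBuilderConfig* config)
{
    using nvinfer1::TacticSource;
    uint32_t sources = config->getTacticSources();
    sources &= ~(1U << static_cast<uint32_t>(TacticSource::kCUBLAS));
    sources &= ~(1U << static_cast<uint32_t>(TacticSource::kCUBLAS_LT));
    sources &= ~(1U << static_cast<uint32_t>(TacticSource::kCUDNN));
    config->setTacticSources(sources);
}
```

Note that an engine built without these tactic sources must also be deserialized and run without them, so measure both build-time and runtime memory after the change to confirm it actually helps in your deployment.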