I run the same software and workflow on multiple workstations and laptops. On a Quadro RTX 4000 I see large GPU idle gaps (around 100 ms), but I haven’t seen this problem on a workstation with four RTX 3070 GPUs or on a laptop with a single RTX 2060 GPU. The profiling reports from Nsight Systems are attached. I am using CUDA Toolkit v10.2.
My questions:
- Is this issue caused by cuFFT memory allocation?
- If so, why is memory allocation so much slower on the Quadro RTX 4000?
- How can I work around this problem?
- On the Quadro RTX 4000, performance is much worse when I switch to CUDA Toolkit 11.5, while the other GPUs work properly. Is there a reason for this?
quadro_rtx4000_cufft.pptx (524.4 KB)
Have you compared with different driver versions?
Are you experiencing this during plan creation?
I don’t know how to check the CUDA driver version, but I installed the driver that comes with toolkit 10.2, so I assume that shouldn’t be an issue.
I believe the malloc comes from cufftPlanMany. Is it possible I create this plan outside this loop and reuse the plans just like other memory buffers?
To get the driver version: SO answer
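As a rough illustration of one way to check versions from code (not necessarily what the linked answer describes), the sketch below calls the CUDA runtime functions `cudaDriverGetVersion` and `cudaRuntimeGetVersion`. The helper names `cuda_major`, `cuda_minor`, and `report_cuda_versions` are made up for this example. Note that `cudaDriverGetVersion` reports the newest CUDA version the installed driver supports, not the display driver's own version number; that number is usually read with `nvidia-smi`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// CUDA encodes versions as 1000 * major + 10 * minor, e.g. 10020 for 10.2.
// These two helpers are hypothetical names used only in this sketch.
int cuda_major(int version) { return version / 1000; }
int cuda_minor(int version) { return (version % 1000) / 10; }

// Prints the CUDA version supported by the driver and the runtime version
// the program was built against. Returns 0 on success, 1 on any API error.
int report_cuda_versions() {
    int driver = 0;
    int runtime = 0;
    if (cudaDriverGetVersion(&driver) != cudaSuccess) return 1;
    if (cudaRuntimeGetVersion(&runtime) != cudaSuccess) return 1;

    // A driver value of 0 means no CUDA driver is installed.
    std::printf("Driver supports CUDA: %d.%d\n", cuda_major(driver), cuda_minor(driver));
    std::printf("Runtime version:      %d.%d\n", cuda_major(runtime), cuda_minor(runtime));
    return 0;
}
```

If the driver line reports a lower version than the runtime line, the driver is older than the toolkit expects, which would be worth ruling out when comparing machines.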
> Is it possible I create this plan outside this loop and reuse the plans just like other memory buffers?
Yes, not only is it possible, it’s preferred for performance. Please see link
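A minimal sketch of what plan reuse could look like is shown below. It is not the original poster's code: the sizes `kFftSize` and `kBatch`, the function name `run_ffts_with_reused_plan`, and the choice of an in-place single-precision C2C transform are all assumptions made for illustration. The point is only that `cufftPlanMany` (which allocates its work area) is called once before the loop, and the loop itself calls nothing but `cufftExecC2C`.

```cuda
#include <cstddef>
#include <cuda_runtime.h>
#include <cufft.h>

// Hypothetical problem size, chosen only for this example.
constexpr int kFftSize = 1024;  // points per 1D transform
constexpr int kBatch = 64;      // transforms per call

// Runs `iterations` batched in-place forward FFTs with a single reused plan.
// Returns 0 on success, 1 on any CUDA or cuFFT error.
int run_ffts_with_reused_plan(int iterations) {
    const std::size_t bytes = sizeof(cufftComplex) * kFftSize * kBatch;
    cufftComplex* d_data = nullptr;
    if (cudaMalloc(&d_data, bytes) != cudaSuccess) return 1;
    if (cudaMemset(d_data, 0, bytes) != cudaSuccess) {
        cudaFree(d_data);
        return 1;
    }

    // Create the plan ONCE, outside the processing loop. Passing nullptr for
    // inembed/onembed selects the default contiguous layout.
    cufftHandle plan;
    int n[1] = {kFftSize};
    if (cufftPlanMany(&plan, 1, n,
                      nullptr, 1, kFftSize,
                      nullptr, 1, kFftSize,
                      CUFFT_C2C, kBatch) != CUFFT_SUCCESS) {
        cudaFree(d_data);
        return 1;
    }

    // The loop only executes the existing plan; no planning or plan-related
    // allocation happens per iteration.
    int status = 0;
    for (int i = 0; i < iterations && status == 0; ++i) {
        if (cufftExecC2C(plan, d_data, d_data, CUFFT_FORWARD) != CUFFT_SUCCESS) {
            status = 1;
        }
    }
    if (cudaDeviceSynchronize() != cudaSuccess) status = 1;

    // Destroy the plan once, when it is no longer needed.
    cufftDestroy(plan);
    cudaFree(d_data);
    return status;
}
```

In a real application the plan handle would typically live alongside the other long-lived buffers and be destroyed at shutdown, for example after a call such as `run_ffts_with_reused_plan(100)` has finished.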
Thanks for the information,
I will try your recommendation.
I verified the solution and it works. Thanks.