Performance comparison between different CUDA Toolkit versions and GPU devices

jacob39 · September 23, 2021, 6:35am

I have two GPUs: an older GeForce GTX Titan Black, released in 2014, and a GeForce RTX 3070.
When I developed the project on the Titan Black, I used functions from CUDA Toolkit 8. Now that I have the newer GPU, the project runs faster, as I expected.

I am wondering whether there will be a vast improvement in speed if I move from CUDA Toolkit 8 to 11 and also adopt the new functions in CUDA 11.
If so, I would like to know the reasons. Does the improvement come from the evolution of the GPU architecture or from the toolkit's algorithms? How much of the improvement is due to the GPU architecture, and how much to the toolkit version?

More information about my usage:
My project uses the CUDA Toolkit to solve an energy function. I have a sparse matrix of dimension 10k x 10k.
The functions I use from CUDA Toolkit 8 are mainly cusolverSpXcsrcholAnalysis, cusolverSpScsrcholFactor and cusolverSpScsrcholSolve.
Since CUDA Toolkit 10.0, the function cusolverSpScsrlsvchol seems to do all the work of the above three functions.
I think I will replace many of the old functions with those from the newer CUDA Toolkit for performance. Please explain, and correct me if I am wrong.

njuffa · September 23, 2021, 6:55am

Maybe. Maybe not. We don’t know the definition of “vast” as used here. The sure way to find out is to simply give it a try.
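For "giving it a try", a small timing harness is usually enough. The following is only a sketch in Python: `benchmark` and `fake_solver` are hypothetical names, and `fake_solver` is a stand-in for whatever launches the real solve (for example, a call that runs the compiled program). Warm-up runs are discarded so that one-time start-up costs do not distort the result, and the median is reported because it is less sensitive to outliers than the mean.

```python
import statistics
import time


def benchmark(run, warmup=2, repeats=7):
    """Return the median wall-clock time (seconds) of calling run()."""
    for _ in range(warmup):
        run()  # discarded: absorbs one-time start-up costs
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def fake_solver():
    # Hypothetical placeholder for the real workload.
    sum(i * i for i in range(100_000))


if __name__ == "__main__":
    print(f"median time: {benchmark(fake_solver):.6f} s")
```

If the timed region lives inside a CUDA program instead, keep in mind that kernel launches are asynchronous, so the device has to be synchronized (for example with cudaDeviceSynchronize) before the timer is stopped.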

There are improvements in the hardware (GPU cores, memory technology, interconnects) and software (algorithms, compiler code generation, libraries). All can have an impact on performance. How large the overall impact for a particular use case is going to be is difficult to predict. It is even harder to accurately attribute performance changes to the individual components of the HW/SW ecosystem.
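To illustrate why attribution is hard, here is a rough sketch that splits a total speedup into a hardware factor and a software factor, assuming the same workload can be timed in all four combinations of old/new GPU and old/new toolkit. The function name `split_speedup`, the labels, and all timings are made-up placeholders, not measurements.

```python
def split_speedup(times, old_gpu, new_gpu, old_tk, new_tk):
    """Factor the total speedup along two different upgrade paths.

    times maps (gpu, toolkit) -> run time in seconds.
    Each path is a pair of factors whose product is the total speedup.
    """
    base = times[(old_gpu, old_tk)]
    best = times[(new_gpu, new_tk)]
    via_hw = times[(new_gpu, old_tk)]  # upgrade the GPU first
    via_sw = times[(old_gpu, new_tk)]  # upgrade the toolkit first
    return {
        "total": base / best,
        "hw_then_sw": (base / via_hw, via_hw / best),
        "sw_then_hw": (base / via_sw, via_sw / best),
    }


# Hypothetical timings in seconds; replace with your own measurements.
example_times = {
    ("old_gpu", "old_toolkit"): 8.0,
    ("new_gpu", "old_toolkit"): 2.0,
    ("old_gpu", "new_toolkit"): 5.0,
    ("new_gpu", "new_toolkit"): 1.0,
}

if __name__ == "__main__":
    result = split_speedup(
        example_times, "old_gpu", "new_gpu", "old_toolkit", "new_toolkit"
    )
    for name, value in result.items():
        print(name, value)
```

With these placeholder numbers the two paths give different splits for the same total speedup, because the benefit of the new toolkit depends on which GPU it runs on. Any single "X% hardware, Y% software" figure is therefore somewhat arbitrary.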

The traditional approach is to benchmark the specific cases you are interested in, and track the performance across CUDA versions and hardware generations. If you need to dig deeper, benchmarking a matrix of combinations of HW and SW may help, as might doing some roofline analysis. The CUDA profiler can probably help with investigating the impact of certain aspects.
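As a minimal sketch of the roofline idea: attainable throughput is bounded by the smaller of the compute peak and the product of memory bandwidth and arithmetic intensity (FLOPs per byte moved). The function names and the device figures below are illustrative placeholders, not the specifications of either card; substitute the vendor-published peaks for your hardware and an intensity estimated for your kernel.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, intensity):
    """Roofline bound: min(compute roof, memory roof).

    intensity is arithmetic intensity in FLOP per byte.
    """
    return min(peak_gflops, bandwidth_gbs * intensity)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Intensity above which a kernel becomes compute-bound."""
    return peak_gflops / bandwidth_gbs


# Placeholder device figures: (peak GFLOP/s, memory bandwidth GB/s).
devices = {
    "old_gpu": (5000.0, 300.0),
    "new_gpu": (20000.0, 450.0),
}

if __name__ == "__main__":
    for intensity in (0.25, 4.0, 64.0):
        for name, (peak, bw) in devices.items():
            bound = attainable_gflops(peak, bw, intensity)
            print(f"{name} at {intensity} FLOP/byte: <= {bound:.1f} GFLOP/s")
```

With these placeholder figures, a kernel with low arithmetic intensity is bounded by memory bandwidth on both devices, so its upper bound grows only by the bandwidth ratio (1.5x here) rather than by the ratio of compute peaks (4x here). Where a given solver actually sits on this plot has to be measured, for example with the profiler.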