Calling the cuSPARSE library on a Tesla A100 with CUDA 11.1 is much slower than on a Tesla P100 with CUDA 9.0

Hello, guys!

Recently I have been using NVIDIA GPUs to solve large sparse linear systems by calling the cuSPARSE library, and I have been testing the solve speed on different GPUs. Unfortunately, I have run into a puzzling problem. According to NVIDIA's specifications, the A100 has more computing power than the P100, but in my tests the A100 is about four times slower than the P100.

The two figures below show the test results.
In Figure 1, the A100 spends 3348.9 ms in the kernel ‘csrsv2_solve_lower_nontrans_byLevel_kernel’, while in Figure 2 the P100 spends only 831.05 ms in the same kernel.
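
As a quick sanity check on the "four times slower" figure, the ratio of the two kernel times can be computed directly. This is only a minimal sketch in Python: the function and variable names are made up for illustration, and the only inputs are the two nvprof timings quoted above.

```python
# Kernel times reported by nvprof for csrsv2_solve_lower_nontrans_byLevel_kernel.
# The names below are illustrative only; the values come from the two figures.
A100_KERNEL_MS = 3348.9
P100_KERNEL_MS = 831.05


def slowdown_ratio(slow_ms: float, fast_ms: float) -> float:
    """Return how many times longer the slow run takes than the fast run."""
    if fast_ms <= 0:
        raise ValueError("fast_ms must be positive")
    return slow_ms / fast_ms


ratio = slowdown_ratio(A100_KERNEL_MS, P100_KERNEL_MS)
print(f"A100 kernel time is {ratio:.2f}x the P100 kernel time")  # about 4.03x
```

So the A100 spends roughly four times as long in this one kernel as the P100 does.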

What could be causing this? Any help or guidance would be much appreciated. Thank you very much!

ZJQ007


Figure 1: nvprof analysis results on the A100


Figure 2: nvprof analysis results on the P100