Is cuSOLVER sparse Cholesky necessarily slower than a single-threaded CPU solver?

In my case, solving a linear system Ax=b, where A is a 30000*30000 symmetric sparse matrix (so the CSC representation has the same vectors as the CSR one) with at most 13k nonzeros, is AT LEAST 10 times slower on the GPU than even a single-threaded solver on a laptop CPU.

I use an RTX 2080 running at 1.9GHz, and the core utilization is near 99%. The CPU is a laptop i7-9750H running at 2.6GHz.

In both cases I prefactorize once (numerical analysis) and then solve directly 20 times. The RTX 2080 needs 50-55ms for each consecutive Cholesky solve, about 1.1s in total. The CPU solves it in at most 0.07s on a single thread with hyperthreading enabled, a setup I would expect to be much slower.
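To make the measurement pattern concrete, here is a minimal sketch of "factorize once, then time 20 solves" in plain Python. It is not the solver from this post and does not call cuSOLVER: as a stand-in it uses a symmetric positive definite tridiagonal matrix, whose Cholesky factor can be written down directly. The function names (`tridiag_cholesky`, `cholesky_solve`, `residual_norm`, `benchmark`) and the test matrix are assumptions made for illustration only.

```python
import math
import time


def tridiag_cholesky(diag, off):
    """Factor an SPD tridiagonal matrix A = L * L^T (hypothetical stand-in).

    diag: main diagonal of A (length n); off: sub-diagonal of A (length n-1).
    Returns (l_diag, l_off), the diagonal and sub-diagonal of L.
    """
    n = len(diag)
    l_diag = [0.0] * n
    l_off = [0.0] * (n - 1)
    l_diag[0] = math.sqrt(diag[0])
    for i in range(1, n):
        l_off[i - 1] = off[i - 1] / l_diag[i - 1]
        l_diag[i] = math.sqrt(diag[i] - l_off[i - 1] ** 2)
    return l_diag, l_off


def cholesky_solve(l_diag, l_off, b):
    """Solve A x = b by reusing an existing factor (no refactorization)."""
    n = len(b)
    # Forward substitution: L y = b
    y = [0.0] * n
    y[0] = b[0] / l_diag[0]
    for i in range(1, n):
        y[i] = (b[i] - l_off[i - 1] * y[i - 1]) / l_diag[i]
    # Backward substitution: L^T x = y
    x = [0.0] * n
    x[n - 1] = y[n - 1] / l_diag[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = (y[i] - l_off[i] * x[i + 1]) / l_diag[i]
    return x


def residual_norm(diag, off, x, b):
    """Largest absolute entry of A x - b, as a correctness check."""
    n = len(x)
    worst = 0.0
    for i in range(n):
        ax = diag[i] * x[i]
        if i > 0:
            ax += off[i - 1] * x[i - 1]
        if i < n - 1:
            ax += off[i] * x[i + 1]
        worst = max(worst, abs(ax - b[i]))
    return worst


def benchmark(n=30000, repeats=20):
    """Factor once, then time `repeats` solves with different right-hand sides."""
    diag = [4.0] * n
    off = [-1.0] * (n - 1)
    t0 = time.perf_counter()
    factor = tridiag_cholesky(diag, off)
    factor_time = time.perf_counter() - t0
    solve_times = []
    for k in range(repeats):
        b = [float(k + 1)] * n
        t0 = time.perf_counter()
        cholesky_solve(*factor, b)
        solve_times.append(time.perf_counter() - t0)
    return factor_time, solve_times


if __name__ == "__main__":
    factor_time, solve_times = benchmark()
    print(f"factorization: {factor_time * 1e3:.2f} ms")
    print(f"solves: {len(solve_times)}, total {sum(solve_times) * 1e3:.2f} ms")
```

Timing the factorization and each solve separately, as above, makes it easier to see whether the per-solve cost or a one-time cost dominates the total.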

My professor suggests a higher-end GPU, but I really doubt that would give at least a 10x speedup.

Hi, just out of curiosity, how did you manage the data transfer between host and device? Do you transfer the data 20 times or just once?

Just once, and even if I transferred it 20 times, the total transfer time would be far smaller than a single solve with the already factorized matrix.

I moved back to CUDA 7.5, and it is at least twice as fast as CUDA 11.0. Strange, though it is still so much slower than the CPU that it feels like a joke.