There is a bug causing a huge performance loss in the cuSPARSE csrsv_analysis() routine in CUDA 9.0 and CUDA 9.1 compared with CUDA 8.0. It is 20 times slower than with the earlier CUDA Toolkit when running the same sample code, "conjugateGradientPrecond", on the same GPU with a sufficiently large matrix. To reproduce, I changed the tridiagonal matrix size to
M = N = 1638400;
and maximum number of iterations to:
const int max_iter = 10000;
All tests were conducted on a GTX 1070 under both Linux (CentOS 7.4.1708) and Windows 10 64-bit, with the latest drivers.
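For anyone who wants to reproduce the comparison, here is a minimal sketch of a host-side timer that could be wrapped around the analysis call alone. The helper name `time_ms` is hypothetical and not part of the CUDA sample. It only measures wall-clock time on the host, so it assumes the timed callable also waits for the GPU to finish.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper (not part of the CUDA sample): runs `fn` once and
// returns the elapsed host wall-clock time in milliseconds.
template <typename Fn>
double time_ms(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Fn>(fn)();  // the work being measured
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

In the sample, the callable would be a lambda containing the existing analysis call followed by cudaDeviceSynchronize(), so that any pending GPU work is included in the measurement. Building the same binary against each toolkit version and printing the result should then show the difference directly.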