cuSparse performance issue after CUDA Toolkit 9.0: cuSparse<t>csrsv_analysis() 20 times slower

There is a bug causing a huge performance loss in cuSparse<t>csrsv_analysis() in CUDA 9.0 and CUDA 9.1 compared with 8.0. It is 20 times slower than in the earlier CUDA Toolkit when running the same sample code, “conjugateGradientPrecond”, on the same GPU with a sufficiently large matrix. I changed the tridiagonal matrix size to:

M = N = 1638400;

and the maximum number of iterations to:

const int max_iter = 10000;

All tests were conducted on a GTX 1070 under both Linux (CentOS 7.4.1708) and Windows 10 64-bit, with the latest drivers.
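To give a sense of the problem size, here is a minimal host-side sketch that builds the CSR sparsity pattern of an N x N tridiagonal matrix and counts its nonzeros, plus a small wall-clock timer that could be wrapped around the analysis call to compare toolkits. The names CsrPattern, makeTridiagPattern and timeMs are hypothetical helpers written for this post, not part of the sample or of cuSPARSE, and the sketch assumes a plain tridiagonal pattern with all three diagonals present.

```cpp
#include <cassert>
#include <chrono>
#include <vector>

// Hypothetical helper type: CSR sparsity pattern (values omitted).
struct CsrPattern {
    std::vector<int> rowPtr;  // size n + 1, offsets into colInd
    std::vector<int> colInd;  // size nnz, column index of each nonzero
};

// Builds the pattern of an n x n tridiagonal matrix.
// The first and last rows hold 2 entries, every other row holds 3,
// so nnz = 3 * n - 2 for n >= 2.
CsrPattern makeTridiagPattern(int n) {
    assert(n >= 1);
    CsrPattern p;
    p.rowPtr.reserve(static_cast<std::size_t>(n) + 1);
    p.rowPtr.push_back(0);
    for (int i = 0; i < n; ++i) {
        if (i > 0) p.colInd.push_back(i - 1);      // sub-diagonal
        p.colInd.push_back(i);                     // main diagonal
        if (i + 1 < n) p.colInd.push_back(i + 1);  // super-diagonal
        p.rowPtr.push_back(static_cast<int>(p.colInd.size()));
    }
    return p;
}

// Hypothetical helper: wall-clock time of a callable, in milliseconds.
template <typename F>
double timeMs(F&& f) {
    const auto t0 = std::chrono::steady_clock::now();
    f();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

With M = N = 1638400, makeTridiagPattern(N) yields 3 * 1638400 - 2 = 4915198 nonzeros. If timeMs is used around a GPU call such as the analysis routine, it is safer to call cudaDeviceSynchronize() inside the timed callable so that any asynchronous work is included in the measurement.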