cuSparse performance for Pascal GPUs

I am running some applications that use cuSparse level 3 functions (BSR format), and I see a very large performance difference between the same application on a Pascal GTX 1080 (compiled for sm_61) and on a Maxwell GTX Titan X (compiled for sm_52). This is with CUDA 8.0 RC.

For a moderate-size set of calls to cusparseCbsrsm2_analysis() and cusparseCbsrsm2_solve(), the application runs about 2-3 times faster on the Titan X than the exact same application on the GTX 1080.
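To compare the two cards in isolation, and to have the kind of standalone test case a bug report needs, the two calls can be timed on their own. The following is only a sketch under stated assumptions: the block-bidiagonal test matrix, the sizes (`mb`, `bd`, `n`, `reps`) and the file name `bsrsm2_timing.cu` are hypothetical choices, not the matrices from the actual application. It uses the legacy `bsrsm2` interface named above, with row-major blocks and the default host pointer mode.

```cuda
// Sketch of a standalone cusparseCbsrsm2 timing test (CUDA 8-era bsrsm2 API).
// Sizes and the block-bidiagonal test matrix are hypothetical choices.
// Build, e.g.: nvcc -arch=sm_61 bsrsm2_timing.cu -lcusparse   (use sm_52 for the Titan X)
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuComplex.h>
#include <cuda_runtime.h>
#include <cusparse.h>

#define CK(x) do { cudaError_t e_ = (x); if (e_ != cudaSuccess) { \
  std::printf("CUDA error '%s' at line %d\n", cudaGetErrorString(e_), __LINE__); std::exit(1); } } while (0)
#define CKS(x) do { cusparseStatus_t s_ = (x); if (s_ != CUSPARSE_STATUS_SUCCESS) { \
  std::printf("cuSPARSE error %d at line %d\n", (int)s_, __LINE__); std::exit(1); } } while (0)

int main() {
  const int mb = 20000, bd = 4, n = 16, reps = 10;  // block rows, block size, RHS columns, solve repetitions
  const int m = mb * bd, nnzb = 2 * mb - 1;         // lower block-bidiagonal: D on the diagonal, E below it

  // Host BSR arrays (zero-based). D = 2*I and E = I, so the matrix is nonsingular.
  std::vector<int> rowPtr(mb + 1), colInd(nnzb);
  std::vector<cuComplex> val((size_t)nnzb * bd * bd, make_cuComplex(0.f, 0.f));
  int k = 0;
  for (int i = 0; i < mb; ++i) {
    rowPtr[i] = k;
    for (int j = (i > 0 ? i - 1 : 0); j <= i; ++j, ++k) {  // column indices sorted within each block row
      colInd[k] = j;
      for (int t = 0; t < bd; ++t)
        val[(size_t)k * bd * bd + t * bd + t] = make_cuComplex(j == i ? 2.f : 1.f, 0.f);
    }
  }
  rowPtr[mb] = k;
  std::vector<cuComplex> hB((size_t)m * n, make_cuComplex(1.f, 0.f));  // B = all ones, column-major, ldb = m

  int *dRowPtr, *dColInd; cuComplex *dVal, *dB, *dX;
  CK(cudaMalloc(&dRowPtr, rowPtr.size() * sizeof(int)));
  CK(cudaMalloc(&dColInd, colInd.size() * sizeof(int)));
  CK(cudaMalloc(&dVal, val.size() * sizeof(cuComplex)));
  CK(cudaMalloc(&dB, hB.size() * sizeof(cuComplex)));
  CK(cudaMalloc(&dX, hB.size() * sizeof(cuComplex)));
  CK(cudaMemcpy(dRowPtr, rowPtr.data(), rowPtr.size() * sizeof(int), cudaMemcpyHostToDevice));
  CK(cudaMemcpy(dColInd, colInd.data(), colInd.size() * sizeof(int), cudaMemcpyHostToDevice));
  CK(cudaMemcpy(dVal, val.data(), val.size() * sizeof(cuComplex), cudaMemcpyHostToDevice));
  CK(cudaMemcpy(dB, hB.data(), hB.size() * sizeof(cuComplex), cudaMemcpyHostToDevice));

  cusparseHandle_t h; CKS(cusparseCreate(&h));
  cusparseMatDescr_t descr; CKS(cusparseCreateMatDescr(&descr));
  CKS(cusparseSetMatType(descr, CUSPARSE_MATRIX_TYPE_GENERAL));
  CKS(cusparseSetMatIndexBase(descr, CUSPARSE_INDEX_BASE_ZERO));
  CKS(cusparseSetMatFillMode(descr, CUSPARSE_FILL_MODE_LOWER));
  CKS(cusparseSetMatDiagType(descr, CUSPARSE_DIAG_TYPE_NON_UNIT));
  bsrsm2Info_t info; CKS(cusparseCreateBsrsm2Info(&info));
  const cusparseDirection_t dir = CUSPARSE_DIRECTION_ROW;
  const cusparseOperation_t op = CUSPARSE_OPERATION_NON_TRANSPOSE;
  const cusparseSolvePolicy_t pol = CUSPARSE_SOLVE_POLICY_USE_LEVEL;
  const cuComplex alpha = make_cuComplex(1.f, 0.f);  // host pointer mode (the default)

  int bufSize = 0; void *dBuf;
  CKS(cusparseCbsrsm2_bufferSize(h, dir, op, op, mb, n, nnzb, descr, dVal, dRowPtr, dColInd, bd, info, &bufSize));
  CK(cudaMalloc(&dBuf, bufSize));

  // Time the analysis once, then the average of several solves (all on the default stream).
  cudaEvent_t e0, e1, e2;
  CK(cudaEventCreate(&e0)); CK(cudaEventCreate(&e1)); CK(cudaEventCreate(&e2));
  CK(cudaEventRecord(e0));
  CKS(cusparseCbsrsm2_analysis(h, dir, op, op, mb, n, nnzb, descr, dVal, dRowPtr, dColInd, bd, info, pol, dBuf));
  CK(cudaEventRecord(e1));
  for (int r = 0; r < reps; ++r)
    CKS(cusparseCbsrsm2_solve(h, dir, op, op, mb, n, nnzb, &alpha, descr, dVal, dRowPtr, dColInd, bd,
                              info, dB, m, dX, m, pol, dBuf));
  CK(cudaEventRecord(e2));
  CK(cudaEventSynchronize(e2));
  float tAna = 0.f, tSolve = 0.f;
  CK(cudaEventElapsedTime(&tAna, e0, e1));
  CK(cudaEventElapsedTime(&tSolve, e1, e2));

  cudaDeviceProp prop; CK(cudaGetDeviceProperties(&prop, 0));
  cuComplex xLast; CK(cudaMemcpy(&xLast, dX + (size_t)m * n - 1, sizeof(cuComplex), cudaMemcpyDeviceToHost));
  std::printf("%s (sm_%d%d): analysis %.3f ms, solve %.3f ms/call, x[last] = %f\n",
              prop.name, prop.major, prop.minor, tAna, tSolve / reps, cuCrealf(xLast));
  // Each row satisfies x_i = (1 - x_{i-1}) / 2, which converges to 1/3 for long chains.
  const bool ok = std::fabs(cuCrealf(xLast) - 1.f / 3.f) < 1e-3f;

  cudaFree(dBuf); cudaFree(dX); cudaFree(dB); cudaFree(dVal); cudaFree(dColInd); cudaFree(dRowPtr);
  cusparseDestroyBsrsm2Info(info); cusparseDestroyMatDescr(descr); cusparseDestroy(h);
  return ok ? 0 : 1;
}
```

Building the same file once with `-arch=sm_61` for the GTX 1080 and once with `-arch=sm_52` for the Titan X should give directly comparable numbers. The analysis timing may also include one-time setup cost inside the library, so adding a warm-up call (or running the program twice) helps separate that from the steady-state solve time.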

I have not profiled in detail yet to see whether there is some unusual overhead on the GTX 1080, like the one I observed with Thrust, but I will get to that soon.

Is this too early to make anything of this observation?

Will there be significant differences in cuSparse when using Pascal GPUs?

In general, will there be any major upgrades to cuSparse that were not yet included in the CUDA 8.0 RC?

It’s not too early. I would suggest filing a bug report that includes complete test code.