My first suggestion is to switch to the latest version of cuSPARSELt: https://developer.nvidia.com/cusparselt-downloads. Second, regarding kernel execution times, please note that in some cases the tuning routine may evaluate a two-kernel mode, whereas Nsight Compute reports the time of a single kernel. Comparing the two numbers directly can therefore be misleading.
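To make the comparison fair, one option is to add up the durations of every kernel that belongs to the same matmul call before comparing against the time seen by the tuning routine. The following is only a minimal sketch of that bookkeeping: the record layout `(call_id, kernel_name, duration_us)`, the function name `total_time_per_call`, and the kernel names are hypothetical, and the numbers are placeholders rather than measurements.

```python
from collections import defaultdict


def total_time_per_call(kernel_records):
    """Sum per-kernel durations for each library call.

    kernel_records: iterable of (call_id, kernel_name, duration_us) tuples
    (a hypothetical layout, e.g. built by hand from profiler output).
    Returns a dict mapping call_id -> total duration in microseconds.
    """
    totals = defaultdict(float)
    for call_id, _kernel_name, duration_us in kernel_records:
        totals[call_id] += duration_us
    return dict(totals)


# Placeholder values for illustration only, not real measurements.
records = [
    ("matmul_0", "main_kernel", 100.0),    # first kernel of a two-kernel call
    ("matmul_0", "second_kernel", 20.0),   # second kernel of the same call
    ("matmul_1", "main_kernel", 105.0),    # a call that used a single kernel
]

totals = total_time_per_call(records)
print(totals)
```

With per-call totals in hand, a two-kernel configuration is compared as a whole rather than by its first kernel alone.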