Performance regression when changing [deprecated] cusparse<t>csrmm() to cusparseSpMM()

Dear NVIDIA developers,

I am working on accelerating a scientific codebase, and I currently use the cuSPARSE library to compute sparse×dense and dense×sparse matrix-matrix products. I recently moved to CUDA 10.1 and, while reading the cuSPARSE documentation, found out that

cusparse<t>csrmm()

is deprecated and will be removed in a future release.

Naturally, I changed to the recommended

cusparseSpMM()

routine, but noticed a substantial performance slow-down. To isolate the issue, I profiled the following code segment, which multiplies a random sparse matrix with a random dense matrix (both N by N).

Code below:

t1 = get_time(0.0);

	// Convert the dense matrix cA_dev to CSR form (values cA_nnz_d, row offsets cA_edgei_d,
	// column indices cA_indexj_d), using the per-row nonzero counts in nnzPerRow_d
	cusparse_state = cusparseZdense2csr((cusparseHandle_t)cusparse_handle, N, N, descrA, (cuDoubleComplex*) cA_dev, N, nnzPerRow_d, (cuDoubleComplex*) cA_nnz_d, cA_edgei_d, cA_indexj_d);

	// Copy the CSR row-offset array back to the host
	cudaMemcpy(cA_edgei, cA_edgei_d, (N+1)*sizeof(int), cudaMemcpyDeviceToHost);

	// Create the generic descriptors required by the new cusparseSpMM() API
	cusparse_state = cusparseCreateCsr(&sparse_descriptor, N, N, *nnzTotalHostPtr, cA_edgei_d, cA_indexj_d, (cuDoubleComplex*) cA_nnz_d, CUSPARSE_INDEX_32I, CUSPARSE_INDEX_32I, CUSPARSE_INDEX_BASE_ZERO, CUDA_C_64F);
	cusparse_state = cusparseCreateDnMat(&dense_descriptor, N, N, N, (cuDoubleComplex*) cB_dev, CUDA_C_64F, CUSPARSE_ORDER_COL);
	cusparse_state = cusparseCreateDnMat(&denseC_descriptor, N, N, N, (cuDoubleComplex*) cC_cusparse, CUDA_C_64F, CUSPARSE_ORDER_COL);

	// Repeated multiplication: exactly one of the two calls is active,
	// the other stays commented out
	for(int k = 0; k < number_trials; k++){
		cusparse_state = cusparseSpMM((cusparseHandle_t)cusparse_handle, CUSPARSE_OPERATION_NON_TRANSPOSE, CUSPARSE_OPERATION_NON_TRANSPOSE, (cuDoubleComplex*)&alpha, sparse_descriptor, dense_descriptor, (cuDoubleComplex*)&beta, denseC_descriptor, CUDA_C_64F, CUSPARSE_CSRMM_ALG1, NULL);

//		cusparse_state = cusparseZcsrmm((cusparseHandle_t)cusparse_handle, CUSPARSE_OPERATION_NON_TRANSPOSE, N, N, N, *nnzTotalHostPtr, (cuDoubleComplex*)&alpha, descrA, (cuDoubleComplex*)cA_nnz_d, cA_edgei_d, cA_indexj_d, (cuDoubleComplex*)cB_dev, N, (cuDoubleComplex*)&beta, (cuDoubleComplex*)cC_cusparse, N);

		// Copy the result matrix back to the host on every trial
		cudaMemcpy(cC_cshost, cC_cusparse, N*N*sizeof(CPX), cudaMemcpyDeviceToHost);
	}

	// Copy the remaining CSR arrays (values and column indices) back to the host
	cudaMemcpy(cA_nnz, cA_nnz_d, *nnzTotalHostPtr*sizeof(CPX), cudaMemcpyDeviceToHost);
	cudaMemcpy(cA_indexj, cA_indexj_d, *nnzTotalHostPtr*sizeof(int), cudaMemcpyDeviceToHost);

t1 = get_time(t1);
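For completeness: the generic-API documentation also describes a workspace-query step through cusparseSpMM_bufferSize(), while the segment above simply passes NULL as the external buffer. A minimal sketch of that step, reusing the descriptors and scalars from the code above, would look like this:

	size_t buffer_size = 0;
	void *external_buffer = NULL;

	// Ask cuSPARSE how much workspace cusparseSpMM() needs for these operands
	cusparse_state = cusparseSpMM_bufferSize((cusparseHandle_t)cusparse_handle, CUSPARSE_OPERATION_NON_TRANSPOSE, CUSPARSE_OPERATION_NON_TRANSPOSE, (cuDoubleComplex*)&alpha, sparse_descriptor, dense_descriptor, (cuDoubleComplex*)&beta, denseC_descriptor, CUDA_C_64F, CUSPARSE_CSRMM_ALG1, &buffer_size);
	cudaMalloc(&external_buffer, buffer_size);

	// external_buffer would then replace the NULL in the cusparseSpMM() call above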

The code was compiled with pgc++ from PGI 19.4.

By moving the comment, I can profile either cusparseZcsrmm() or cusparseSpMM().
The result is that cusparseSpMM() runs at roughly half the speed of cusparseZcsrmm() for double-precision complex numbers, regardless of the size or sparsity of the matrices.

My question is essentially whether I am using cusparseSpMM() in an unintended fashion, and if that is not the case, why cusparse<t>csrmm() was deprecated at all, especially given that there already exist known issues with cusparseSpMM(), detailed in this forum post:

It’s not obvious to me that you are using cusparseSpMM() in an unintended fashion.

For the performance issue, my suggestion would be to file a bug. You will likely be asked for a complete test case.
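For reference, a complete test case along these lines can be quite small. The sketch below uses a hand-coded 4x4 double-complex CSR matrix in place of the random N-by-N one, includes the workspace query, and omits error checking for brevity; the sizes and values are placeholders to adapt:

	#include <cstdio>
	#include <cuda_runtime.h>
	#include <cusparse.h>
	#include <cuComplex.h>

	int main()
	{
	    // Toy 4x4 sparse matrix in CSR form (zero-based indexing), double complex
	    const int N = 4, NNZ = 5;
	    int hRowPtr[N + 1] = {0, 1, 2, 4, 5};
	    int hColInd[NNZ]   = {0, 1, 1, 2, 3};
	    cuDoubleComplex hVal[NNZ], hB[N * N], hC[N * N];
	    for (int i = 0; i < NNZ; i++)   hVal[i] = make_cuDoubleComplex(1.0, 0.0);
	    for (int i = 0; i < N * N; i++) { hB[i] = make_cuDoubleComplex(1.0, 0.0);
	                                      hC[i] = make_cuDoubleComplex(0.0, 0.0); }

	    // Device copies of the CSR arrays and the dense matrices
	    int *dRowPtr, *dColInd;
	    cuDoubleComplex *dVal, *dB, *dC;
	    cudaMalloc(&dRowPtr, (N + 1) * sizeof(int));
	    cudaMalloc(&dColInd, NNZ * sizeof(int));
	    cudaMalloc(&dVal, NNZ * sizeof(cuDoubleComplex));
	    cudaMalloc(&dB, N * N * sizeof(cuDoubleComplex));
	    cudaMalloc(&dC, N * N * sizeof(cuDoubleComplex));
	    cudaMemcpy(dRowPtr, hRowPtr, (N + 1) * sizeof(int), cudaMemcpyHostToDevice);
	    cudaMemcpy(dColInd, hColInd, NNZ * sizeof(int), cudaMemcpyHostToDevice);
	    cudaMemcpy(dVal, hVal, NNZ * sizeof(cuDoubleComplex), cudaMemcpyHostToDevice);
	    cudaMemcpy(dB, hB, N * N * sizeof(cuDoubleComplex), cudaMemcpyHostToDevice);
	    cudaMemcpy(dC, hC, N * N * sizeof(cuDoubleComplex), cudaMemcpyHostToDevice);

	    cusparseHandle_t handle;
	    cusparseCreate(&handle);

	    // Generic-API descriptors for A (sparse), B and C (dense, column-major)
	    cusparseSpMatDescr_t matA;
	    cusparseDnMatDescr_t matB, matC;
	    cusparseCreateCsr(&matA, N, N, NNZ, dRowPtr, dColInd, dVal,
	                      CUSPARSE_INDEX_32I, CUSPARSE_INDEX_32I,
	                      CUSPARSE_INDEX_BASE_ZERO, CUDA_C_64F);
	    cusparseCreateDnMat(&matB, N, N, N, dB, CUDA_C_64F, CUSPARSE_ORDER_COL);
	    cusparseCreateDnMat(&matC, N, N, N, dC, CUDA_C_64F, CUSPARSE_ORDER_COL);

	    cuDoubleComplex alpha = make_cuDoubleComplex(1.0, 0.0);
	    cuDoubleComplex beta  = make_cuDoubleComplex(0.0, 0.0);

	    // Query and allocate the external workspace, then run C = alpha*A*B + beta*C
	    size_t bufSize = 0;
	    void *dBuf = NULL;
	    cusparseSpMM_bufferSize(handle, CUSPARSE_OPERATION_NON_TRANSPOSE,
	                            CUSPARSE_OPERATION_NON_TRANSPOSE, &alpha, matA, matB,
	                            &beta, matC, CUDA_C_64F, CUSPARSE_CSRMM_ALG1, &bufSize);
	    cudaMalloc(&dBuf, bufSize);
	    cusparseSpMM(handle, CUSPARSE_OPERATION_NON_TRANSPOSE,
	                 CUSPARSE_OPERATION_NON_TRANSPOSE, &alpha, matA, matB,
	                 &beta, matC, CUDA_C_64F, CUSPARSE_CSRMM_ALG1, dBuf);

	    // Copy the result back and print one entry as a sanity check
	    cudaMemcpy(hC, dC, N * N * sizeof(cuDoubleComplex), cudaMemcpyDeviceToHost);
	    printf("C[0] = (%f, %f)\n", cuCreal(hC[0]), cuCimag(hC[0]));

	    cusparseDestroySpMat(matA);
	    cusparseDestroyDnMat(matB);
	    cusparseDestroyDnMat(matC);
	    cusparseDestroy(handle);
	    return 0;
	}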