cusparseSpMM fp32 is slower than cublas cublasSgemm

Hi

I ran the cuSPARSE and cuBLAS code from CUDALibrarySamples.
The environment is as follows:

  • CUDA 11.7
  • RTX 3080
  • m, n, k are all 1024
  • Matrix sparsity is 50%

cusparseSpMM: 3.22 ms

// execute SpMM
  CHECK_CUSPARSE( cusparseSpMM(handle,
                 CUSPARSE_OPERATION_TRANSPOSE,
                 CUSPARSE_OPERATION_NON_TRANSPOSE,
                 &alpha, matA, matB, &beta, matC, CUDA_R_32F,
                 CUSPARSE_SPMM_ALG_DEFAULT, dBuffer) )

cublasSgemm: 0.05018 ms

CUBLAS_CHECK(
        cublasSgemm(cublasH, transa, transb, m, n, k, &alpha, d_A, lda, d_B, ldb, &beta, d_C, ldc));

CUDA Kernel

CUDA Kernel Statistics:

 Time (%)  Total Time (ns)  Instances   Avg (ns)     Med (ns)    Min (ns)   Max (ns)   StdDev (ns)                                                  Name                                                
 --------  ---------------  ---------  -----------  -----------  ---------  ---------  -----------  ----------------------------------------------------------------------------------------------------
     75.7        3,181,935          1  3,181,935.0  3,181,935.0  3,181,935  3,181,935          0.0  void cusparse::cusparseCooMMSmallKernel<(unsigned int)128, (cusparseOperation_t)1, (cusparseOperati…
     24.3        1,020,240          1  1,020,240.0  1,020,240.0  1,020,240  1,020,240          0.0  void gemm_kernel2x1_core<float, (bool)1, (bool)0, (bool)0, (bool)0, (bool)0>(T1 *, const T1 *, cons…

The code is attached below. Please let me know if there is something wrong with it.
cuSPARSE_mm_test.zip (3.7 KB)

Another question: where can I find samples or docs for testing 2:4 sparsity?

Thanks in advance~

Usually cusparse should be considered when matrix sparsity is 1% or less (that is, when roughly 1% or fewer of the entries are nonzero). The algorithm cusparse uses is not efficient compared to cublas when nearly all the data has to be accessed.
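
To put rough numbers on this, here is a minimal sketch comparing the storage footprint of the sparse operand against a dense fp32 matrix of the same shape. It assumes a COO layout with 32-bit indices (suggested only by the cusparseCooMMSmallKernel name in the profile above, not confirmed from the attached code) and reads the percentage as the fraction of nonzero entries. The helper names are hypothetical and the arithmetic says nothing about actual kernel speed.

```cpp
#include <cstddef>
#include <cstdint>

// Bytes needed to store an m x k fp32 matrix densely (assumes 4-byte float).
constexpr std::size_t dense_bytes(std::size_t m, std::size_t k) {
    return m * k * sizeof(float);
}

// Bytes for a COO matrix with 32-bit indices:
// one value, one row index and one column index per nonzero.
constexpr std::size_t coo_bytes(std::size_t nnz) {
    return nnz * (sizeof(float) + 2 * sizeof(std::int32_t));
}

// Number of nonzeros when `percent` percent of the entries are nonzero
// (integer arithmetic, rounded down).
constexpr std::size_t nnz_for_percent(std::size_t m, std::size_t k,
                                      std::size_t percent) {
    return m * k * percent / 100;
}

// The 1024 x 1024 case from the question, checked at compile time.
static_assert(dense_bytes(1024, 1024) == 4194304,
              "dense fp32 operand: 4 MiB");
static_assert(coo_bytes(nnz_for_percent(1024, 1024, 50)) == 6291456,
              "COO at 50% nonzeros: 6 MiB, larger than the dense matrix");
static_assert(coo_bytes(nnz_for_percent(1024, 1024, 1)) == 125820,
              "COO at 1% nonzeros: about 123 KiB");
```

Under these assumptions, the 50% case moves more bytes for the sparse operand than the dense matrix would, and does so with irregular access, while the 1% case is roughly 33 times smaller than dense.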

On this forum, please post code inline, not via attachment.

I don’t think there is sample code to test every imaginable parameter combination in the CUBLAS and CUSPARSE APIs. You can find sample code for these libraries on GitHub.
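
On the 2:4 question, it may help to be precise about what the pattern means before looking for a sample: in every group of 4 consecutive values, at most 2 are nonzero. As far as I know, this structured format is handled by the separate cuSPARSELt library rather than by cusparseSpMM. The sketch below is only a host-side check of the pattern on a row-major matrix. The function name is hypothetical, and which dimension the groups run along in a real library call depends on the layout, so treat the row-wise grouping here as an assumption.

```cpp
#include <cstddef>
#include <vector>

// Returns true if every aligned group of 4 consecutive elements in each row
// of a row-major rows x cols matrix contains at most 2 nonzeros.
// Returns false if cols is not a multiple of 4 or the size does not match.
bool is_2to4_sparse(const std::vector<float>& a,
                    std::size_t rows, std::size_t cols) {
    if (cols % 4 != 0 || a.size() != rows * cols) {
        return false;
    }
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; c += 4) {
            int nonzeros = 0;
            for (std::size_t j = 0; j < 4; ++j) {
                if (a[r * cols + c + j] != 0.0f) {
                    ++nonzeros;
                }
            }
            if (nonzeros > 2) {
                return false;  // this group of 4 violates the 2:4 pattern
            }
        }
    }
    return true;
}
```

A matrix with 50% randomly placed zeros will generally fail this check, since random placement does not guarantee two zeros in every group of 4.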

Thanks for your reply!
