Problems in the use of cusparseSpGEMM in CUDA Fortran

H-POTATO · September 22, 2023, 7:59am

I am trying to compute a sparse matrix-sparse matrix product in CUDA Fortran.
I am using cusparseSpGEMM from the cuSPARSE library, following the sample code on GitHub (https://github.com/NVIDIA/CUDALibrarySamples/blob/master/cuSPARSE/spgemm/spgemm_example.c), but I have run into a problem.

The first call to cusparseSpGEMM_workEstimation returns status 7 (CUSPARSE_STATUS_INTERNAL_ERROR).

Here is my code.
The computational environment is A100 80GB with CUDA 11.0.
I would appreciate it if you could point out any problems.
Thanks.

=========================================================================

program SpGEMM

use cudafor
use cusparse

Implicit none

  !!Define Matrix----------------------------------
  Integer,parameter :: A_rows=4
  Integer,parameter :: A_cols=4
  Integer,parameter :: A_nnz=9
  Integer           :: Arow(A_rows+1)
  Integer           :: Acol(A_nnz)
  Real(8)           :: Aval(A_nnz)
  Integer,device    :: Arow_d(A_rows+1)
  Integer,device    :: Acol_d(A_nnz)
  Real(8),device    :: Aval_d(A_nnz)

  Integer,parameter :: B_rows=4
  Integer,parameter :: B_cols=4
  Integer,parameter :: B_nnz=8
  Integer           :: Brow(B_rows+1)
  Integer           :: Bcol(B_nnz)
  Real(8)           :: Bval(B_nnz)
  Integer,device    :: Brow_d(B_rows+1)
  Integer,device    :: Bcol_d(B_nnz)
  Real(8),device    :: Bval_d(B_nnz)

  Integer,allocatable :: Crow(:)
  Integer,allocatable :: Ccol(:)
  Integer,allocatable,device  :: Crow_d(:)
  Integer,allocatable,device  :: Ccol_d(:)
  !!Define Matrix----------------------------------

  Real(8) :: alpha=1d0,beta=0d0

  Integer :: status
  type(cusparseHandle) :: handle
  type(cusparseSpMatDescr) :: matA,matB,matC
  type(cusparseSpGEMMDescr) :: SpGEMMDesc

  Integer(8) :: bufferSize1
!  Integer(1),pointer,device :: buffer1(:)
  Integer(1),device,allocatable :: buffer1(:)

  !!Define Matrix----------------------------------
  Arow=(/1,4,5,8,10/)
  Acol=(/1,3,4,2,1,3,4,2,4/)
  Aval=(/1d0,2d0,3d0,4d0,5d0,6d0,7d0,8d0,9d0/)

  Brow=(/1,3,5,8,9/)
  Bcol=(/1,4,2,4,1,2,3,2/)
  Bval=(/1d0,2d0,3d0,4d0,5d0,6d0,7d0,8d0/)

  status=cudaDeviceSynchronize()
  Arow_d=Arow
  Acol_d=Acol
  Aval_d=Aval
  Brow_d=Brow
  Bcol_d=Bcol
  Bval_d=Bval
  status=cudaDeviceSynchronize()
  !!Define Matrix----------------------------------

  allocate(Crow_d(A_rows+1))

  ! initialize CUSPARSE and matrix descriptor
  status=cusparseCreate(handle)
if(status/=CUSPARSE_STATUS_SUCCESS) print *, 'cusparseCreate error: ', status

  status=cusparseCreateCsr(matA,A_rows,A_cols,A_nnz, &
                           ARow_d,ACol_d,Aval_d,  &
                           CUSPARSE_INDEX_32I,CUSPARSE_INDEX_32I, &
                           CUSPARSE_INDEX_BASE_ONE,CUDA_R_64F)
if(status/=CUSPARSE_STATUS_SUCCESS) print *, 'cusparseCreateCsr error: ', status

  status=cusparseCreateCsr(matB,B_rows,B_cols,B_nnz, &
                           BRow_d,BCol_d,Bval_d,  &
                           CUSPARSE_INDEX_32I,CUSPARSE_INDEX_32I, &
                           CUSPARSE_INDEX_BASE_ONE,CUDA_R_64F)
if(status/=CUSPARSE_STATUS_SUCCESS) print *, 'cusparseCreateCsr error: ', status

  status=cusparseCreateCsr(matC,A_rows,B_cols,0, &
                           null(),null(),null(),  &
                           CUSPARSE_INDEX_32I,CUSPARSE_INDEX_32I, &
                           CUSPARSE_INDEX_BASE_ONE,CUDA_R_64F)
if(status/=CUSPARSE_STATUS_SUCCESS) print *, 'cusparseCreateCsr error: ', status

  status=cudaDeviceSynchronize()

  !!----------------------------------------------------------------------------------------------------

  !!SpGEMM computation
  status=cusparseSpGEMM_createDescr(SpGEMMDesc)
if(status/=CUSPARSE_STATUS_SUCCESS) print *, 'cusparseSpGEMM_CreateDescr error: ', status

  !! ask bufferSize1 bytes for external memory
  status=cusparseSpGEMM_workEstimation(handle,&
                                       CUSPARSE_OPERATION_NON_TRANSPOSE,CUSPARSE_OPERATION_NON_TRANSPOSE,&
                                       alpha,matA,matB,beta,matC,&
                                       CUDA_R_64F,CUSPARSE_SPGEMM_DEFAULT,&
                                       SpGEMMDesc,bufferSize1,null())
if(status/=CUSPARSE_STATUS_SUCCESS) print *, 'cusparseSpGEMM_workEstimation error: ', status

  if(allocated(buffer1)) deallocate(buffer1)
  if(bufferSize1 /= 0) allocate(buffer1(bufferSize1))

  !! inspect A and B to understand the memory requirement for the next step
  status=cusparseSpGEMM_workEstimation(handle,&
                                       CUSPARSE_OPERATION_NON_TRANSPOSE,CUSPARSE_OPERATION_NON_TRANSPOSE,&
                                       alpha,matA,matB,beta,matC,&
                                       CUDA_R_64F,CUSPARSE_SPGEMM_DEFAULT,&
                                       SpGEMMDesc,bufferSize1,buffer1)
if(status/=CUSPARSE_STATUS_SUCCESS) print *, 'cusparseSpGEMM_workEstimation error: ', status

  deallocate(buffer1)

  status=cusparseSpGEMM_destroyDescr(SpGEMMDesc)
  status=cusparseDestroySpMat(matA)
  status=cusparseDestroySpMat(matB)
  status=cusparseDestroySpMat(matC)
  status=cusparseDestroy(handle)

end program SpGEMM

qanhpham · September 22, 2023, 4:15pm

Hi @H-POTATO. You’re using quite an old version. I’d suggest moving to a newer version, ideally the latest, since many issues in older versions have been fixed.

H-POTATO · September 23, 2023, 12:13am

Thank you, @qanhpham!
By “version”, do you mean the CUDA version?
Is my code otherwise correct as written?

H-POTATO · September 25, 2023, 8:15am

Hi, @qanhpham !

I tried again with CUDA 12.0.
The result is the same: the first cusparseSpGEMM_workEstimation call still returns status 7, and bufferSize1 comes back as a huge value that cannot be allocated, so the program terminates abnormally.

What should I do?
Thanks.

qanhpham · September 25, 2023, 7:13pm

Hi @H-POTATO,

I checked your program against our C code sample and it worked well. Since the C code runs, maybe the issue comes from the Fortran side?
You said the error happens in the first cusparseSpGEMM_workEstimation call, but that shouldn’t be the case. Can you check again?

H-POTATO · September 26, 2023, 7:29am

Thank you for the reply, @qanhpham.

It failed because I had declared the buffer as an allocatable device array instead of a device pointer.
I solved the problem by passing buffer1, nullified beforehand, instead of passing null().

With the code shown below, the sample problem is computed correctly in Fortran!

However, when I try a larger matrix product,
(4394 x 58621 with 58621 nonzeros) x (58621 x 4394 with 691216 nonzeros),
I get
cusparseSpGEMM_workEstimation error: 11
and the result has C_nnz=0.
Is this error “CUSPARSE_STATUS_INSUFFICIENT_RESOURCES”?
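
For reference, the integer printed as the status can be mapped back to an enumerator name. The sketch below is plain Fortran, so it compiles without the cusparse module. It hard-codes the values 0 through 11 on the assumption that they match the cusparseStatus_t enum in cusparse.h, so the table should be checked against the header shipped with your toolkit. The module and function names are hypothetical. Under that table, 7 is CUSPARSE_STATUS_INTERNAL_ERROR and 11 is CUSPARSE_STATUS_INSUFFICIENT_RESOURCES.

```fortran
module cusparse_status_names
  implicit none
  private
  public :: cusparse_status_name

contains

  ! Return the enumerator name for a cuSPARSE status integer.
  ! The numeric values are an assumption taken from cusparse.h.
  function cusparse_status_name(status) result(label)
    integer, intent(in) :: status
    character(len=:), allocatable :: label

    select case (status)
    case (0);  label = 'CUSPARSE_STATUS_SUCCESS'
    case (1);  label = 'CUSPARSE_STATUS_NOT_INITIALIZED'
    case (2);  label = 'CUSPARSE_STATUS_ALLOC_FAILED'
    case (3);  label = 'CUSPARSE_STATUS_INVALID_VALUE'
    case (4);  label = 'CUSPARSE_STATUS_ARCH_MISMATCH'
    case (5);  label = 'CUSPARSE_STATUS_MAPPING_ERROR'
    case (6);  label = 'CUSPARSE_STATUS_EXECUTION_FAILED'
    case (7);  label = 'CUSPARSE_STATUS_INTERNAL_ERROR'
    case (8);  label = 'CUSPARSE_STATUS_MATRIX_TYPE_NOT_SUPPORTED'
    case (9);  label = 'CUSPARSE_STATUS_ZERO_PIVOT'
    case (10); label = 'CUSPARSE_STATUS_NOT_SUPPORTED'
    case (11); label = 'CUSPARSE_STATUS_INSUFFICIENT_RESOURCES'
    case default
      label = 'UNKNOWN_STATUS'   ! value outside the assumed table
    end select
  end function cusparse_status_name

end module cusparse_status_names
```

With `use cusparse_status_names` added, an error line in the programs of this thread could print `cusparse_status_name(status)` next to the raw number.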

After several attempts, the calculation completes without any errors.

Also, for a larger calculation (a 500,000 x 500,000 matrix, for example), can I still use CUSPARSE_SPGEMM_ALG1,
or should I try CUSPARSE_SPGEMM_ALG2 or CUSPARSE_SPGEMM_ALG3?

Thanks.

program SpGEMM

use cudafor
use cusparse

Implicit none

  !!Define Matrix----------------------------------
  Integer,parameter :: A_rows=4
  Integer,parameter :: A_cols=4
  Integer,parameter :: A_nnz=9
  Integer           :: Arow(A_rows+1)
  Integer           :: Acol(A_nnz)
  Real(8)           :: Aval(A_nnz)
  Integer,device    :: Arow_d(A_rows+1)
  Integer,device    :: Acol_d(A_nnz)
  Real(8),device    :: Aval_d(A_nnz)

  Integer,parameter :: B_rows=4
  Integer,parameter :: B_cols=4
  Integer,parameter :: B_nnz=8
  Integer           :: Brow(B_rows+1)
  Integer           :: Bcol(B_nnz)
  Real(8)           :: Bval(B_nnz)
  Integer,device    :: Brow_d(B_rows+1)
  Integer,device    :: Bcol_d(B_nnz)
  Real(8),device    :: Bval_d(B_nnz)

  Integer   :: C_rows
  Integer   :: C_cols
  Integer   :: C_nnz
  Integer,allocatable :: Crow(:)
  Integer,allocatable :: Ccol(:)
  Real(8),allocatable :: Cval(:)
  Integer,allocatable,device  :: Crow_d(:)
  Integer,allocatable,device  :: Ccol_d(:)
  Real(8),allocatable,device  :: Cval_d(:)

  Integer,parameter   :: C_rows_true=4
  Integer,parameter   :: C_cols_true=4
  Integer,parameter   :: C_nnz_true=12
  Integer :: Crow_true(C_rows_true+1)
  Integer :: Ccol_true(C_nnz_true)
  Real(8) :: Cval_true(C_nnz_true)
  !!Define Matrix----------------------------------

  Real(8) :: alpha=1d0,beta=0d0

  Integer(8)   :: C_rows_dbl
  Integer(8)   :: C_cols_dbl
  Integer(8)   :: C_nnz_dbl

  Integer :: istat,status
  type(cusparseHandle) :: handle
  type(cusparseSpMatDescr) :: matA,matB,matC
  type(cusparseSpGEMMDescr) :: SpGEMMDesc

  Integer(8) :: bufferSize1
  Integer(1),pointer,device :: buffer1(:)
  !!Integer(1),device,allocatable :: buffer1(:)

  Integer(8) :: bufferSize2
  Integer(1),pointer,device :: buffer2(:)
  !!Integer(1),device,allocatable :: buffer2(:)

  !!Define Matrix----------------------------------
  Arow=(/1,4,5,8,10/)
  Acol=(/1,3,4,2,1,3,4,2,4/)
  Aval=(/1d0,2d0,3d0,4d0,5d0,6d0,7d0,8d0,9d0/)

  Brow=(/1,3,5,8,9/)
  Bcol=(/1,4,2,4,1,2,3,2/)
  Bval=(/1d0,2d0,3d0,4d0,5d0,6d0,7d0,8d0/)

  istat=cudaDeviceSynchronize()
  Arow_d=Arow
  Acol_d=Acol
  Aval_d=Aval
  Brow_d=Brow
  Bcol_d=Bcol
  Bval_d=Bval
  istat=cudaDeviceSynchronize()
  !!Define Matrix----------------------------------

  allocate(Crow_d(A_rows+1))


  ! initialize CUSPARSE and matrix descriptor
  status=cusparseCreate(handle)
  if(status/=CUSPARSE_STATUS_SUCCESS) print *, 'cusparseCreate error: ', status

  status=cusparseCreateCsr(matA,A_rows,A_cols,A_nnz, &
                           ARow_d,ACol_d,Aval_d,  &
                           CUSPARSE_INDEX_32I,CUSPARSE_INDEX_32I, &
                           CUSPARSE_INDEX_BASE_ONE,CUDA_R_64F)
  if(status/=CUSPARSE_STATUS_SUCCESS) print *, 'cusparseCreateCsr error: ', status

  status=cusparseCreateCsr(matB,B_rows,B_cols,B_nnz, &
                          BRow_d,BCol_d,Bval_d,  &
                          CUSPARSE_INDEX_32I,CUSPARSE_INDEX_32I, &
                          CUSPARSE_INDEX_BASE_ONE,CUDA_R_64F)
  if(status/=CUSPARSE_STATUS_SUCCESS) print *, 'cusparseCreateCsr error: ', status

  status=cusparseCreateCsr(matC,A_rows,B_cols,0, &
                           null(),null(),null(),  &
                           CUSPARSE_INDEX_32I,CUSPARSE_INDEX_32I, &
                           CUSPARSE_INDEX_BASE_ONE,CUDA_R_64F)
  if(status/=CUSPARSE_STATUS_SUCCESS) print *, 'cusparseCreateCsr error: ', status

  status=cudaDeviceSynchronize()

  !!----------------------------------------------------------------------------------------------------

  !!SpGEMM computation
  status=cusparseSpGEMM_createDescr(SpGEMMDesc)
  if(status/=CUSPARSE_STATUS_SUCCESS) print *, 'cusparseSpGEMM_CreateDescr error: ', status


  !! ask bufferSize1 bytes for external memory
  nullify(buffer1)
  status=cusparseSpGEMM_workEstimation(handle,&
                                      CUSPARSE_OPERATION_NON_TRANSPOSE,CUSPARSE_OPERATION_NON_TRANSPOSE,&
                                      alpha,matA,matB,beta,matC,&
                                      CUDA_R_64F,CUSPARSE_SPGEMM_DEFAULT,&
                                      SpGEMMDesc,bufferSize1,buffer1)
                                      !!SpGEMMDesc,bufferSize1,null())
  if(status/=CUSPARSE_STATUS_SUCCESS) print *, 'cusparseSpGEMM_workEstimation error: ', status

  istat=cudaDeviceSynchronize()
  print *, "bufferSize1=",bufferSize1

  !if(allocated(buffer1)) deallocate(buffer1)
  if(bufferSize1 /= 0) allocate(buffer1(bufferSize1))

  !! inspect the A and B to understand the memory requirement for the next step
  status=cusparseSpGEMM_workEstimation(handle,&
                                      CUSPARSE_OPERATION_NON_TRANSPOSE,CUSPARSE_OPERATION_NON_TRANSPOSE,&
                                      alpha,matA,matB,beta,matC,&
                                      CUDA_R_64F,CUSPARSE_SPGEMM_DEFAULT,&
                                      SpGEMMDesc,bufferSize1,buffer1)
  if(status/=CUSPARSE_STATUS_SUCCESS) print *, 'cusparseSpGEMM_workEstimation error: ', status


  !! ask bufferSize2 bytes for external memory
  nullify(buffer2)
  status=cusparseSpGEMM_compute(handle,&
                               CUSPARSE_OPERATION_NON_TRANSPOSE,CUSPARSE_OPERATION_NON_TRANSPOSE,&
                               alpha,matA,matB,beta,matC,&
                               CUDA_R_64F,CUSPARSE_SPGEMM_DEFAULT,&
                               SpGEMMDesc,bufferSize2,buffer2)
                               !!SpGEMMDesc,bufferSize2,null())
  if(status/=CUSPARSE_STATUS_SUCCESS) print *, 'cusparseSpGEMM_compute error: ', status

  istat=cudaDeviceSynchronize()
  print *, "bufferSize2=",bufferSize2
  print *, istat


  !! buffer2 is a pointer, so test it with associated() rather than allocated()
  if(associated(buffer2)) deallocate(buffer2)
  if(bufferSize2 /= 0) allocate(buffer2(bufferSize2))
  !! compute the intermediate product of A * B
  status=cusparseSpGEMM_compute(handle,&
                               CUSPARSE_OPERATION_NON_TRANSPOSE,CUSPARSE_OPERATION_NON_TRANSPOSE,&
                               alpha,matA,matB,beta,matC,&
                               CUDA_R_64F,CUSPARSE_SPGEMM_DEFAULT,&
                               SpGEMMDesc,bufferSize2,buffer2)
  if(status/=CUSPARSE_STATUS_SUCCESS) print *, 'cusparseSpGEMM_compute error: ', status

  !! get matrix C nnz
  status=cusparseSpMatGetSize(matC,C_rows_dbl,C_cols_dbl,C_nnz_dbl)
  if(status/=CUSPARSE_STATUS_SUCCESS) print *, 'cusparseSpMatGetSize error: ', status

  istat=cudaDeviceSynchronize()
  C_rows=C_rows_dbl
  C_cols=C_cols_dbl
  C_nnz=C_nnz_dbl
  istat=cudaDeviceSynchronize()

  write(*,*) "A_rows",A_rows,"A_cols",A_cols,"A_nnz",A_nnz
  write(*,*) "B_rows",B_rows,"B_cols",B_cols,"B_nnz",B_nnz
  write(*,*) "C_rows",C_rows,"C_cols",C_cols,"C_nnz",C_nnz


  !! allocate matrix C
  if(allocated(Ccol_d)) deallocate(Ccol_d)
  if(allocated(Cval_d)) deallocate(Cval_d)
  allocate(Ccol_d(C_nnz))
  allocate(Cval_d(C_nnz))
  istat=cudaDeviceSynchronize()

  !! update matC with the new pointers
  status=cusparseCsrSetPointers(matC,Crow_d,Ccol_d,Cval_d)
  if(status/=CUSPARSE_STATUS_SUCCESS) print *, 'cusparseCsrSetPointers error: ', status

  !! copy the final products to the matrix C
  status=cusparseSpGEMM_copy(handle,&
                             CUSPARSE_OPERATION_NON_TRANSPOSE,CUSPARSE_OPERATION_NON_TRANSPOSE,&
                             alpha,matA,matB,beta,matC,&
                             CUDA_R_64F,CUSPARSE_SPGEMM_DEFAULT,SpGEMMDesc)
  if(status/=CUSPARSE_STATUS_SUCCESS) print *, 'cusparseSpGEMM_copy error: ', status


  deallocate(buffer1)
  deallocate(buffer2)

  status=cusparseSpGEMM_destroyDescr(SpGEMMDesc)
  status=cusparseDestroySpMat(matA)
  status=cusparseDestroySpMat(matB)
  status=cusparseDestroySpMat(matC)
  status=cusparseDestroy(handle)


!======================================================

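  istat=cudaDeviceSynchronize()
  !! allocate the host arrays explicitly before copying from the device
  allocate(Crow(C_rows+1),Ccol(C_nnz),Cval(C_nnz))
  Crow=Crow_d
  Ccol=Ccol_d
  Cval=Cval_d
  istat=cudaDeviceSynchronize()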

  print *, Crow
  print *, " "
  print *, Ccol
  print *, " "
  print *, Cval


end program SpGEMM
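
The program above declares C_nnz_true=12 but never compares the result against it. One way to obtain such reference values is a small host-side symbolic product. The sketch below is plain Fortran and computes only the row pointer of C = A*B from the CSR index arrays, using one-based indices as with CUSPARSE_INDEX_BASE_ONE. The module and subroutine names are hypothetical. It counts structural nonzeros only, so it assumes the library also stores entries whose values happen to cancel to zero.

```fortran
module csr_reference
  implicit none
  private
  public :: csr_product_rowptr

contains

  ! Symbolic CSR product: fill crow with the one-based row pointer of C = A*B.
  ! crow must have a_rows+1 elements; the nonzero count is crow(a_rows+1)-1.
  subroutine csr_product_rowptr(a_rows, b_cols, arow, acol, brow, bcol, crow)
    integer, intent(in)  :: a_rows, b_cols
    integer, intent(in)  :: arow(:), acol(:)
    integer, intent(in)  :: brow(:), bcol(:)
    integer, intent(out) :: crow(:)
    integer :: marker(b_cols)   ! marker(j) = last row of C in which column j was seen
    integer :: i, ja, jb, k, row_nnz

    marker = 0
    crow(1) = 1
    do i = 1, a_rows
      row_nnz = 0
      do ja = arow(i), arow(i+1) - 1
        k = acol(ja)                      ! column k of A selects row k of B
        do jb = brow(k), brow(k+1) - 1
          if (marker(bcol(jb)) /= i) then
            marker(bcol(jb)) = i          ! first time this column appears in row i
            row_nnz = row_nnz + 1
          end if
        end do
      end do
      crow(i+1) = crow(i) + row_nnz
    end do
  end subroutine csr_product_rowptr

end module csr_reference
```

For the 4x4 matrices used in this thread, calling the routine with Arow, Acol, Brow and Bcol gives the row pointer 1, 5, 7, 11, 13, that is, 12 nonzeros, which agrees with C_nnz_true.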

qanhpham · September 26, 2023, 4:29pm

Hi @H-POTATO,

Yes, when DEFAULT (ALG1) fails, you can switch to ALG2 or ALG3, which can handle larger matrices.

H-POTATO · September 27, 2023, 11:49am

Thanks for your continued replies, @qanhpham !

I followed the sample code (https://github.com/NVIDIA/CUDALibrarySamples/blob/master/cuSPARSE/spgemm_mem/spgemm_mem_example.c) and tried ALG3, but compiling the first cusparseSpGEMM_workEstimation call gives the following errors:

NVFORTRAN-S-0155-Could not resolve generic procedure cusparsespgemm_workestimation
NVFORTRAN-S-0038-Symbol, cusparse_spgemm_alg3, has not been explicitly declared

The only change was replacing CUSPARSE_SPGEMM_DEFAULT with CUSPARSE_SPGEMM_ALG3.
With CUSPARSE_SPGEMM_DEFAULT the code compiles without problems.
What should I do?
Thanks.

This is my code with ALG3.

subroutine SpGEMM_ALG3

use cudafor
use cusparse

Implicit none

  !!Define Matrix----------------------------------
  Integer,parameter :: A_rows=4
  Integer,parameter :: A_cols=4
  Integer,parameter :: A_nnz=9
  Integer           :: Arow(A_rows+1)
  Integer           :: Acol(A_nnz)
  Real(8)           :: Aval(A_nnz)
  Integer,device    :: Arow_d(A_rows+1)
  Integer,device    :: Acol_d(A_nnz)
  Real(8),device    :: Aval_d(A_nnz)

  Integer,parameter :: B_rows=4
  Integer,parameter :: B_cols=4
  Integer,parameter :: B_nnz=8
  Integer           :: Brow(B_rows+1)
  Integer           :: Bcol(B_nnz)
  Real(8)           :: Bval(B_nnz)
  Integer,device    :: Brow_d(B_rows+1)
  Integer,device    :: Bcol_d(B_nnz)
  Real(8),device    :: Bval_d(B_nnz)

  Integer   :: C_rows
  Integer   :: C_cols
  Integer   :: C_nnz
  Integer,allocatable :: Crow(:)
  Integer,allocatable :: Ccol(:)
  Real(8),allocatable :: Cval(:)
  Integer,allocatable,device  :: Crow_d(:)
  Integer,allocatable,device  :: Ccol_d(:)
  Real(8),allocatable,device  :: Cval_d(:)

  Integer,parameter   :: C_rows_true=4
  Integer,parameter   :: C_cols_true=4
  Integer,parameter   :: C_nnz_true=12
  Integer :: Crow_true(C_rows_true+1)
  Integer :: Ccol_true(C_nnz_true)
  Real(8) :: Cval_true(C_nnz_true)
  !!Define Matrix----------------------------------

  Real(8) :: alpha=1d0,beta=0d0

  Integer(8)   :: C_rows_dbl
  Integer(8)   :: C_cols_dbl
  Integer(8)   :: C_nnz_dbl

  Integer :: istat,status
  type(cusparseHandle) :: handle
  type(cusparseSpMatDescr) :: matA,matB,matC
  type(cusparseSpGEMMDescr) :: SpGEMMDesc
  !type(cusparseSpGEMMALG) :: CUSPARSE_SPGEMM_ALG3
  !type(cusparseSpGEMMALG) :: CUSPARSE_SPGEMM_DEFAULT

  Integer(8) :: bufferSize1
  Integer(1),pointer,device :: buffer1(:)

  Integer(8) :: bufferSize2
  Integer(1),pointer,device :: buffer2(:)

  !! ALG3
  Integer(8) :: bufferSize3
  Integer(1),pointer,device :: buffer3(:)

  Integer(8) :: num_prods
  Real(8) :: chunk_fraction=0.2d0
  !! ALG3

  !!Define Matrix----------------------------------
  Arow=(/1,4,5,8,10/)
  Acol=(/1,3,4,2,1,3,4,2,4/)
  Aval=(/1d0,2d0,3d0,4d0,5d0,6d0,7d0,8d0,9d0/)

  Brow=(/1,3,5,8,9/)
  Bcol=(/1,4,2,4,1,2,3,2/)
  Bval=(/1d0,2d0,3d0,4d0,5d0,6d0,7d0,8d0/)

  istat=cudaDeviceSynchronize()
  Arow_d=Arow
  Acol_d=Acol
  Aval_d=Aval
  Brow_d=Brow
  Bcol_d=Bcol
  Bval_d=Bval
  istat=cudaDeviceSynchronize()
  !!Define Matrix----------------------------------

  allocate(Crow_d(A_rows+1))


  ! initialize CUSPARSE and matrix descriptor
  status=cusparseCreate(handle)
  if(status/=CUSPARSE_STATUS_SUCCESS) print *, 'cusparseCreate error: ', status

  status=cusparseCreateCsr(matA,A_rows,A_cols,A_nnz, &
                           ARow_d,ACol_d,Aval_d,  &
                           CUSPARSE_INDEX_32I,CUSPARSE_INDEX_32I, &
                           CUSPARSE_INDEX_BASE_ONE,CUDA_R_64F)
  if(status/=CUSPARSE_STATUS_SUCCESS) print *, 'cusparseCreateCsr error: ', status

  status=cusparseCreateCsr(matB,B_rows,B_cols,B_nnz, &
                          BRow_d,BCol_d,Bval_d,  &
                          CUSPARSE_INDEX_32I,CUSPARSE_INDEX_32I, &
                          CUSPARSE_INDEX_BASE_ONE,CUDA_R_64F)
  if(status/=CUSPARSE_STATUS_SUCCESS) print *, 'cusparseCreateCsr error: ', status

  status=cusparseCreateCsr(matC,A_rows,B_cols,0, &
                           null(),null(),null(),  &
                           CUSPARSE_INDEX_32I,CUSPARSE_INDEX_32I, &
                           CUSPARSE_INDEX_BASE_ONE,CUDA_R_64F)
  if(status/=CUSPARSE_STATUS_SUCCESS) print *, 'cusparseCreateCsr error: ', status

  status=cudaDeviceSynchronize()

  !!----------------------------------------------------------------------------------------------------

  !!SpGEMM computation
  status=cusparseSpGEMM_createDescr(SpGEMMDesc)
  if(status/=CUSPARSE_STATUS_SUCCESS) print *, 'cusparseSpGEMM_CreateDescr error: ', status


  !! ask bufferSize1 bytes for external memory
  nullify(buffer1)
  status=cusparseSpGEMM_workEstimation(handle,&
                                       CUSPARSE_OPERATION_NON_TRANSPOSE,CUSPARSE_OPERATION_NON_TRANSPOSE,&
                                       alpha,matA,matB,beta,matC,&
                                       CUDA_R_64F,CUSPARSE_SPGEMM_ALG3,&
                                       !!CUDA_R_64F,CUSPARSE_SPGEMM_DEFAULT,&
                                       SpGEMMDesc,bufferSize1,buffer1)
  if(status/=CUSPARSE_STATUS_SUCCESS) print *, 'cusparseSpGEMM_workEstimation error: ', status

  istat=cudaDeviceSynchronize()
  print *, "bufferSize1=",bufferSize1

  if(bufferSize1 /= 0) allocate(buffer1(bufferSize1))

!!  !! inspect the A and B to understand the memory requirement for
!!  !! the next step
!!  status=cusparseSpGEMM_workEstimation(handle,&
!!                                       CUSPARSE_OPERATION_NON_TRANSPOSE,CUSPARSE_OPERATION_NON_TRANSPOSE,&
!!                                       alpha,matA,matB,beta,matC,&
!!                                       CUDA_R_64F,CUSPARSE_SPGEMM_ALG3,&
!!                                       SpGEMMDesc,bufferSize1,buffer1)
!!  if(status/=CUSPARSE_STATUS_SUCCESS) print *, 'cusparseSpGEMM_workEstimation error: ', status
!!
!!  !!ALG3--------------------------------------
!!
!!  status=cusparseSpGEMM_getNumProducts(SpGEMMDesc,num_prods)
!!
!!  !! ask bufferSize3 bytes for external memory
!!  nullify(buffer3)
!!  status=cusparseSpGEMM_estimateMemory(handle,&
!!                                       CUSPARSE_OPERATION_NON_TRANSPOSE,CUSPARSE_OPERATION_NON_TRANSPOSE,&
!!                                       alpha,matA,matB,beta,matC,&
!!                                       CUDA_R_64F,CUSPARSE_SPGEMM_ALG3,&
!!                                       SpGEMMDesc,chunk_fraction,&
!!                                       bufferSize3,buffer3,&
!!                                       bufferSize2)
!!  if(status/=CUSPARSE_STATUS_SUCCESS) print *, 'cusparseSpGEMM_estimateMemory error: ', status
!!
!!  istat=cudaDeviceSynchronize
!!  print *, "bufferSize2=",bufferSize2
!!  print *, istat
!!
!!  if(bufferSize2 /= 0) allocate(buffer2(bufferSize2))
!!
!!  !! buffer3 can be safely freed to save more memory
!!  deallocate(buffer3)
!!
!!  !!ALG3--------------------------------------
!!
!!
!!
!!
!!  !! compute the intermediate product of A * B
!!  status=cusparseSpGEMM_compute(handle,&
!!                               CUSPARSE_OPERATION_NON_TRANSPOSE,CUSPARSE_OPERATION_NON_TRANSPOSE,&
!!                               alpha,matA,matB,beta,matC,&
!!                               CUDA_R_64F,CUSPARSE_SPGEMM_ALG3,&
!!                               SpGEMMDesc,bufferSize2,buffer2)
!!  if(status/=CUSPARSE_STATUS_SUCCESS) print *, 'cusparseSpGEMM_compute error: ', status
!!
!!
!!  !! get matrix C non-zero entries C_nnz1
!!  status=cusparseSpMatGetSize(matC,C_rows_dbl,C_cols_dbl,C_nnz_dbl)
!!  if(status/=CUSPARSE_STATUS_SUCCESS) print *, 'cusparseSpMatGetSize error: ', status
!!
!!  istat=cudaDeviceSynchronize
!!  C_rows=C_rows_dbl
!!  C_cols=C_cols_dbl
!!  C_nnz=C_nnz_dbl
!!  istat=cudaDeviceSynchronize
!!
!!  write(*,*) "A_rows",A_rows,"A_cols",A_cols,"A_nnz",A_nnz
!!  write(*,*) "B_rows",B_rows,"B_cols",B_cols,"B_nnz",B_nnz
!!  write(*,*) "C_rows",C_rows,"C_cols",C_cols,"C_nnz",C_nnz
!!
!!
!!  !! allocate matrix C
!!  if(allocated(Ccol_d)) deallocate(Ccol_d)
!!  if(allocated(Cval_d)) deallocate(Cval_d)
!!  allocate(Ccol_d(C_nnz))
!!  allocate(Cval_d(C_nnz))
!!  istat=cudaDeviceSynchronize
!!
!!  !! update matC with the new pointers
!!  status=cusparseCsrSetPointers(matC,Crow_d,Ccol_d,Cval_d)
!!  if(status/=CUSPARSE_STATUS_SUCCESS) print *, 'cusparseCsrSetPointers error: ', status
!!
!!  !! copy the final products to the matrix C
!!  status=cusparseSpGEMM_copy(handle,&
!!                             CUSPARSE_OPERATION_NON_TRANSPOSE,CUSPARSE_OPERATION_NON_TRANSPOSE,&
!!                             alpha,matA,matB,beta,matC,&
!!                             CUDA_R_64F,CUSPARSE_SPGEMM_ALG3,SpGEMMDesc)
!!  if(status/=CUSPARSE_STATUS_SUCCESS) print *, 'cusparseSpGEMM_copy error: ', status
!!
!!
!!  deallocate(buffer1)
!!  deallocate(buffer2)
!!
!!  status=cusparseSpGEMM_destroyDescr(SpGEMMDesc)
!!  status=cusparseDestroySpMat(matA)
!!  status=cusparseDestroySpMat(matB)
!!  status=cusparseDestroySpMat(matC)
!!  status=cusparseDestroy(handle)
!!
!!
!!!======================================================
!!
!!  istat=cudaDeviceSynchronize
!!  Crow=Crow_d
!!  Ccol=Ccol_d
!!  Cval=Cval_d
!!  istat=cudaDeviceSynchronize
!!
!!  print *, Crow
!!  print *, " "
!!  print *, Ccol
!!  print *, " "
!!  print *, Cval


return
end subroutine SpGEMM_ALG3

qanhpham · September 27, 2023, 5:10pm

Are you compiling with CUDA 12.0+? ALG2 and ALG3 have only been available since CUDA 12.0.

H-POTATO · September 28, 2023, 10:05am

Yes, I am using CUDA 12.0.
ALG1 does not compile either, nor do ALG2 and ALG3.
Only CUSPARSE_SPGEMM_DEFAULT compiles.

qanhpham · September 28, 2023, 4:47pm

Looks like you’re still using CUDA 11.x, or its header file cusparse.h, when compiling. Can you check that all the paths and compilation parameters are correct?

H-POTATO · October 19, 2023, 8:59am

Hi, @qanhpham!
Sorry for the late reply.

I re-installed CUDA 12.2 on my GPU machine and compiled again, but I still get the same error.

I also contacted the Information Technology Center at the university. They told me that they had also tried to compile the CUDA Fortran program without success, and that they would contact NVIDIA.

Thanks.

qanhpham · October 19, 2023, 5:04pm

Hi @H-POTATO.
If it can’t find the new symbol ALG3, it must be using an old toolkit (< 12.0). Can you check your compile command to see whether it’s pointing to the right CUDA version?

Robert_Crovella · October 19, 2023, 6:00pm

CUDA Fortran is most commonly available via the installation of the HPC SDK. The HPC SDK generally uses its own installation of CUDA libraries. Simply installing CUDA 12.2 “somewhere else” will not cause CUDA Fortran to make use of it.

If you want access to the latest version of the CUDA libraries (such as cusparse) via CUDA Fortran, the best thing to do is probably to do a proper install of the latest version of the HPC SDK.
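
To see which cuSPARSE library a CUDA Fortran executable actually picks up, the library can be asked for its version at run time and the number decoded. The sketch below does only the decoding, in plain Fortran, and assumes the encoding major*1000 + minor*100 + patch used by the CUSPARSE_VERSION macro in cusparse.h. The module and routine names are hypothetical. The input is meant to be the integer returned by cusparseGetVersion(handle, version), which is assumed here to be exposed by the Fortran cusparse module.

```fortran
module cusparse_version_decode
  implicit none
  private
  public :: decode_cusparse_version, major_at_least

contains

  ! Split an integer encoded as major*1000 + minor*100 + patch
  ! (the encoding is an assumption based on cusparse.h).
  subroutine decode_cusparse_version(version, major, minor, patch)
    integer, intent(in)  :: version
    integer, intent(out) :: major, minor, patch

    major = version / 1000
    minor = mod(version, 1000) / 100
    patch = mod(version, 100)
  end subroutine decode_cusparse_version

  ! True when the major part of the encoded version is at least wanted_major.
  logical function major_at_least(version, wanted_major)
    integer, intent(in) :: version, wanted_major

    major_at_least = (version / 1000) >= wanted_major
  end function major_at_least

end module cusparse_version_decode
```

A value of 12102, for example, decodes to 12.1.2. A major number of 12 or higher only indicates that the library itself is recent enough for ALG2 and ALG3. The Fortran module used at compile time still has to declare the corresponding constants and interfaces.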

H-POTATO · October 24, 2023, 5:58am

Hi, @Robert_Crovella .

Of course, I downloaded HPC SDK 23.9 along with CUDA 12.2.
I checked the cuSPARSE sources and found that cusparseSpGEMM_estimateMemory and cusparseSpGEMM_getNumProducts, which the ALG3 workflow uses, are declared in cusparse.h but not in cusparse.f90.
I suspect this is why the code cannot be compiled with CUDA Fortran.

For reference, the two files are located as follows in my environment.

Thanks.

/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/math_libs/12.2/targets/x86_64-linux/include/cusparse.h

/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/compilers/src/cusparse.f90

Robert_Crovella · October 24, 2023, 4:22pm

That seems plausible, though I have not checked it.

In my opinion, this question should be addressed on the HPC compilers forum. Most CUDA Fortran questions can be found there. If you wish to post a question there, you can link easily to the one here, or I can move this question for you.

I would rephrase the question if posting there to focus on this specific observation you have made, about the lack of prototype in the Fortran module.

H-POTATO · October 25, 2023, 5:26am

Hi, @Robert_Crovella !
Thanks for your advice.
I have posted this question, with a link to this thread, in the HPC compilers forum.
I will wait for the response there.

Thanks.
