Wrong result sparse matrix-vector multiplication

sommerfeld · August 24, 2018, 2:19pm

Dear all,

I have implemented the function cusparseDcsrmv in Cuda Fortran. However, the result is not correct. The vector rho has only one element that is unequal to zero (rho(1)=1), hence it is easy to check the result. The compiler version is 17.10 and the cuda version is 7.5. I have attached the code I am using. The basis of the code can in principle be extracted from the examples of cuda. Things I have tried to avoid the problem: checked IA, JA and VAL for the correct definition, using a different compiler version (16.5&16.6) and cuda version (8.0).
I have tested the code for 44 and 55 matrices and it works. The integer status is always zero. What I have also checked was the intel mkl routine and it produces the right result. The code is compiled with pgf90 -Mcudalib=cusparse Euler.cuf -o Euler.out . Unfortunately I am not allowed to post the files fort.105 in which the vectors are stored.

It would be nice, if someone could help me with this problem. What could produce this error? What else could I check?

program timepropagation
 use cusparse

implicit none
double precision, allocatable :: rho(:), VAL(:)
double precision :: tres, alpha, rateprop
integer :: k1, k2, Nsparse, DimensionL, status
integer, allocatable :: JA(:), IA(:)

!Cuda
type(cusparseHandle) :: h
type(cusparseMatDescr) :: descrA
double precision, allocatable, device :: drho(:), dVAL(:)
integer, allocatable, device :: dIA(:), dJA(:)

!initalize CUSPARSE and matrix descriptor
 status = cusparseCreate(h)
 status = cusparseCreateMatDescr(descrA)
 status = cusparseSetMatType(descrA,CUSPARSE_MATRIX_TYPE_GENERAL)
 status = cusparseSetMatIndexBase(descrA,CUSPARSE_INDEX_BASE_ONE)
 status = cusparseSetPointerMode(h,CUSPARSE_POINTER_MODE_HOST)

rateprop = 1.0d-19/6.58211928d-16
alpha    = 0.0d0
Nsparse = 593178
DimensionL = 70128

allocate(rho(DimensionL),drho(DimensionL))
 rho = 0.0d0
 rho(1) = 1.0d0

allocate(VAL(Nsparse),IA(DimensionL+1),JA(Nsparse),dVAL(Nsparse),dIA(DimensionL+1),dJA(Nsparse))

do k1=1,Nsparse
read(105,'(1E26.19)') VAL(k1)
read(105,'(1I20)') JA(k1)
end do

do k1=1,DimensionL+1
read(105,'(1I20)') IA(k1)
end do

dVAL = VAL
dIA  = IA
dJA  = JA
drho = rho

status = cusparseDcsrmv(h,CUSPARSE_OPERATION_NON_TRANSPOSE,DimensionL,DimensionL,Nsparse,rateprop,descrA,dVAL,dIA,dJA,drho,alpha,drho)

rho = drho

do k1=1,DimensionL
if (rho(k1).ne.0.0d0) write(107,'(1E26.19)') rho(k1)
end do
end program timepropagation

Thank you in advance

brentl · August 28, 2018, 3:00pm

It’s odd that it works for smaller sizes, but fails for larger ones.
What happens if you do not use the same array for x and y? (drho in your code below).

sommerfeld · August 29, 2018, 12:12am

Hey,

thanks a lot for your answer. I appreciate it a lot. It compiles but I cannot execute the program. What should the changed dimensiom adress? Or is it just to try the subroutine?

Best

sommerfeld · August 29, 2018, 12:19am

There is one additional thing that bothers me: the result differs from run to run. This clearly indicates that something goes wrong. Does this problem show what is wrong with my code?
Best

MatColgrove · August 29, 2018, 6:30pm

Did you make the change that Brent suggested? i.e. use two different arrays instead of drho for both x and y.

Cusparsecsrmv solves y = alpha Ax + beta y, and you are using drho for both â€œxâ€� and â€œyâ€�. So for large problems you will be using the updated value of y for x and obtain wrong answers. The reason it works for 4x4 and 5x5 matrices it that the whole problem fits into a single block so you donâ€™t have to worry about x being overwritten.

I am a bit confused about your follow-up posts. You state that “It compiles but I cannot execute the program” but then also state that “result differs from run to run”. How do you get results, correct or not, if you can’t execute the program?

-Mat

sommerfeld · August 29, 2018, 11:20pm

Hey,
thank you for your answer. First of all, that was the problem. Now the code produces the right results. Sorry, I missunderstood what I should do, I changed the size of the array. That was the reason why I asked about the backround and why it did not compile, obviousily. When I further increase the array size, the results were more or less random. However, now it works. Thank you for your help!

Topic		Replies	Views
Fortran CUSPARSE Bindings CUDA Programming and Performance	1	1975	May 22, 2012
Keep getting wrong results from cusparseScsrmm () CUDA Programming and Performance	0	543	May 18, 2017
cuSPARSE Library with OpenACC data Directives: cusparseDnVecGetValues not resolvable Legacy PGI Compilers cuda	6	812	October 11, 2021
Sparse LU decomposition got wrong results for large matrices CUDA Programming and Performance	2	631	July 20, 2018
Incorrect result from cusparseSpMM() with COO sparse matrices GPU-Accelerated Libraries	4	834	February 13, 2020
cuSPARSE generic procedure could not be resolved NVFORTRAN-S-0155 nvc, nvc++ and nvfortran cuda	9	812	November 22, 2021
Unable to compile and use the cuSPARSE Dgtsv function in CUDA FORTRAN Legacy PGI Compilers	6	2166	July 6, 2019
CUSPARSE_STATUS_INVALID_VALUE when using cusparseSpMM GPU-Accelerated Libraries	3	2293	July 14, 2019
cusparseScsrilu02 breaks with large matrices GPU-Accelerated Libraries cublas , cusolver , cusparse	10	1150	June 1, 2022
Problem using cusparseScsric0 function GPU-Accelerated Libraries	22	6908	November 29, 2012

Wrong result sparse matrix-vector multiplication

Related topics