Wrong result sparse matrix-vector multiplication

Dear all,

I have implemented the function cusparseDcsrmv in Cuda Fortran. However, the result is not correct. The vector rho has only one element that is unequal to zero (rho(1)=1), hence it is easy to check the result. The compiler version is 17.10 and the cuda version is 7.5. I have attached the code I am using. The basis of the code can in principle be extracted from the examples of cuda. Things I have tried to avoid the problem: checked IA, JA and VAL for the correct definition, using a different compiler version (16.5&16.6) and cuda version (8.0).
I have tested the code for 44 and 55 matrices and it works. The integer status is always zero. What I have also checked was the intel mkl routine and it produces the right result. The code is compiled with pgf90 -Mcudalib=cusparse Euler.cuf -o Euler.out . Unfortunately I am not allowed to post the files fort.105 in which the vectors are stored.

It would be nice, if someone could help me with this problem. What could produce this error? What else could I check?

program timepropagation
 use cusparse

implicit none
double precision, allocatable :: rho(:), VAL(:)
double precision :: tres, alpha, rateprop
integer :: k1, k2, Nsparse, DimensionL, status
integer, allocatable :: JA(:), IA(:)

type(cusparseHandle) :: h
type(cusparseMatDescr) :: descrA
double precision, allocatable, device :: drho(:), dVAL(:)
integer, allocatable, device :: dIA(:), dJA(:)

!initalize CUSPARSE and matrix descriptor
 status = cusparseCreate(h)
 status = cusparseCreateMatDescr(descrA)
 status = cusparseSetMatType(descrA,CUSPARSE_MATRIX_TYPE_GENERAL)
 status = cusparseSetMatIndexBase(descrA,CUSPARSE_INDEX_BASE_ONE)
 status = cusparseSetPointerMode(h,CUSPARSE_POINTER_MODE_HOST)

rateprop = 1.0d-19/6.58211928d-16
alpha    = 0.0d0
Nsparse = 593178
DimensionL = 70128

 rho = 0.0d0
 rho(1) = 1.0d0


do k1=1,Nsparse
read(105,'(1E26.19)') VAL(k1)
read(105,'(1I20)') JA(k1)
end do

do k1=1,DimensionL+1
read(105,'(1I20)') IA(k1)
end do

dIA  = IA
dJA  = JA
drho = rho

status = cusparseDcsrmv(h,CUSPARSE_OPERATION_NON_TRANSPOSE,DimensionL,DimensionL,Nsparse,rateprop,descrA,dVAL,dIA,dJA,drho,alpha,drho)

rho = drho

do k1=1,DimensionL
if (rho(k1).ne.0.0d0) write(107,'(1E26.19)') rho(k1)
end do
end program timepropagation

Thank you in advance

It’s odd that it works for smaller sizes, but fails for larger ones.
What happens if you do not use the same array for x and y? (drho in your code below).


thanks a lot for your answer. I appreciate it a lot. It compiles but I cannot execute the program. What should the changed dimensiom adress? Or is it just to try the subroutine?


There is one additional thing that bothers me: the result differs from run to run. This clearly indicates that something goes wrong. Does this problem show what is wrong with my code?

Did you make the change that Brent suggested? i.e. use two different arrays instead of drho for both x and y.

Cusparsecsrmv solves y = alpha Ax + beta y, and you are using drho for both “x� and “y�. So for large problems you will be using the updated value of y for x and obtain wrong answers. The reason it works for 4x4 and 5x5 matrices it that the whole problem fits into a single block so you don’t have to worry about x being overwritten.

I am a bit confused about your follow-up posts. You state that “It compiles but I cannot execute the program” but then also state that “result differs from run to run”. How do you get results, correct or not, if you can’t execute the program?


thank you for your answer. First of all, that was the problem. Now the code produces the right results. Sorry, I missunderstood what I should do, I changed the size of the array. That was the reason why I asked about the backround and why it did not compile, obviousily. When I further increase the array size, the results were more or less random. However, now it works. Thank you for your help!