I’m really new to CUDA Fortran, and I’m trying to parallelize the following subroutine:
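    subroutine function_tst(Y,X,A,Ilist,n1,n2,n3)
      implicit none
      integer,intent(in) :: n1,n2,n3
      integer,intent(in) :: Ilist(n1,n2)
      integer :: i
      real,intent(in) :: X(n3), A(n1,n1,n2)
      ! Y is read as well as written (Y = Y + ...), so it must be intent(inout)
      real,intent(inout) :: Y(n3)

      ! For each column i of Ilist, gather X, multiply by A(:,:,i),
      ! and accumulate the result into the same positions of Y
      DO i=1,n2
        Y(Ilist(:,i)) = Y(Ilist(:,i)) + MATMUL( A(:,:,i) , X(Ilist(:,i)) )
      END DO
    end subroutine function_tst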
where Ilist holds essentially random indices between 1 and n3.
The issue with parallelizing this is that different loop iterations can modify the same elements of Y, so thread blocks working on different iterations could overwrite each other’s results.
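To make the conflict concrete, here is a minimal sketch of the kind of kernel I have in mind. It is only a sketch under assumptions, not something taken from the attached program: the names `scatter_mod` and `scatter_add_kernel` are hypothetical, one thread block handles one iteration `i` with one thread per row `j` (so `n1` must not exceed the per-block thread limit), and the conflicting updates to Y go through CUDA Fortran's `atomicadd`, which may or may not be the best approach here.

```fortran
module scatter_mod
  use cudafor
  implicit none
contains
  ! Hypothetical kernel: block i handles column i of Ilist,
  ! thread j computes row j of MATMUL(A(:,:,i), X(Ilist(:,i))).
  attributes(global) subroutine scatter_add_kernel(Y, X, A, Ilist, n1, n2, n3)
    integer, value :: n1, n2, n3
    integer, device, intent(in)    :: Ilist(n1,n2)
    real,    device, intent(in)    :: X(n3), A(n1,n1,n2)
    real,    device, intent(inout) :: Y(n3)
    integer :: i, j, k
    real    :: s, old

    i = blockIdx%x
    j = threadIdx%x
    if (i > n2 .or. j > n1) return

    ! Row j of the local matrix-vector product
    s = 0.0
    do k = 1, n1
      s = s + A(j,k,i) * X(Ilist(k,i))
    end do

    ! Several blocks may target the same element of Y, so the
    ! accumulation is done atomically; the returned old value is unused.
    old = atomicadd(Y(Ilist(j,i)), s)
  end subroutine scatter_add_kernel
end module scatter_mod
```

With device copies of the arrays (called `Y_d`, `X_d`, `A_d`, `Ilist_d` here purely for illustration), the launch would look like `call scatter_add_kernel<<<n2, n1>>>(Y_d, X_d, A_d, Ilist_d, n1, n2, n3)`. Note that with atomics the order of the floating-point additions is not fixed, so results can differ in the last bits from run to run.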
So I’m asking for some help, please.
You can find the full program in the attachment: TestCuda.zip (28.1 KB)