traversing array

I know that Fortran stores arrays in column-major order while C stores them in row-major order. Since the current implementation of CUDA Fortran may call CUDA C underneath to do the computation, I'm not sure which of the following two versions (the first traverses the array row by row, the second column by column) should run faster with CUDA Fortran, assuming M and N are large enough.

real, device, dimension(M, N) :: A

attributes(global) subroutine dosomething()

 do i = 1, M
  do j = 1, N
   A(i,j) = A(i,j) * (i+j)
  enddo
 enddo
end subroutine



real, device, dimension(M, N) :: A

attributes(global) subroutine dosomething()
   do j = 1, N
      do i = 1, M
         A(i,j) = A(i,j) * (i+j)
      enddo
   enddo
end subroutine

Thanks,
Tuan.

Hi Tuan,

Sorry, I missed this post earlier.

I would keep it column-major here, i.e. your second version, where the inner loop runs over the first index i.
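To make that a bit more concrete, here is a minimal sketch (not tested against your code) of how the same update might look with one thread per element. The point it tries to show is that consecutive threads (threadIdx%x) are mapped to the first index i, so neighbouring threads touch neighbouring memory locations in Fortran's column-major layout. The names scale_mod and scale_kernel, the 32x8 block shape, and the 1024x1024 size are assumptions of mine for illustration only.

```fortran
module scale_mod
  use cudafor
  implicit none
contains
  ! Hypothetical kernel: each thread updates a single element A(i,j).
  ! threadIdx%x drives the first (fastest-varying) index i, so threads
  ! that are adjacent in x access adjacent elements in memory.
  attributes(global) subroutine scale_kernel(A, M, N)
    integer, value :: M, N
    real :: A(M, N)
    integer :: i, j

    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    j = (blockIdx%y - 1) * blockDim%y + threadIdx%y

    ! Guard against threads that fall outside the array bounds.
    if (i <= M .and. j <= N) A(i, j) = A(i, j) * (i + j)
  end subroutine scale_kernel
end module scale_mod

program demo
  use cudafor
  use scale_mod
  implicit none
  integer, parameter :: M = 1024, N = 1024   ! assumed sizes
  real, allocatable :: A(:,:)
  real, device, allocatable :: A_d(:,:)
  type(dim3) :: grid, tblock
  integer :: istat

  allocate(A(M, N), A_d(M, N))
  A = 1.0
  A_d = A                                    ! host-to-device copy

  ! Assumed block shape; x covers the first index, y the second.
  tblock = dim3(32, 8, 1)
  grid = dim3((M + tblock%x - 1) / tblock%x, &
              (N + tblock%y - 1) / tblock%y, 1)

  call scale_kernel<<<grid, tblock>>>(A_d, M, N)
  istat = cudaDeviceSynchronize()

  A = A_d                                    ! device-to-host copy
  print *, A(1, 1), A(M, N)
end program demo
```

If the indices were swapped so that threadIdx%x drove j instead, adjacent threads would be M elements apart in memory, which is the strided pattern you generally want to avoid.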

  • Mat