I have several host arrays declared like the following:
    real, allocatable :: a(:,:,:)
    allocate(a(-x1:x2, -y1:y2, -z1:z2))
Here is a dummy host computation:
    do k = -z1, z2
      do j = -y1, y2
        do i = -x1, x2
          a(i,j,k) = foo * a(i,j,k) + bar
        end do
      end do
    end do
What would be the most straightforward way to write this kernel in CUDA? I am not asking for the code itself, only for the approach. Should I declare a 1D device array dev_a and copy the 3D host array a into dev_a?
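To make the 1D idea concrete, here is a minimal host-side C++ sketch of the index mapping such a layout would imply. It is only an illustration under stated assumptions, not a recommended implementation: the `Extents` struct, `flat`, and `update` are hypothetical names, the element type is assumed to be single precision, and the buffer is assumed to keep Fortran's column-major order (first index fastest). A whole Fortran array is already contiguous in that order, so it could be copied as is, and only the indexing changes.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical description of the bounds a(-x1:x2, -y1:y2, -z1:z2).
struct Extents {
    int x1, x2, y1, y2, z1, z2;

    std::size_t nx() const { return static_cast<std::size_t>(x1 + x2 + 1); }
    std::size_t ny() const { return static_cast<std::size_t>(y1 + y2 + 1); }
    std::size_t nz() const { return static_cast<std::size_t>(z1 + z2 + 1); }
    std::size_t size() const { return nx() * ny() * nz(); }

    // Map Fortran-style (i, j, k) to a zero-based offset in a flat,
    // column-major buffer: shift each index by its lower bound first.
    // In CUDA this could be marked __host__ __device__ (assumption).
    std::size_t flat(int i, int j, int k) const {
        const std::size_t ii = static_cast<std::size_t>(i + x1);
        const std::size_t jj = static_cast<std::size_t>(j + y1);
        const std::size_t kk = static_cast<std::size_t>(k + z1);
        return ii + nx() * (jj + ny() * kk);
    }
};

// Host reference of the dummy computation on the flat buffer. In a kernel,
// the three loops would be replaced by thread and block indices.
void update(std::vector<float>& a, const Extents& e, float foo, float bar) {
    for (int k = -e.z1; k <= e.z2; ++k) {
        for (int j = -e.y1; j <= e.y2; ++j) {
            for (int i = -e.x1; i <= e.x2; ++i) {
                const std::size_t idx = e.flat(i, j, k);
                a[idx] = foo * a[idx] + bar;
            }
        }
    }
}
```

Because this particular update touches every element independently, a kernel could also ignore the 3D structure entirely and use one thread per flat offset; the mapping above only matters once neighbouring elements are involved.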