I have several host arrays declared like the following:
real,allocatable :: a(-x1:x2,-y1:y2,-z1:z2)
Here is some dummy host computation:
do k=-z1,z2
do j=-y1,y2
do i=-x1,x2
a(i,j,k) = foo * a(i,j,k) + bar
end do
end do
end do
What would be the most straightforward way (not asking for the code) to write this kernel in CUDA? Should I declare a 1D device array dev_a and copy the 3D host array a to dev_a?
There is no need to flatten the array to 1D. Declare the device array with the device attribute and give it the same shape, size, and bounds as the host array. You can then write "dev_a = a" to copy "a" to the device, and "a = dev_a" to copy the results back to the host.
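Even though the code was not asked for, a minimal sketch may make the advice concrete. It assumes CUDA Fortran compiled with something like nvfortran -cuda. The names scale_mod, scale_kernel, grid and tblock, the bound values, the values of foo and bar, and the 8x8x4 thread block are all made up for illustration and are not from the original post.

```fortran
module scale_mod
  use cudafor
  implicit none
contains
  ! Hypothetical kernel: one thread per array element. Declaring the dummy
  ! array with explicit lower bounds lets the kernel use the same i,j,k
  ! indices as the host loop nest.
  attributes(global) subroutine scale_kernel(a, x1, x2, y1, y2, z1, z2, foo, bar)
    integer, value :: x1, x2, y1, y2, z1, z2
    real, value    :: foo, bar
    real           :: a(-x1:x2, -y1:y2, -z1:z2)
    integer        :: i, j, k

    ! threadIdx and blockIdx are 1-based in CUDA Fortran, so subtract 1
    ! before shifting to the array's lower bound.
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x - 1 - x1
    j = (blockIdx%y - 1) * blockDim%y + threadIdx%y - 1 - y1
    k = (blockIdx%z - 1) * blockDim%z + threadIdx%z - 1 - z1

    ! Guard against threads that fall past the upper bounds.
    if (i <= x2 .and. j <= y2 .and. k <= z2) then
      a(i,j,k) = foo * a(i,j,k) + bar
    end if
  end subroutine scale_kernel
end module scale_mod

program main
  use cudafor
  use scale_mod
  implicit none
  ! Illustrative bounds and coefficients (assumptions, not from the post).
  integer, parameter :: x1 = 4, x2 = 5, y1 = 3, y2 = 6, z1 = 2, z2 = 7
  real, parameter    :: foo = 2.0, bar = 1.0
  real, allocatable         :: a(:,:,:)
  real, allocatable, device :: dev_a(:,:,:)
  type(dim3) :: grid, tblock
  integer    :: nx, ny, nz

  ! Host and device arrays get identical bounds.
  allocate(a(-x1:x2, -y1:y2, -z1:z2))
  allocate(dev_a(-x1:x2, -y1:y2, -z1:z2))
  a = 1.0

  dev_a = a                      ! host-to-device copy by assignment

  nx = x1 + x2 + 1
  ny = y1 + y2 + 1
  nz = z1 + z2 + 1
  tblock = dim3(8, 8, 4)
  ! Round up so the grid covers every element.
  grid = dim3((nx + 7) / 8, (ny + 7) / 8, (nz + 3) / 4)

  call scale_kernel<<<grid, tblock>>>(dev_a, x1, x2, y1, y2, z1, z2, foo, bar)

  a = dev_a                      ! device-to-host copy by assignment

  ! Every element should now equal foo*1.0 + bar.
  print *, 'all elements updated: ', all(a == foo * 1.0 + bar)

  deallocate(a, dev_a)
end program main
```

If writing the kernel by hand is more than you need, another option is to leave the original loop nest as it is, make it operate on dev_a, and put a !$cuf kernel do(3) <<<*,*>>> directive above it so the compiler generates the kernel.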