Hi guys, I have an issue with swapping data according to a given sequence (id). The code is as follows:
attributes(global) subroutine d_APPLYLINEARID_PTC(id,x,y,u,v,w,N)
  implicit none
  integer(kind=4),value :: N
  integer(kind=4),dimension (N,2) :: id
  real(kind=4),dimension (N,2) :: x,y,u,v,w
  integer(kind=4) :: i,j,r_id
  real(kind=4) :: r_x,r_y,r_u,r_v,r_w

  i = (blockIdx%x - 1)*blockDim%x + threadIdx%x
  j = blockIdx%y

  if(i <= N .and. j <= 2)then
    r_id = id(i,j)
    r_x = x(r_id,j); r_y = y(r_id,j)
    r_u = u(r_id,j); r_v = v(r_id,j); r_w = w(r_id,j)
  end if

  call threadfence_system()

  if(i <= r_nptl .and. j <= 2)then
    x(i,j) = r_x; y(i,j) = r_y
    u(i,j) = r_u; v(i,j) = r_v; w(i,j) = r_w
  end if

  return
end subroutine d_APPLYLINEARID_PTC
(The thread block size is (128, 1, 1) and the grid size is (ceiling(N/128), 2, 1); N is several million or larger.) When I verify the results from this kernel, I find that the swap is incomplete: most of the results are correct, but some of the data is wrong. I do use threadfence_system to wait until all the data has been loaded into registers, but the result is still incorrect. What crucial point am I missing?
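For reference, the launch configuration described above would look roughly like the sketch below. This is only an illustration, not my exact host code: the wrapper name launch_apply_linearid and the device array names (d_id, d_x, d_y, d_u, d_v, d_w) are placeholders, and it assumes the wrapper sits in the same module as the kernel so the interface is explicit.

```fortran
! Sketch only: hypothetical host-side wrapper for the kernel above.
! Assumed to be contained in the same module as d_APPLYLINEARID_PTC.
subroutine launch_apply_linearid(d_id, d_x, d_y, d_u, d_v, d_w, N)
  use cudafor
  implicit none
  integer(kind=4), intent(in) :: N
  ! Placeholder names for arrays already resident on the device
  integer(kind=4), device, dimension(N,2) :: d_id
  real(kind=4),    device, dimension(N,2) :: d_x, d_y, d_u, d_v, d_w
  type(dim3) :: tBlock, grid
  integer :: istat

  ! 128 threads per block along x
  tBlock = dim3(128, 1, 1)
  ! Enough blocks along x to cover N (integer ceiling of N/128),
  ! and 2 blocks along y, one per column j
  grid = dim3((N + 127)/128, 2, 1)

  call d_APPLYLINEARID_PTC<<<grid, tBlock>>>(d_id, d_x, d_y, d_u, d_v, d_w, N)

  ! Wait for the kernel to finish before the results are checked
  istat = cudaDeviceSynchronize()
end subroutine launch_apply_linearid
```

With N in the millions this launches many thousands of blocks along x for each of the two columns, and every block reads from and writes to the same x, y, u, v and w arrays.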