syncthreads() only synchronizes threads within the same block. If I want to create a checkpoint for all threads on the device, should I use threadfence()? Consider the following sample code:
attributes(global) subroutine MyTest_cuda
  integer :: s, i, j
  do s = 1, 10
    do i = blockidx%x, 20, griddim%x
      do j = threadidx%x, 20, blockdim%x
        ......
      enddo ! j
    enddo ! i
    call threadfence()
  enddo ! s
end subroutine
If I call the subroutine with
call MyTest_cuda<<<30,30>>>
Since the i and j loops only go up to 20, some threads (those with a block or thread index above 20) will never enter the loops. Are they going to wait at ‘call threadfence()’? Thanks for the help, Mat.