syncthreads() only synchronizes threads within the same block. If I want to create a checkpoint for all threads on the device, should I use threadfence()? For example, in the following sample code:
attributes(global) subroutine MyTest_cuda
  integer :: s, i, j
  do s = 1, 10
    do i = blockidx%x, 20, griddim%x
      do j = threadidx%x, 20, blockdim%x
        ......
      enddo ! j
    enddo ! i
    call threadfence()
  enddo ! s
end subroutine
If I call the subroutine with
Since the i and j loops only go up to 20, some threads will not enter the loops. Are they going to wait at ‘call threadfence()’? Thanks for the help, Mat.