I am writing a program in CUDA Fortran, and I think I need some global synchronization among all threads on the GPU. In my program, each thread first computes some intermediate data: for example, thread i writes A(i) to global device memory. Thread i then needs its neighbours' results, A(i-1) and A(i+1), to compute B(i). This data dependence spans all threads, not just a single thread block. So I think thread i must wait for threads i-1 and i+1, which can be in different thread blocks. The subroutine syncthreads() only synchronizes threads within one thread block, so I need something like a global synchronization. Does anyone know whether threadfence() or some other method can help me?
Thanks a lot.
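CUDA provides no guaranteed method for global synchronisation except between kernel launches. threadfence() does not solve this. It only orders a thread's own memory writes so that other threads observe them in that order, and it does not make any thread wait for another. Spinning on a flag in global memory is also unsafe: thread blocks are not guaranteed to be resident on the GPU at the same time, so a block waiting for one that has not been scheduled yet can deadlock. You will need to either rewrite your code to remove the dependency or break it into multiple kernel launches. In your case, one kernel would compute A and a second kernel, launched after it, would compute B from A.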
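A minimal sketch of the multiple-launch approach follows. It assumes hypothetical placeholder formulas: A(i) = 0.5*i, and B(i) = A(i-1) + A(i) + A(i+1), with missing neighbours at the ends treated as zero. The module name `stencil_kernels`, the kernels `compute_a` and `compute_b`, and both formulas are illustrative only, not the original poster's code. Both kernels are launched on the same (default) stream, so every block of `compute_a` finishes before any thread of `compute_b` starts. In other words, the kernel boundary acts as the global barrier.

```fortran
module stencil_kernels
  use cudafor
  implicit none
contains

  ! Phase 1: thread i writes A(i). The formula is a hypothetical placeholder.
  attributes(global) subroutine compute_a(a)
    real :: a(:)
    integer :: i, n
    n = size(a)
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) a(i) = 0.5 * real(i)
  end subroutine compute_a

  ! Phase 2: thread i reads A(i-1), A(i) and A(i+1), which may have been
  ! written by threads in other blocks during the previous launch.
  attributes(global) subroutine compute_b(a, b)
    real :: a(:), b(:)
    integer :: i, n
    real :: left, right
    n = size(a)
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) then
      left = 0.0
      right = 0.0
      if (i > 1) left = a(i - 1)
      if (i < n) right = a(i + 1)
      b(i) = left + a(i) + right
    end if
  end subroutine compute_b

end module stencil_kernels

program two_phase
  use cudafor
  use stencil_kernels
  implicit none
  integer, parameter :: n = 1000
  real :: a(n), b(n), bref(n)
  real, device :: a_d(n), b_d(n)
  type(dim3) :: grid, tblock
  integer :: i, istat

  tblock = dim3(256, 1, 1)
  grid = dim3((n + 255) / 256, 1, 1)   ! enough blocks to cover n threads

  call compute_a<<<grid, tblock>>>(a_d)
  ! No device-side barrier is needed here: compute_b goes to the same
  ! (default) stream, so it cannot start until all of compute_a has finished.
  call compute_b<<<grid, tblock>>>(a_d, b_d)
  istat = cudaDeviceSynchronize()
  b = b_d   ! copy the result back to the host

  ! Host reference using the same placeholder formulas
  do i = 1, n
    a(i) = 0.5 * real(i)
  end do
  do i = 1, n
    bref(i) = a(i)
    if (i > 1) bref(i) = bref(i) + a(i - 1)
    if (i < n) bref(i) = bref(i) + a(i + 1)
  end do

  if (maxval(abs(b - bref)) > 1.0e-3) error stop 'mismatch'
  print *, 'ok'
end program two_phase
```

If B has to be recomputed many times (for example, in a time-stepping loop), the same pattern carries over. Put the two launches inside a host-side `do` loop; each launch boundary then serves as one global synchronization point. A kernel launch has some overhead, but it is usually small compared with the cost of the kernels themselves.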