I tried to implement the GPU synchronization method introduced in "On the Robust Mapping of Dynamic Programming onto a Graphics Processing Unit" (http://synergy.cs.vt.edu/pubs/papers/xiao-icpads2009-gpu.pdf). The method is very similar to the code sample on p. 111 of the CUDA Programming Guide, Version 2.3.1.
I used the synchronization function given in Figure 7 of the paper in my kernel, and it worked correctly when the matrix was smaller than 256×256. However, when the matrix is 256×256 or larger, the program never seems to terminate. So I wrote a very simple kernel to test the synchronization function: in each iteration, 1 is added to every element of the matrix, and then all blocks synchronize. It also hangs once the matrix dimension is increased.
I have attached my code and hope someone can help. Thanks a lot.
BTW: GPU1.cu contains the main function. Device_MatrixUtilities.cu contains the synchronization function `__device__ void __GPU_sync(int goalVal)`, the kernel `__global__ void Test(float *U)`, and other related functions. The file header.h contains the definitions of the matrix dimension and the block and grid sizes.
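For reference, the barrier and test kernel described above can be sketched roughly as follows. This is a minimal sketch and not the attached code: the sizes `N`, `THREADS`, `BLOCKS` and `ITERS`, the counter name `g_mutex`, and the grid-stride loop are assumptions made for illustration, since the contents of header.h are not reproduced here. Only the signatures of `__GPU_sync` and `Test` come from the files above. The block count is deliberately small, because a spin barrier of this kind can only complete if every block of the grid is resident on the GPU at the same time.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical sizes for illustration; the real ones are defined in header.h.
#define N        64    // matrix is N x N
#define THREADS  128   // threads per block
#define BLOCKS   8     // small enough that all blocks can be resident at once
#define ITERS    10    // number of "add 1, then synchronize" rounds

// Global arrival counter shared by all blocks (name is an assumption).
__device__ volatile int g_mutex = 0;

// Inter-block barrier: thread 0 of each block announces arrival with an
// atomic increment, then spins until the counter reaches goalVal.
// Assumption: every block in the grid is running concurrently. A block that
// has not been scheduled yet can never arrive, and the waiting blocks spin forever.
__device__ void __GPU_sync(int goalVal)
{
    __threadfence();      // make this thread's global writes visible to other blocks
    __syncthreads();      // whole block has finished the current round
    if (threadIdx.x == 0) {
        atomicAdd((int *)&g_mutex, 1);
        while (g_mutex != goalVal) { /* spin */ }
    }
    __syncthreads();      // release the rest of the block
}

// Test kernel: each round adds 1 to every element, then all blocks synchronize.
__global__ void Test(float *U)
{
    int stride = gridDim.x * blockDim.x;
    for (int it = 1; it <= ITERS; ++it) {
        for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < N * N; i += stride)
            U[i] += 1.0f;
        // After round `it`, the counter must equal it * (number of blocks).
        __GPU_sync(it * gridDim.x);
    }
}

int main()
{
    const size_t bytes = (size_t)N * N * sizeof(float);
    float *dU = NULL;
    float *hU = (float *)malloc(bytes);

    cudaMalloc((void **)&dU, bytes);
    cudaMemset(dU, 0, bytes);            // all-zero bits is 0.0f

    Test<<<BLOCKS, THREADS>>>(dU);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaMemcpy(hU, dU, bytes, cudaMemcpyDeviceToHost);

    // Every element should have been incremented exactly ITERS times.
    int bad = 0;
    for (int i = 0; i < N * N; ++i)
        if (hU[i] != (float)ITERS) ++bad;
    printf("%d mismatching elements out of %d\n", bad, N * N);

    cudaFree(dU);
    free(hU);
    return bad != 0;
}
```

In this sketch the grid size is fixed and the kernel loops over the matrix with a grid stride, so a larger `N` does not launch more blocks. If the grid size instead grows with the matrix dimension, the number of blocks can exceed what the device is able to keep resident, which is one situation in which a barrier like this would be expected to hang.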
header.h (413 Bytes)
Device_MatrixUtilities.cu (1.73 KB)
GPU1.cu (1.86 KB)