I tried to implement the GPU synchronization method introduced in "On the Robust Mapping of Dynamic Programming onto a Graphics Processing Unit" (http://synergy.cs.vt.edu/pubs/papers/xiao-icpads2009-gpu.pdf). The method is very similar to the code sample on p. 111 of the CUDA Programming Guide Version 2.3.1.

I used the synchronization function given in Figure 7 of the paper in my kernel, and it works correctly when the matrix is smaller than 256*256. However, when the matrix is 256*256 or larger, the program never seems to terminate. So I wrote a very simple kernel to test the synchronization function: in each iteration it adds 1 to every element of the matrix and then synchronizes. It also fails to finish once the matrix dimension is increased.
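For readers without the attachments, a minimal sketch of the kind of test described above might look like the following. It is not the attached code: the counter `g_mutex`, the constants `N`, `BLOCK` and `ITERS`, and the host helper `run_test` are hypothetical names chosen for illustration, and the barrier is an atomic-counter barrier written in the spirit of the paper's Figure 7 rather than copied from it. The sizes are deliberately tiny (a 2x2 grid), because a spinning barrier like this can only complete if every block of the grid is running on the GPU at the same time.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define N     32   // hypothetical matrix dimension (N x N)
#define BLOCK 16   // hypothetical block edge (BLOCK x BLOCK threads)
#define ITERS 10   // hypothetical number of iterations

__device__ int g_mutex;   // global arrival counter shared by all blocks

// Atomic-counter barrier across blocks. One thread per block announces the
// block's arrival and spins until the counter reaches goalVal.
// Assumption: all blocks of the grid are resident simultaneously.
__device__ void __GPU_sync(int goalVal)
{
    int tid = threadIdx.y * blockDim.x + threadIdx.x;
    __threadfence();      // make this thread's global writes visible
    __syncthreads();      // whole block has reached the barrier
    if (tid == 0) {
        atomicAdd(&g_mutex, 1);
        while (*((volatile int *)&g_mutex) < goalVal) { /* spin */ }
    }
    __syncthreads();      // release the rest of the block
}

// Adds 1 to every element in each iteration, then synchronizes the grid.
__global__ void Test(float *U)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int numBlocks = gridDim.x * gridDim.y;
    for (int it = 1; it <= ITERS; ++it) {
        U[row * N + col] += 1.0f;
        __GPU_sync(it * numBlocks);   // counter is never reset, so the goal grows
    }
}

// Runs the test once; returns the number of wrong elements, or -1 on error.
int run_test(void)
{
    const size_t bytes = N * N * sizeof(float);
    float *h_U = (float *)malloc(bytes);
    float *d_U = NULL;
    int zero = 0;
    if (h_U == NULL) return -1;
    if (cudaMalloc((void **)&d_U, bytes) != cudaSuccess) { free(h_U); return -1; }
    cudaMemset(d_U, 0, bytes);                        // all-zero bytes == 0.0f
    cudaMemcpyToSymbol(g_mutex, &zero, sizeof(int));  // reset the counter

    dim3 block(BLOCK, BLOCK);
    dim3 grid(N / BLOCK, N / BLOCK);                  // 2 x 2 = 4 blocks
    Test<<<grid, block>>>(d_U);
    cudaError_t err = cudaDeviceSynchronize();
    if (err == cudaSuccess)
        err = cudaMemcpy(h_U, d_U, bytes, cudaMemcpyDeviceToHost);

    int mismatches = 0;
    if (err != cudaSuccess) {
        mismatches = -1;
    } else {
        for (int i = 0; i < N * N; ++i)
            if (h_U[i] != (float)ITERS) ++mismatches;
    }
    cudaFree(d_U);
    free(h_U);
    return mismatches;
}

int main(void)
{
    int bad = run_test();
    printf("mismatches: %d\n", bad);
    return bad == 0 ? 0 : 1;
}
```

In this sketch the goal value passed to the barrier is the iteration number times the number of blocks, so the counter only needs to be reset once per launch, from the host.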

I have attached my code and hope someone can help. Thanks a lot.

BTW: GPU1.cu contains the main function. Device_MatrixUtilities.cu contains the synchronization function `__device__ void __GPU_sync(int goalVal)`, the kernel `__global__ void Test(float *U)`, and other related functions. header.h defines the dimensions of the matrix, the block, and the grid.

header.h (413 Bytes)

Device_MatrixUtilities.cu (1.73 KB)

GPU1.cu (1.86 KB)