Problem with performance when gridDim.x > 65536

Greetings all,

I am currently working on a blocked version of the Floyd-Warshall algorithm, using the implementation that CudaaduC posted here as a template, and have run into a rather serious performance issue. Namely, when I pass more than 65,536 blocks on the grid for a certain kernel, the runtime skyrockets (from under 2 ms to over 10 ms).

I’m running CUDA 10.1 on a GTX 1070 Ti. Each block/tile (dim3(32, 32, 1)) represents a 32x32 section of the graph matrix, and shared memory is used to speed up processing. I have no problems when processing the blocks along a given column, but when I pass more than 65,536 blocks on the grid, which is dim3(num_tiles, 1, 1), the runtime for the next kernel, which does the same thing but moves across a given row, skyrockets. For V=8100 the runtime is fine. For V=8192 (65,536 blocks of 32x32) the runtime is fine. For V=8200 the runtime skyrockets, but the program continues to run; it does not crash.
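To make the configuration concrete, here is a stripped-down sketch of the kind of mapping I am describing (this is not my actual kernel; fw_pass_kernel, d_graph, and tiles_per_row are placeholder names, and the relaxation step is omitted):

```
#include <limits.h>

#define TILE 32

// One 32x32 thread block per tile, launched on a 1D grid.  The 1D block index
// is mapped back to a 2D tile coordinate; the shared-memory tile itself stays
// 2D, which is independent of the grid dimensionality.
__global__ void fw_pass_kernel(int *d_graph, int V, int tiles_per_row)
{
    int tile_row = blockIdx.x / tiles_per_row;   // which tile this block owns
    int tile_col = blockIdx.x % tiles_per_row;

    __shared__ int tile[TILE][TILE];             // 2D shared tile in a 1D grid

    int row = tile_row * TILE + threadIdx.y;
    int col = tile_col * TILE + threadIdx.x;

    tile[threadIdx.y][threadIdx.x] =
        (row < V && col < V) ? d_graph[row * V + col] : INT_MAX;  // pad edges
    __syncthreads();

    // ... Floyd-Warshall relaxation against the current pivot tile omitted ...
}

// Launch: for V = 8200, tiles_per_row = 257 and num_tiles = 257 * 257 = 66,049,
// so gridDim.x exceeds 65,536 (still within the limit for compute capability 6.1).
//
//   dim3 block(TILE, TILE, 1);
//   dim3 grid(num_tiles, 1, 1);
//   fw_pass_kernel<<<grid, block>>>(d_graph, V, tiles_per_row);
```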

I can post the relevant code, but since it will take me some time to prepare (it isn’t fully commented yet), I wanted to go ahead and ask without it, in case the problem simply lies in my grid/block configuration (I’ve never used a 2D shared memory array with a 1D grid).

I appreciate any and all help!

Charles Johnson

PS: The graph matrix is stored in a 1D array, and I simply access the data using calculated offsets.
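For example, with a row-major layout the offset for element (i, j) would look like this (illustrative only; d_graph and V as in the sketch above):

```
// Row-major offset into the 1D graph array: element (i, j) of the V x V matrix.
__host__ __device__ inline int graph_at(const int *d_graph, int V, int i, int j)
{
    return d_graph[i * V + j];
}
```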

(1) Make sure you are compiling your code for the correct target architecture(s).
(2) Use cuda-memcheck to make sure there are no obvious issues with your code. Fix all issues cuda-memcheck reports. Example commands for both steps are below.
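For example (my_app and main.cu are placeholder names; the GTX 1070 Ti is compute capability 6.1, and cuda-memcheck ships with CUDA 10.1):

```
# compile with an explicit Pascal target
nvcc -O3 -gencode arch=compute_61,code=sm_61 -o my_app main.cu

# run the full application under cuda-memcheck
cuda-memcheck ./my_app
```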