Problems with maximum grid dimension

jghimcm · October 16, 2018, 9:19pm

Hello,

I am a rookie at CUDA programming. I ran into some problems that seem weird to me when I run my kernel with different grid dimensions. Could you please explain why the kernel does not work properly under certain conditions? Thank you very much in advance!

System: Win7 64bit
Environment: Visual Studio 2013
GPU: GeForce GTX 970M 6GB (Major.Minor: 5.2)
CUDA toolkit version: 8.0

Below is a concise version of my code:

__global__ void init(int* d_Ld, int* d_Rd, int Nt)
{
    int blockId = blockIdx.x + blockIdx.y * gridDim.x;
    int id = blockId * (blockDim.x * blockDim.y) + threadIdx.y * blockDim.x + threadIdx.x;
    int stride = blockDim.x * blockDim.y * gridDim.x * gridDim.y;
    while (id < Nt)
    {
        d_Ld[id] = id;
        d_Rd[id] = id;
        id += stride;
    }
}

int Nt = 1024 * 1024 * 316;
int dim1_grid, dim2_grid, dim3_grid;    // to be determined
int dim1_block, dim2_block, dim3_block; // to be determined
dim3 griddim(dim1_grid, dim2_grid, dim3_grid);
dim3 blockdim(dim1_block, dim2_block, dim3_block);

int *d_Ld, *d_Rd;
int* h_probe;

if (cudaMalloc((void**)&d_Ld, sizeof(int) * Nt) != cudaSuccess) cout << "ERROR!" << endl;
if (cudaMalloc((void**)&d_Rd, sizeof(int) * Nt) != cudaSuccess) cout << "ERROR!" << endl;
h_probe = (int*)calloc(Nt, sizeof(int));

init<<<griddim, blockdim>>>(d_Ld, d_Rd, Nt);
printf("GPU kernel error: %s\n", cudaGetErrorString(cudaPeekAtLastError()));

cudaMemcpy(h_probe, d_Ld, sizeof(int) * Nt, cudaMemcpyDeviceToHost);
cout << h_probe[1] << endl;
cout << h_probe[1024 * 1024 * 316 - 1] << endl;

Test results under different conditions:

1. dim1_grid=65535, dim2_grid=1, dim3_grid=1, dim1_block=1024, dim2_block=1, dim3_block=1;
Output:
GPU kernel error: no error
1
331350015
This is the correct result.

2. dim1_grid=65536, dim2_grid=1, dim3_grid=1, dim1_block=1024, dim2_block=1, dim3_block=1;
Output:
GPU kernel error: invalid argument
0
0
This is the wrong result.

I know this is probably related to the maximum grid dimensions. However, I checked with cudaDeviceProp, and maxGridSize[0] is 2147483647, while maxGridSize[1] and maxGridSize[2] are both 65535. Why does this happen?

Robert_Crovella · October 16, 2018, 9:31pm

Maybe you are not compiling for an architecture that matches your GPU.

CUDA 8.0 compiles for compute capability 2.0 (cc2.0) by default, which limits the first grid dimension to 65535 even though your GPU supports a much larger one. There are literally dozens of postings like this on websites such as stackoverflow.com.
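
A minimal sketch of how one might confirm this, not the original poster's code: it prints the limits the device reports, then launches a 65536 x 1024 configuration and prints the resulting error strings. The names `gridFits`, `fill`, and the file name `check_grid.cu` are hypothetical and chosen only for illustration. The assumption is that the file is built once with the toolkit default and once for the GPU's own architecture, for example with `nvcc -arch=sm_52 check_grid.cu -o check_grid` (in Visual Studio the equivalent setting should be the project's CUDA C/C++ > Device > Code Generation entry, e.g. `compute_52,sm_52`).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: does the requested grid fit the limits the device reports?
// Note that this only checks the hardware limits, not the architecture the
// binary was compiled for.
static bool gridFits(const cudaDeviceProp& prop, dim3 grid)
{
    return grid.x <= static_cast<unsigned int>(prop.maxGridSize[0]) &&
           grid.y <= static_cast<unsigned int>(prop.maxGridSize[1]) &&
           grid.z <= static_cast<unsigned int>(prop.maxGridSize[2]);
}

// Hypothetical 1D grid-stride kernel, standing in for init() from the question.
__global__ void fill(int* data, int n)
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = blockDim.x * gridDim.x;
    for (; id < n; id += stride)
        data[id] = id;
}

int main()
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        printf("could not query device 0\n");
        return 1;
    }
    printf("compute capability %d.%d, maxGridSize = (%d, %d, %d)\n",
           prop.major, prop.minor,
           prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);

    const int n = 1024 * 1024 * 316;   // same element count as in the question
    const dim3 grid(65536, 1, 1);      // one block more than the cc2.0 limit
    const dim3 block(1024, 1, 1);
    printf("grid fits reported device limits: %s\n",
           gridFits(prop, grid) ? "yes" : "no");

    int* d_data = NULL;
    if (cudaMalloc((void**)&d_data, sizeof(int) * n) != cudaSuccess) {
        printf("cudaMalloc failed\n");
        return 1;
    }

    fill<<<grid, block>>>(d_data, n);
    cudaError_t launchErr = cudaGetLastError();      // launch configuration errors
    cudaError_t syncErr = cudaDeviceSynchronize();   // errors during execution
    printf("launch: %s\n", cudaGetErrorString(launchErr));
    printf("execution: %s\n", cudaGetErrorString(syncErr));

    cudaFree(d_data);
    return (launchErr == cudaSuccess && syncErr == cudaSuccess) ? 0 : 1;
}
```

If the explanation above applies, the build with the CUDA 8.0 default would be expected to report "invalid argument" at launch even though the grid fits the reported device limits, while the `sm_52` build would be expected to report no error.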

jghimcm · October 16, 2018, 9:49pm

Thank you so much for your explanation! That really helps!
