Hi,
I’m writing a program that needs to process a very large array of data, which I need to split into a grid of blocks. The size of the array is variable, and I would like to find a way to adapt the grid and block sizes to it as well as possible.
However, the information about the constraints and requirements for designing an optimal “execution configuration”, i.e. sizing the blocks and grid, is quite scattered in the CUDA manual (v2.1). So I’ve tried to put this information together below.
These conditions should allow us to size a grid and its blocks automatically, with good thread usage, while avoiding “invalid execution configuration” errors at runtime.
The variable “deviceProp” contains the device information returned by the function cudaGetDeviceProperties().
Block constraints:
- Max number of threads in total:
dimBlock.x * dimBlock.y * dimBlock.z <= deviceProp.maxThreadsPerBlock
- Max number of threads per dimension:
dimBlock.x <= deviceProp.maxThreadsDim[0]
dimBlock.y <= deviceProp.maxThreadsDim[1]
dimBlock.z <= deviceProp.maxThreadsDim[2]
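- Recommended:
dimBlock.x * dimBlock.y * dimBlock.z = k * deviceProp.warpSize (typ. 32), where k is an integer
dimBlock.x = kx * half-warp (typ. 16), where kx is an integer, to help coalesce memory accesses
(requiring all three dimensions to be multiples of 16 would give at least 16*16*16 = 4096 threads, which exceeds the total limit on a device where deviceProp.maxThreadsPerBlock is 512)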
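The block limits above can be checked on the host before launching. Here is a minimal sketch in plain C++. Dim3 and BlockLimits are hypothetical stand-ins for dim3 and for the relevant fields of cudaDeviceProp, so that the snippet compiles without the CUDA toolkit. The function names are also made up for illustration. The whole-warp check is only a soft recommendation, not a launch requirement.

```cpp
// Hypothetical stand-ins so the sketch compiles without the CUDA toolkit.
// In real code these would be dim3 and the cudaDeviceProp structure.
struct Dim3 { unsigned x, y, z; };
struct BlockLimits {
    int maxThreadsPerBlock;  // mirrors deviceProp.maxThreadsPerBlock
    int maxThreadsDim[3];    // mirrors deviceProp.maxThreadsDim
    int warpSize;            // mirrors deviceProp.warpSize
};

// Total number of threads in the block, computed in 64 bits so that
// realistic block sizes cannot wrap around in 32-bit arithmetic.
unsigned long long threadsInBlock(const Dim3& b) {
    return static_cast<unsigned long long>(b.x) * b.y * b.z;
}

// True if the block respects the hard limits (per dimension and in total).
bool blockIsValid(const Dim3& b, const BlockLimits& lim) {
    if (b.x == 0 || b.y == 0 || b.z == 0) return false;
    if (b.x > static_cast<unsigned>(lim.maxThreadsDim[0])) return false;
    if (b.y > static_cast<unsigned>(lim.maxThreadsDim[1])) return false;
    if (b.z > static_cast<unsigned>(lim.maxThreadsDim[2])) return false;
    return threadsInBlock(b) <=
           static_cast<unsigned long long>(lim.maxThreadsPerBlock);
}

// Soft check: true if the block is made of whole warps.
bool blockFillsWholeWarps(const Dim3& b, const BlockLimits& lim) {
    if (lim.warpSize <= 0) return false;
    return threadsInBlock(b) % static_cast<unsigned long long>(lim.warpSize) == 0;
}
```

With limits of 512 threads per block and 512x512x64 per dimension, blockIsValid accepts a 16x16x1 block and rejects a 16x16x16 block, because 4096 threads exceed the total limit even though each dimension is within its own bound.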
Grid Constraints:
- Max number of blocks per dimension:
dimGrid.x <= deviceProp.maxGridSize[0]
dimGrid.y <= deviceProp.maxGridSize[1]
dimGrid.z <= deviceProp.maxGridSize[2]
- Recommended:
dimGrid.x * dimGrid.y * dimGrid.z >= 2 * deviceProp.multiProcessorCount
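For a variable array size, the grid can be derived from the element count. Here is a minimal sketch in plain C++, assuming one thread per element and a 2D grid. Dim3 is a hypothetical stand-in for dim3, and chooseGrid is a made-up helper name. maxGridX and maxGridY would come from deviceProp.maxGridSize[0] and deviceProp.maxGridSize[1].

```cpp
#include <cstdint>

// Hypothetical stand-in for CUDA's dim3 so the sketch compiles on its own.
struct Dim3 { unsigned x, y, z; };

// Computes a 2D grid with enough blocks to cover n elements, using one
// thread per element and threadsPerBlock threads in each block.
// Returns false if the inputs are degenerate or if the largest grid
// allowed by the per-dimension limits is still too small.
bool chooseGrid(std::uint64_t n, unsigned threadsPerBlock,
                unsigned maxGridX, unsigned maxGridY, Dim3& grid) {
    if (n == 0 || threadsPerBlock == 0 || maxGridX == 0 || maxGridY == 0)
        return false;

    // Ceiling division, written so that it cannot overflow.
    std::uint64_t blocks = n / threadsPerBlock + (n % threadsPerBlock != 0);

    // Fill the x dimension first, then spill over into y.
    std::uint64_t gx = blocks < maxGridX ? blocks : maxGridX;
    std::uint64_t gy = blocks / gx + (blocks % gx != 0);
    if (gy > maxGridY) return false;

    grid = Dim3{static_cast<unsigned>(gx), static_cast<unsigned>(gy), 1u};
    return true;
}
```

The resulting grid can hold more threads than there are elements, since both divisions round up. The kernel would therefore need to compare its global index against n before touching the array.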
I would appreciate any additions and/or corrections to this summary.
Besides, I have a question:
It seems that the grid is limited PER dimension but not by the total number of blocks. So I wanted to use a grid of maximum size for my device, i.e. 65535x65535.
However, the following execution configuration crashed my PC, even though the kernel test_krnl() had an empty body and no parameters.
dim3 dimBlock(16,16);
dim3 dimGrid(65535,65535);
test_krnl<<<dimGrid,dimBlock>>>();
There wasn’t any error message (such as invalid execution configuration), but I was wondering whether it crashed because, the grid being too big, the GPU didn’t respond to Windows XP quickly enough. Any idea?
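For a sense of scale, the amount of work requested by that launch can be worked out in 64-bit integers. This is a plain C++ sketch with hypothetical helper names. It only counts blocks and threads and says nothing definitive about the cause of the crash.

```cpp
#include <cstdint>

// Number of blocks in a grid of gx * gy * gz blocks, in 64-bit arithmetic.
std::uint64_t totalBlocks(std::uint64_t gx, std::uint64_t gy, std::uint64_t gz) {
    return gx * gy * gz;
}

// Number of threads in a launch: blocks in the grid times threads per block.
std::uint64_t totalThreads(std::uint64_t blocks, std::uint64_t threadsPerBlock) {
    return blocks * threadsPerBlock;
}
```

For dimGrid(65535,65535) and dimBlock(16,16), totalBlocks(65535, 65535, 1) gives 4,294,836,225 blocks, and with 16*16 = 256 threads per block, totalThreads gives 1,099,478,073,600 threads, i.e. about 1.1e12. Even with an empty kernel body, all of these blocks still have to be scheduled, which may bear on how long the GPU stays busy.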
Thank you in advance,
Julien