Maximum Grid and Block size

I’m another beginner, trying to find the largest data set a Compute 1.3 card can handle. So far I have simply been increasing the data set with 1D grids until the kernel launch fails, which tops out at about 30 million data points. I’d prefer to be more scientific about it, and also move to 2D grids.

As I understand it, a Compute 1.3 card allows up to 1024 resident threads per multiprocessor, and there are 30 multiprocessors. I also understand that larger blocks tend to be processed more efficiently. A block is limited to 512 threads in total, so the largest 1D block is dimBlock(512, 1, 1);

The barrier I am running into is in setting the dimGrid parameter. Currently I am using:
dim3 dimGrid(DataPoints/dimBlock.x,1, 1);

Consequently, because each grid dimension is capped at 65535 blocks, DataPoints is limited to 65535*512 = 33,553,920, i.e. at most 5792 x 5792.
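The 65535*512 figure can be checked with a few lines of host-side arithmetic. This is only a minimal sketch in plain C++, assuming the Compute 1.x limits of 65535 blocks per grid dimension and 512 threads per block, and one thread per data point; the names maxPoints1D and maxSquareSide are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumed Compute 1.x limits (hypothetical constant names).
const std::uint64_t kMaxGridDim = 65535;        // blocks per grid dimension
const std::uint64_t kMaxThreadsPerBlock = 512;  // threads per block

// Largest number of points a 1D grid of 1D blocks can cover,
// with one thread per point.
std::uint64_t maxPoints1D() {
    return kMaxGridDim * kMaxThreadsPerBlock;
}

// Largest Datax such that Datax * Datax still fits under that limit.
std::uint64_t maxSquareSide() {
    std::uint64_t side = 0;
    while ((side + 1) * (side + 1) <= maxPoints1D()) ++side;
    assert(side * side <= maxPoints1D());  // sanity check on the result
    return side;
}
```

With these assumptions maxPoints1D() gives 33,553,920 and maxSquareSide() gives 5792, which is in line with the roughly 30 million points where the 1D launches start failing.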

DataPoints = Datax*Datax;

This is what I currently use:
A: dim3 dimBlock(512, 1, 1);
A: dim3 dimGrid(DataPoints/dimBlock.x, 1, 1);

This is OK, but reduces the number of threads per block to 256:
B: dim3 dimBlock(16, 16, 1);
B: dim3 dimGrid(Datax/dimBlock.x, Datax/dimBlock.y, 1);
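One thing to watch with B: integer division drops the remainder, so if Datax is not a multiple of 16 the last partial row and column of blocks are never launched. A common pattern is to round the grid size up and let the kernel skip the threads that fall outside the data. The sketch below is plain host-side C++, not tested on a device; Dim3 is a stand-in for CUDA's dim3, and divUp and gridFor2D are hypothetical helper names.

```cpp
#include <cassert>

// Stand-in for CUDA's dim3 so the sketch compiles without the CUDA toolkit.
struct Dim3 { unsigned x, y, z; };

// Integer division rounded up.
unsigned divUp(unsigned n, unsigned d) {
    assert(d > 0);
    return (n + d - 1) / d;
}

// Grid that covers a width x height data set with the given block shape.
Dim3 gridFor2D(unsigned width, unsigned height, Dim3 block) {
    Dim3 grid = {divUp(width, block.x), divUp(height, block.y), 1};
    return grid;
}
```

Inside the kernel the matching index computation would be x = blockIdx.x * blockDim.x + threadIdx.x (likewise for y), followed by an early return when x >= Datax or y >= Datax.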

This doesn’t work:
C: dim3 dimBlock(16, 16, 2);
C: dim3 dimGrid(Datax/dimBlock.x/dimBlock.z, Datax/dimBlock.y/dimBlock.z, 1);

How do I use the full 512 threads per block but expand the data set to a 2D grid?
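A note on why C comes up short: dividing both grid dimensions by dimBlock.z launches a quarter as many blocks as B, while each block has only twice as many threads, so there are half as many threads as data points. One possible way to keep 512 threads per block with a 2D grid is a flat block shape such as (32, 16, 1), since 32*16 = 512. The following is a host-side simulation in plain C++ rather than a real kernel, offered as a sketch only; Dim3, elementIndex and coversExactlyOnce are hypothetical names, and the index formula mirrors what a kernel would compute from blockIdx, blockDim and threadIdx.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for CUDA's dim3 so this compiles without the CUDA toolkit.
struct Dim3 { unsigned x, y, z; };

// Mirrors the kernel-side computation
//   x = blockIdx.x * blockDim.x + threadIdx.x   (and the same for y).
// Returns -1 for a thread that lands outside the dataX x dataX data set.
long long elementIndex(Dim3 blk, Dim3 thr, Dim3 blkDim, unsigned dataX) {
    unsigned x = blk.x * blkDim.x + thr.x;
    unsigned y = blk.y * blkDim.y + thr.y;
    if (x >= dataX || y >= dataX) return -1;  // bounds guard
    return static_cast<long long>(y) * dataX + x;
}

// Visits every thread of every block on the host and checks that each
// element is touched exactly once.
bool coversExactlyOnce(unsigned dataX, Dim3 blkDim) {
    assert(blkDim.z == 1);  // this sketch only maps x and y
    Dim3 grid = {(dataX + blkDim.x - 1) / blkDim.x,
                 (dataX + blkDim.y - 1) / blkDim.y, 1};
    std::vector<int> hits(static_cast<std::size_t>(dataX) * dataX, 0);
    for (unsigned by = 0; by < grid.y; ++by)
        for (unsigned bx = 0; bx < grid.x; ++bx)
            for (unsigned ty = 0; ty < blkDim.y; ++ty)
                for (unsigned tx = 0; tx < blkDim.x; ++tx) {
                    Dim3 b = {bx, by, 0};
                    Dim3 t = {tx, ty, 0};
                    long long i = elementIndex(b, t, blkDim, dataX);
                    if (i >= 0) ++hits[static_cast<std::size_t>(i)];
                }
    for (std::size_t i = 0; i < hits.size(); ++i)
        if (hits[i] != 1) return false;
    return true;
}
```

On a real card the launch would presumably use dim3 dimBlock(32, 16, 1) with a grid of Datax/32 by Datax/16 blocks, rounded up. With both grid dimensions available, the 65535-block cap then applies per dimension rather than to the total block count.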


I am a beginner too, but maybe Appendix A.1.1…A.1.3 of the NVIDIA CUDA Programming Guide answers your question.
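For what it is worth, here is a small sketch of checking a launch configuration against the Compute 1.x limits as I understand them: 512 threads per block, block dimensions up to 512 x 512 x 64, and grid dimensions up to 65535 x 65535 with z fixed at 1. It is plain C++ so it runs without a GPU; Dim3 stands in for CUDA's dim3, and within, launchFits and checkQuestionConfigs are hypothetical names, not part of the CUDA API.

```cpp
#include <cassert>

// Stand-in for CUDA's dim3 so the sketch compiles without the CUDA toolkit.
struct Dim3 { unsigned x, y, z; };

// Assumed Compute 1.x limits (hypothetical constant names).
const unsigned kMaxThreadsPerBlock = 512;
const Dim3 kMaxBlockDim = {512, 512, 64};
const Dim3 kMaxGridDim  = {65535, 65535, 1};

// True when every component is at least 1 and no larger than the limit.
bool within(Dim3 v, Dim3 limit) {
    return v.x >= 1 && v.y >= 1 && v.z >= 1 &&
           v.x <= limit.x && v.y <= limit.y && v.z <= limit.z;
}

// True when the grid and block shapes respect the assumed limits.
bool launchFits(Dim3 grid, Dim3 block) {
    if (!within(block, kMaxBlockDim) || !within(grid, kMaxGridDim)) return false;
    // At most 512 * 512 * 64 = 2^24 here, so the product cannot overflow.
    unsigned threads = block.x * block.y * block.z;
    return threads <= kMaxThreadsPerBlock;
}

// Usage: the three configurations from the question at Datax = 5792.
void checkQuestionConfigs() {
    assert(launchFits({65522, 1, 1}, {512, 1, 1}));  // A
    assert(launchFits({362, 362, 1}, {16, 16, 1}));  // B
    assert(launchFits({181, 181, 1}, {16, 16, 2}));  // C is within the limits;
                                                     // its problem is coverage
}
```

At run time the same numbers can be read from the device instead of being hard-coded, through cudaGetDeviceProperties and the maxThreadsPerBlock, maxThreadsDim and maxGridSize fields of cudaDeviceProp.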