Great feedback on this thread! Can I stretch it a bit further please?
I am another beginner, trying to find the largest data set a Compute 1.3 card can handle. Currently I just increase the data set until the kernel fails; using 1D grids, I top out at about 30 million data points. I’d prefer to be more scientific about it and also move to 2D grids.
As I understand it, there are 1024 threads per multiprocessor on a Compute 1.3 card, and 30 multiprocessors. I also understand that a larger block leads to more efficient processing. A block is limited to 512 threads in total, for example dimBlock(512, 1, 1);
The barrier I am running into is in setting the dimGrid parameter. Currently I am using:
dim3 dimGrid(DataPoints/dimBlock.x,1, 1);
Consequently DataPoints is limited to 65535*512 = 33,553,920, i.e. at most 5792 x 5792 (really 4096 x 4096, as Datax is a power of 2).
DataPoints = Datax*Datax;
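To make the arithmetic behind that limit explicit, here is a minimal host-side sketch in plain C++ (no CUDA toolkit needed). The constants are the Compute 1.3 limits as I understand them, and the function names are my own, purely for illustration; it also assumes one thread per data point.

```cpp
#include <cstdint>

// Assumed Compute 1.3 limits (worth checking with cudaGetDeviceProperties).
const std::uint64_t kMaxGridDimX = 65535;       // max blocks along one grid axis
const std::uint64_t kMaxThreadsPerBlock = 512;  // max threads in one block

// Most points a 1D grid can cover with one thread per point.
std::uint64_t maxPoints1D(std::uint64_t maxGridX, std::uint64_t threadsPerBlock) {
    return maxGridX * threadsPerBlock;
}

// Largest side s with s*s <= points (simple linear search, fine at this scale).
std::uint64_t maxSquareSide(std::uint64_t points) {
    std::uint64_t s = 0;
    while ((s + 1) * (s + 1) <= points) ++s;
    return s;
}

// Largest power of two that is <= n (returns 0 for n == 0).
std::uint64_t floorPow2(std::uint64_t n) {
    if (n == 0) return 0;
    std::uint64_t p = 1;
    while (p * 2 <= n) p *= 2;
    return p;
}
```

With these limits, maxPoints1D(kMaxGridDimX, kMaxThreadsPerBlock) is 33,553,920, maxSquareSide of that is 5792, and floorPow2(5792) is 4096, which matches the ceiling I am seeing.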
This is what I currently use:
A: dim3 dimBlock(512, 1, 1);
A: dim3 dimGrid(DataPoints/dimBlock.x, 1, 1);
This is OK, but reduces the block size to 256 threads:
B: dim3 dimBlock(16, 16, 1);
B: dim3 dimGrid(Datax/dimBlock.x, Datax/dimBlock.y, 1);
This doesn’t work:
C: dim3 dimBlock(16, 16, 2);
C: dim3 dimGrid(Datax/dimBlock.x/dimBlock.z, Datax/dimBlock.y/dimBlock.z, 1);
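As a sanity check on the three configurations, the sketch below counts how many threads each one launches, assuming one thread per data point. It is plain C++ so it compiles without nvcc; Dim3 is a hypothetical stand-in for CUDA's dim3, and the gridA/gridB/gridC helpers simply copy the dimGrid expressions above.

```cpp
#include <cstdint>

// Hypothetical stand-in for CUDA's dim3, so this builds without the toolkit.
struct Dim3 {
    std::uint64_t x, y, z;
};

// Threads in one block.
std::uint64_t threadsPerBlock(const Dim3& block) {
    return block.x * block.y * block.z;
}

// Total threads launched by a grid of such blocks.
std::uint64_t threadsLaunched(const Dim3& grid, const Dim3& block) {
    return grid.x * grid.y * grid.z * threadsPerBlock(block);
}

// A: 1D grid, same expression as dimGrid(DataPoints/dimBlock.x, 1, 1).
Dim3 gridA(std::uint64_t dataPoints, const Dim3& block) {
    return Dim3{dataPoints / block.x, 1, 1};
}

// B: 2D grid, same expression as dimGrid(Datax/dimBlock.x, Datax/dimBlock.y, 1).
Dim3 gridB(std::uint64_t datax, const Dim3& block) {
    return Dim3{datax / block.x, datax / block.y, 1};
}

// C: 2D grid with both axes also divided by dimBlock.z, as written above.
Dim3 gridC(std::uint64_t datax, const Dim3& block) {
    return Dim3{datax / block.x / block.z, datax / block.y / block.z, 1};
}
```

For Datax = 4096 (16,777,216 points), A and B both launch 16,777,216 threads, while C launches only 8,388,608: dividing both grid axes by dimBlock.z shrinks the grid by a factor of 4, but the block only grew by a factor of 2. That may be related to the failure, though I have not confirmed it. Note also that the integer divisions truncate, so the counts only match exactly when Datax is a multiple of the block dimensions.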
How do I use the full 512 threads per block while expanding the data set to 2D?
[Edit to correct dimBlock mistake]