Limit on kernel block / grid numbers?

I’m working on a couple of fairly simple data conversion kernels. I’ve got both of them working fine for small data sets, but when I scale up, the output comes out wrong.

For example: I’m operating on memory that is num_samples elements in size (we’ll call the input data1 and the output data2).

Each “thread” of the function is independent of the others, so I don’t really care what order they execute in, as long as the output (data2) ends up in the same element order as the input (data1).

I call the function like this:

my_function<<< (num_samples/256), 256, 0 >>>( data1, data2 );
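In case it matters, the kernel itself is just a straight per-element mapping, along the lines of this simplified sketch (the float types and the actual conversion are placeholders, not my real code):

__global__ void my_function( const float *in, float *out )
{
    // One thread per element: 1:1 mapping from thread ID to array index.
    int idx = blockIdx.x * blockDim.x + threadIdx.x;

    out[idx] = in[idx];    // stand-in for the real conversion
}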

If num_samples <= 8388608 everything works fine; when num_samples > 8388608 it doesn’t (admittedly I’ve only tried 16M, not 8M+1). I don’t get any errors, the data is just wrong.

I’ve been through the documentation repeatedly this week and haven’t found anything that mentions a limit I’d be running into here. (The kernel time for 16M samples should be ~30 ms at most, so it shouldn’t be a timeout issue.)

Thanks,

The maximum size of each grid dimension is 2^16 - 1 = 65535.
With a 1D grid, 256 threads per block, and a 1:1 mapping between element position and thread ID, you can therefore address at most 65535 * 256 = 16,776,960 elements, which is just short of your 16M (16,777,216).
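Note that a grid size over the limit makes the launch itself fail, but launches are asynchronous, so you only see the error if you query for it. A quick check after the launch, something along these lines (a sketch using the standard runtime API):

my_function<<< num_samples/256, 256, 0 >>>( data1, data2 );

// An invalid launch configuration (e.g. grid x-dimension > 65535)
// shows up here as cudaErrorInvalidConfiguration.
cudaError_t err = cudaGetLastError();
if( err != cudaSuccess )
    printf( "launch failed: %s\n", cudaGetErrorString( err ) );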
Look at the Black-Scholes example in the SDK for a way to handle arbitrarily sized arrays.
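The trick there is a grid-stride style loop: launch a fixed-size grid and have each thread walk through the array, so the element count no longer has to match the launch dimensions. A minimal sketch (my names, not the SDK’s):

__global__ void my_function( const float *in, float *out, int n )
{
    // Total number of threads in the whole grid.
    int stride = gridDim.x * blockDim.x;

    // Each thread processes elements idx, idx + stride, idx + 2*stride, ...
    for( int idx = blockIdx.x * blockDim.x + threadIdx.x; idx < n; idx += stride )
        out[idx] = in[idx];    // stand-in for the real conversion
}

// Launched with a grid comfortably under the 65535 limit:
my_function<<< 256, 256, 0 >>>( data1, data2, num_samples );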

Ahh, that would be why it’s not working. Thanks, I hadn’t seen that limit documented specifically.