Passing too many blocks doesn't throw an exception

Chrischoy · October 25, 2013, 10:40pm

Hello

When I launch a kernel with too many blocks, for example kernel<<<grid(DIM,DIM),…>>> with DIM=2000, it doesn't throw an exception, but when I retrieve the result with cudaMemcpy I get a run-time error. How can I find out how many blocks I can launch, so that I can set the grid size dynamically?

Thanks!

pasoleatis · October 26, 2013, 1:21pm

The maximum number of blocks depends on the GPU's compute capability, but for every GPU it is well above 2000 per grid dimension. The error comes from something else. Kernel launches are asynchronous, so an error in the kernel is only reported by the next API call that synchronizes with the device; that is why the program appears to crash at cudaMemcpy. See if you can check the errors as shown on this page: Google Code Archive - Long-term storage for Google Code Project Hosting.
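A minimal sketch of that kind of error checking, assuming a hypothetical kernel fill and a hypothetical CHECK macro (neither comes from the thread): cudaGetLastError() right after the launch reports launch-configuration errors, and cudaDeviceSynchronize() waits for the kernel and reports errors raised while it ran, so a failure is caught at the launch instead of at the later cudaMemcpy.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with file, line and message if a CUDA call fails.
#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                         cudaGetErrorString(err_));                  \
            std::exit(EXIT_FAILURE);                                 \
        }                                                            \
    } while (0)

// Hypothetical kernel: each thread writes its global index.
__global__ void fill(int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i;
}

int main() {
    const int n = 1 << 20;
    int *d_out = nullptr;
    CHECK(cudaMalloc(&d_out, n * sizeof(int)));

    fill<<<(n + 255) / 256, 256>>>(d_out, n);
    CHECK(cudaGetLastError());       // bad launch configuration, e.g. too many threads per block
    CHECK(cudaDeviceSynchronize());  // errors raised while the kernel was running

    int last = 0;
    CHECK(cudaMemcpy(&last, d_out + n - 1, sizeof(int), cudaMemcpyDeviceToHost));
    std::printf("last = %d\n", last);
    CHECK(cudaFree(d_out));
    return 0;
}
```

Checking after every launch costs a synchronization, so it is mostly useful while debugging; in production code the cudaDeviceSynchronize() check is often compiled in only for debug builds.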

Check your compilation options: use -arch=sm_xy (choose the appropriate xy for your device), and check that the number of threads per block and the shared memory per block are within the limits. I suspect an out-of-bounds access; you can check for this with the cuda-memcheck tool. Use -Xptxas -v to see the resources used by each kernel.
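A common source of out-of-bounds accesses with a 2D launch is a kernel that does not guard its indices when the grid covers more threads than there are elements. A minimal sketch, assuming a hypothetical dim x dim float matrix, a hypothetical kernel scale2d and a hypothetical host helper runScale: the block is fixed at 16 x 16 threads, the grid is rounded up to cover the whole matrix, and each thread returns early if its (x, y) falls outside it. If the guard were removed, running the program under cuda-memcheck should report the invalid global writes.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: multiply every element of a dim x dim matrix by s.
__global__ void scale2d(float *a, int dim, float s) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    // When dim is not a multiple of the block size, the last row/column of
    // blocks contains threads past the edge; without this guard they would
    // write outside the allocation.
    if (x >= dim || y >= dim) return;
    a[y * dim + x] *= s;
}

// Copies h to the GPU, scales it in place, and copies it back.
cudaError_t runScale(std::vector<float> &h, int dim, float s) {
    const size_t bytes = h.size() * sizeof(float);
    float *d = nullptr;
    cudaError_t err = cudaMalloc(&d, bytes);
    if (err != cudaSuccess) return err;
    err = cudaMemcpy(d, h.data(), bytes, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        dim3 threads(16, 16);  // 256 threads per block, within every device's limit
        dim3 blocks((dim + threads.x - 1) / threads.x,   // round up so the grid
                    (dim + threads.y - 1) / threads.y);  // covers all dim x dim cells
        scale2d<<<blocks, threads>>>(d, dim, s);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(h.data(), d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d);
    return err;
}

int main() {
    const int dim = 1000;  // not a multiple of 16, so the guard matters
    std::vector<float> h(dim * dim, 1.0f);
    cudaError_t err = runScale(h, dim, 2.0f);
    std::printf("%s, h[last] = %g\n", cudaGetErrorString(err), h.back());
    return 0;
}
```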

Chrischoy · October 26, 2013, 10:07pm

Thanks pasoleatis. Actually, the grid I launch has DIM*DIM blocks, so 4,000,000 blocks, which is absurdly many. But when I use DIM=200 (DIM^2 = 40,000), it works perfectly. So it makes sense that the error comes from the kernel, not from cudaMemcpy.
I guess I need to find another way to determine, at run time, the maximum number of blocks I can launch.
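The runtime reports these limits in cudaDeviceProp, so they do not have to be hard-coded. A minimal sketch, where the helper name printLimits is hypothetical: maxGridSize bounds the number of blocks in each grid dimension (the first <<<...>>> argument), while maxThreadsDim and maxThreadsPerBlock bound the block shape (the second argument).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Print the launch limits of one CUDA device; returns false on error.
bool printLimits(int dev) {
    cudaDeviceProp p;
    if (cudaGetDeviceProperties(&p, dev) != cudaSuccess) return false;
    std::printf("device %d: %s (compute capability %d.%d)\n",
                dev, p.name, p.major, p.minor);
    // Upper bounds for the first <<<...>>> argument (blocks per grid).
    std::printf("  max grid size:         %d x %d x %d\n",
                p.maxGridSize[0], p.maxGridSize[1], p.maxGridSize[2]);
    // Upper bounds for the second <<<...>>> argument (threads per block).
    std::printf("  max block dimensions:  %d x %d x %d\n",
                p.maxThreadsDim[0], p.maxThreadsDim[1], p.maxThreadsDim[2]);
    std::printf("  max threads per block: %d\n", p.maxThreadsPerBlock);
    std::printf("  shared memory / block: %zu bytes\n", p.sharedMemPerBlock);
    return true;
}

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        std::printf("no CUDA device found\n");
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) printLimits(dev);
    return 0;
}
```

Comparing the grid against maxGridSize and the block against maxThreadsDim and maxThreadsPerBlock before launching shows which of the two kinds of limit a configuration would hit.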

pasoleatis · October 26, 2013, 10:30pm

Hello,

On all cards you can use at least 65535 x 65535 blocks (more than 4 billion). That is far more than 4 million. On Kepler cards (compute capability 3.0 and later) the x-dimension goes up to 2^31 - 1 (about 2 billion), so you can use about 2 billion x 65535 x 65535 blocks. The number of blocks you use is very small by comparison. If it works for DIM=200 (DIM^2 = 40,000) but not for DIM=2000 (DIM^2 = 4,000,000), you have a bug in the kernel or the number of threads is wrong.

The Wikipedia CUDA page has a table with the maximum number of blocks for each GPU generation (CUDA - Wikipedia).

Chrischoy · October 27, 2013, 12:25am

Yep. I found out that my lab computer has a compute capability 1.0 graphics card:
Maximum x- or y-dimension of a block: 512 (CUDA - Wikipedia)
That explains why it works up to DIM=500 in the x and y dimensions and throws a run-time error above 500. I need to upgrade the graphics card!

Thanks!

pasoleatis · October 27, 2013, 8:05am

Hello,

I think you misunderstood something. Even on a 1.0 device you can launch millions of blocks. What you are referring to is the maximum size of a block, i.e. the number of threads per block, not the number of blocks. These are different things.

This means that if you try to launch a kernel:
kernel<<<blocks,threads>>>();
blocks.x and blocks.y can be up to 65535 (blocks.z must be 1 on compute capability 1.x),
while threads.x <= 512, threads.y <= 512 and threads.z <= 64, with threads.x * threads.y * threads.z <= 512.
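To make the distinction concrete, here is a small host-only check of a launch configuration against compute capability 1.x limits. The LaunchLimits struct, the kCC1x constant and the launchFits function are hypothetical helpers for illustration; the hard-coded values are the 1.x limits (65535 x 65535 x 1 blocks, 512 x 512 x 64 threads, 512 threads per block in total), and in a real program they would be read from cudaDeviceProp instead.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Launch limits of a compute capability 1.x device (hard-coded for illustration).
struct LaunchLimits {
    unsigned maxGrid[3];          // maximum blocks per grid in x, y, z
    unsigned maxBlock[3];         // maximum threads per block in x, y, z
    unsigned maxThreadsPerBlock;  // cap on threads.x * threads.y * threads.z
};

const LaunchLimits kCC1x = {{65535, 65535, 1}, {512, 512, 64}, 512};

// Returns true if kernel<<<blocks, threads>>> stays within the limits.
bool launchFits(dim3 blocks, dim3 threads, const LaunchLimits &lim) {
    const unsigned g[3] = {blocks.x, blocks.y, blocks.z};
    const unsigned t[3] = {threads.x, threads.y, threads.z};
    for (int i = 0; i < 3; ++i) {
        if (g[i] == 0 || g[i] > lim.maxGrid[i]) return false;
        if (t[i] == 0 || t[i] > lim.maxBlock[i]) return false;
    }
    // The per-dimension limits are not enough: the product is capped too.
    unsigned long long total = 1ULL * t[0] * t[1] * t[2];
    return total <= lim.maxThreadsPerBlock;
}

int main() {
    // 2000 x 2000 = 4,000,000 blocks of 16 x 16 threads: within the limits.
    std::printf("%d\n", launchFits(dim3(2000, 2000), dim3(16, 16), kCC1x));
    // One block of 500 x 500 threads: each dimension is <= 512,
    // but 250,000 threads per block exceeds the 512-thread cap.
    std::printf("%d\n", launchFits(dim3(1, 1), dim3(500, 500), kCC1x));
    return 0;
}
```

The first call prints 1 and the second prints 0: millions of blocks are fine on a 1.0 device, while a 500 x 500 block is rejected because of the total threads-per-block cap, not the 512 per-dimension limit.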
