block size

LightBlue · July 16, 2013, 7:48am

Hi,

I’m using a GeForce GTX 690, but only device 0 (cudaSetDevice(0)).

Somehow I am able to create blocks as big as 512x512, with the following parameters:

dim3 dimBlock(512,512);
dim3 dimGrid(24,24);

The kernel launches perfectly and the results are good.

But I thought you could only have at most 1024 threads in one block, so the block size can be at most 32x32.

Can someone tell me why I can have a block as big as 512x512?

thanks in advance,
lightblue

nezix · July 16, 2013, 10:03am

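The answer to the question linked below says that the dimensions of a block are limited to 512x512x64 (Compute 1.x) and 1024x1024x64 (Compute 2.x), and that the total number of threads per block is limited to 512 and 1024 respectively.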

http://stackoverflow.com/questions/9985912/cuda-grid-block-thread-size

Greg · July 16, 2013, 7:31pm

@LightBlue, it is not possible on compute capability 3.5 and earlier devices to launch a grid that defines a block size equal to 512 x 512 threads. This will report an error on the next cuda* function call.

The device limits are documented in the CUDA C Programming Guide, Table 10, “Technical Specifications per Compute Capability,” as the property “Maximum number of threads per block.” The maximum for compute capability 2.* and 3.* devices is 1024 threads per block.
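The same limits can also be read from the device at run time. The following is a minimal sketch, assuming device 0 as in the question; it only uses `cudaGetDeviceProperties` and the `cudaDeviceProp` fields `maxThreadsPerBlock`, `maxThreadsDim`, and `maxGridSize`, and the printed values depend on the installed GPU.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;

    // Query device 0 (assumption: the same device the question uses).
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    printf("%s (compute capability %d.%d)\n", prop.name, prop.major, prop.minor);

    // Upper bound on blockDim.x * blockDim.y * blockDim.z.
    printf("max threads per block: %d\n", prop.maxThreadsPerBlock);

    // Upper bounds on each block dimension taken separately.
    printf("max block dimensions:  %d x %d x %d\n",
           prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2]);

    // Upper bounds on each grid dimension.
    printf("max grid dimensions:   %d x %d x %d\n",
           prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);

    return 0;
}
```

Note that a block must satisfy both the per-dimension limits and the total thread limit, so a 512 x 512 block can pass the per-dimension check and still be invalid.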

The CUDA C Programming Guide has a section on how to do proper error checking.

If, after reviewing your error handling, you still do not think there is an error, I recommend that you post a reproducible example.

pasoleatis · July 17, 2013, 9:11am

If one submits a kernel with <<<blocks, tpb>>>, the limits for compute 1.x devices (compute 2.x and 3.x in parentheses) are:
tpb.x <= 512 (1024)
tpb.y <= 512 (1024)
tpb.z <= 64
with tpb.x * tpb.y * tpb.z <= 512 (1024)
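The constraints above can be checked on the host before launching. This is a minimal sketch, not an official API: `blockFits` is a hypothetical helper name, and the limits are passed in explicitly so the check itself does not need a device. On a real system they would typically come from the `maxThreadsDim` and `maxThreadsPerBlock` fields of `cudaDeviceProp`.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUDA): returns true if `block` respects
// both the per-dimension limits and the total threads-per-block limit.
// maxDim and maxThreads would normally be cudaDeviceProp::maxThreadsDim
// and cudaDeviceProp::maxThreadsPerBlock.
bool blockFits(dim3 block, const int maxDim[3], int maxThreads)
{
    // A block needs at least one thread in every dimension.
    if (block.x == 0 || block.y == 0 || block.z == 0) return false;

    // Per-dimension limits, e.g. 1024 x 1024 x 64 on compute 2.x and 3.x.
    if (block.x > (unsigned)maxDim[0]) return false;
    if (block.y > (unsigned)maxDim[1]) return false;
    if (block.z > (unsigned)maxDim[2]) return false;

    // Total thread count, computed in 64 bits so the product cannot overflow.
    unsigned long long total =
        (unsigned long long)block.x * block.y * block.z;
    return total <= (unsigned long long)maxThreads;
}
```

With the compute 2.x and 3.x limits (1024, 1024, 64 per dimension and 1024 threads in total), `dim3(32, 32)` fits with exactly 1024 threads, while `dim3(48, 48)` (2304 threads) and `dim3(512, 512)` (262144 threads) are rejected even though each dimension is individually within range.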

LightBlue · July 20, 2013, 6:36am

Ok, I did more testing. I wrote a simple program (as below).

With dimBlock(32,32) I get the correct result: A.width is changed to 10. With dimBlock(512,512) or dimBlock(48,48) I get the wrong result: A.width is still 1 (not changed to 10).

But for both dimBlock(32,32) and dimBlock(512,512), I get the “Kernel launch good” message. Is my error checking incorrect?

Thanks,
lightblue

#include <stdio.h>
#include <unistd.h>

#include <cuda.h>

typedef struct
{
    int width;
    int height;
} data;

__global__ void sample_kernel(data *);

int main(int argc, char **argv)
{
    data A, *d_A;

    A.width = 1; A.height = 2;

    // copy the struct to the device
    cudaMalloc(&d_A, sizeof(data));
    cudaMemcpy(d_A, &A, sizeof(data), cudaMemcpyHostToDevice);

    // invoke kernel
    dim3 dimBlock(32, 32);
    dim3 dimGrid(24, 24);

    sample_kernel<<<dimGrid, dimBlock>>>(d_A);

    {
        // check the status returned by cudaDeviceSynchronize()
        cudaError_t cudaerr = cudaDeviceSynchronize();
        if (cudaerr != cudaSuccess)
            printf("kernel launch failed with err \"%s\".\n", cudaGetErrorString(cudaerr));
        else
            printf("Kernel launch good\n");
    }

    // copy the struct back to the host
    cudaMemcpy(&A, d_A, sizeof(data), cudaMemcpyDeviceToHost);

    printf("now A.width is %d\n", A.width);
}

__global__ void sample_kernel(data *A)
{
    // a single thread (thread (1,1) of block (1,1)) updates the struct
    if (blockIdx.x == 1 && blockIdx.y == 1 && threadIdx.x == 1 && threadIdx.y == 1)
    {
        A->width = 10;
        A->height = 20;
        printf("setting A width to 10\n");
    }
}

nezix · July 21, 2013, 10:02am

You are checking the return value of cudaDeviceSynchronize(), not the status of the kernel launch itself.
Right after the kernel launch you should use

if ( cudaSuccess != cudaGetLastError() )
    printf( "Error!\n" );

You can find error-checking macros here, for example:
http://choorucode.com/2011/03/02/cuda-error-checking/
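An error-checking macro of this kind might look like the following. This is a sketch under stated assumptions, not the code from the linked page: `CUDA_CHECK` and `emptyKernel` are hypothetical names chosen for illustration. It checks `cudaGetLastError()` right after the launch to catch configuration errors, then checks `cudaDeviceSynchronize()` to catch errors raised while the kernel runs.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical macro name: prints file, line, and the CUDA error string,
// then exits, if a runtime call does not return cudaSuccess.
#define CUDA_CHECK(call)                                          \
    do {                                                          \
        cudaError_t err_ = (call);                                \
        if (err_ != cudaSuccess) {                                \
            fprintf(stderr, "%s:%d: CUDA error: %s\n",            \
                    __FILE__, __LINE__,                           \
                    cudaGetErrorString(err_));                    \
            exit(EXIT_FAILURE);                                   \
        }                                                         \
    } while (0)

// Placeholder kernel; it does nothing, only the launch matters here.
__global__ void emptyKernel() {}

int main()
{
    dim3 dimBlock(512, 512);  // 262144 threads, far above 1024 per block
    dim3 dimGrid(24, 24);

    emptyKernel<<<dimGrid, dimBlock>>>();

    // 1) Launch errors, such as an invalid configuration, are reported here.
    CUDA_CHECK(cudaGetLastError());

    // 2) Errors that occur while the kernel executes surface at a
    //    synchronizing call such as this one.
    CUDA_CHECK(cudaDeviceSynchronize());

    printf("kernel launched and completed\n");
    return 0;
}
```

With the 512 x 512 block shown, the first check should stop the program with an invalid-configuration error; changing `dimBlock` to `(32, 32)` should let both checks pass.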

LightBlue · July 21, 2013, 7:24pm

Thanks, nezix.

I changed the code to call cudaGetLastError() first, and it catches the error now.

I had called cudaDeviceSynchronize() first because I thought that, since the kernel executes asynchronously, you have to sync the device before checking.

It seems CUDA should have made the kernel launch return a status, so that we can check directly whether the launch succeeded.

Thanks,
LightBlue
