Problem with a 3-dimensional thread block and a 3-dimensional grid: the kernel is not executed

Hello,

I need to process data stored in a three-dimensional array.

When I processed it plane by plane, there was no problem:

texture< uchar, 3, cudaReadModeNormalizedFloat > tex;

const dim3 blockSize( 16, 16, 1 );

__global__ void my_kernel( float *d_output, uint imageW, uint imageH, uint imageD, uint z )
{
    uint x = __umul24( blockIdx.x, blockDim.x ) + threadIdx.x;
    uint y = __umul24( blockIdx.y, blockDim.y ) + threadIdx.y;

    if ( (x >= imageW) || (y >= imageH) || (z >= imageD) ) return;

    .....
    .....

    uint i = z * imageW * imageH + y * imageW + x;
    d_output[ i ] = something;
}

extern "C"
{
    int run_kernel( cudaArray * dInArray, void * dOutData, int nWidth, int nHeight, int nDepth )
    {
        dim3 gridSize( nWidth / blockSize.x + 1, nHeight / blockSize.y + 1, 1 );

        for ( int z = 0; z < nDepth; z++ )
        {
            my_kernel <<< gridSize, blockSize >>> ( (float *)dOutData, nWidth, nHeight, nDepth, z );
        }

        return 0;
    }
}

However, when I tried to use a three-dimensional grid, the kernel was not executed:

texture< uchar, 3, cudaReadModeNormalizedFloat > tex;

const dim3 blockSize( 8, 8, 8 );

__global__ void my_kernel( float *d_output, uint imageW, uint imageH, uint imageD )
{
    uint x = __umul24( blockIdx.x, blockDim.x ) + threadIdx.x;
    uint y = __umul24( blockIdx.y, blockDim.y ) + threadIdx.y;
    uint z = __umul24( blockIdx.z, blockDim.z ) + threadIdx.z;

    if ( (x >= imageW) || (y >= imageH) || (z >= imageD) ) return;

    .....
    .....

    uint i = z * imageW * imageH + y * imageW + x;
    d_output[ i ] = something;
}

extern "C"
{
    int run_kernel( cudaArray * dInArray, void * dOutData, int nWidth, int nHeight, int nDepth )
    {
        dim3 gridSize( nWidth / blockSize.x + 1, nHeight / blockSize.y + 1, nDepth / blockSize.z + 1 );

        my_kernel <<< gridSize, blockSize >>> ( (float *)dOutData, nWidth, nHeight, nDepth );

        return 0;
    }
}

I do not understand the problem. I have heard that the maximum block size for the GTX 480 is 1024 threads, but in both cases the block is smaller: 16 × 16 × 1 = 256 threads and 8 × 8 × 8 = 512 threads.

Are there also any grid size restrictions?

Thank you for your help.

Jakub

Yes, there are other restrictions. For example, the register usage per thread times the number of threads per block must not exceed the register resources available per multiprocessor. Compiling with --ptxas-options=-v will report the register usage.

The easiest way to tell why it is not launching is to check the error code after the launch: call cudaThreadSynchronize, then cudaGetLastError, and translate the result with cudaGetErrorString. Knowing the error code helps narrow down the root cause.
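A minimal, self-contained sketch of that error-checking pattern (the dummy kernel and the deliberately oversized grid here are illustrative, not from the original post; later CUDA releases deprecate cudaThreadSynchronize in favor of cudaDeviceSynchronize):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void dummy() {}

int main()
{
    // gridSize.z > 1 is exactly the configuration the original code used.
    dim3 gridSize( 4, 4, 4 ), blockSize( 8, 8, 8 );
    dummy<<< gridSize, blockSize >>>();

    cudaError_t err = cudaGetLastError();       // catches launch-configuration errors
    if ( err == cudaSuccess )
        err = cudaThreadSynchronize();          // catches errors raised during execution
    if ( err != cudaSuccess )
        fprintf( stderr, "kernel failed: %s\n", cudaGetErrorString( err ) );
    return 0;
}
```

On hardware/toolkits that reject a 3D grid, this prints the same "invalid configuration argument" message reported below.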

The most important restriction in this context is that gridSize.z may not be greater than one…
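A common workaround (a sketch of my own, not from the original posts) is to fold the z blocks into the y dimension of a 2D grid and recover the virtual z block index inside the kernel:

```cuda
// Launch with: dim3 grid( blocksX, blocksY * blocksZ, 1 ), block( 8, 8, 8 );
// where blocksY = (imageH + 7) / 8 and blocksZ = (imageD + 7) / 8.
__global__ void my_kernel_folded( float *d_output, uint imageW, uint imageH, uint imageD )
{
    // Recover the real y block and the virtual z block from the folded index.
    uint blocksY = (imageH + blockDim.y - 1) / blockDim.y;
    uint by = blockIdx.y % blocksY;   // real y block
    uint bz = blockIdx.y / blocksY;   // virtual z block

    uint x = blockIdx.x * blockDim.x + threadIdx.x;
    uint y = by * blockDim.y + threadIdx.y;
    uint z = bz * blockDim.z + threadIdx.z;

    if ( x >= imageW || y >= imageH || z >= imageD ) return;

    d_output[ z * imageW * imageH + y * imageW + x ] = 0.0f; // something
}
```

This keeps one kernel launch for the whole volume instead of one launch per plane, at the cost of a modulo/divide per thread.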

Thank you for the useful hints.

Everything is OK with the registers. After the kernel launch, cudaGetLastError() returns "invalid configuration argument".

Sad, but true. I used to work with 2D images and this is my first time with 3D arrays, so I assumed the methodology was the same.

It seems a little strange to me that gridSize is a three-dimensional vector, yet only one value is valid for the third coordinate.

Thanks for help.

Hi,

if I remember correctly, the maximum grid size is 65535 per dimension, and it is quickly reached on a 3D grid.

–pium
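Rather than hard-coding the limits, they can be queried at runtime; a minimal sketch using cudaGetDeviceProperties:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties( &prop, 0 );    // query device 0

    printf( "max block dims: %d x %d x %d (max %d threads/block)\n",
            prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2],
            prop.maxThreadsPerBlock );
    printf( "max grid dims:  %d x %d x %d\n",
            prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2] );
    return 0;
}
```

On toolkits without 3D-grid support, maxGridSize[2] will report 1.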

The Fermi hardware is actually capable of 3D grids; the software-side support just hasn't been added to CUDA yet, though.