A question about 3D grid

Hi, everyone,

I found that I cannot use the third dimension of a grid on my Tesla C2050 GPU card, even though it has Compute Capability 2.0, for which the maximum sizes of each dimension of a grid are 65535 x 65535 x 65535.

I just ran a simple test to see if I can use the 3rd dimension of a grid, like this:

//////////////////////////////////////
using namespace std;

#include <stdio.h>
#include <iostream>
#include <math.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h> // for bzero()

#include <cutil_inline.h>

// Write 1.0f from exactly one thread, so we can tell whether that block ever ran.
__global__ void test(float* d_t)
{
    if (threadIdx.x == 0 && blockIdx.x == 0 && blockIdx.y == 0 && blockIdx.z == 0)
        d_t[0] = 1.0f;
}

int main(int argc, char** argv)
{
    if (cutCheckCmdLineFlag(argc, (const char**)argv, "device"))
        cutilDeviceInit(argc, argv);
    else
        cudaSetDevice(cutGetMaxGflopsDeviceId());

    float* d_t = NULL;
    cutilSafeCall(cudaMalloc((void**)&d_t, sizeof(float)));

    float h_t[1];
    bzero(h_t, sizeof(float));
    cutilSafeCall(cudaMemcpy(d_t, h_t, sizeof(float), cudaMemcpyHostToDevice));

    // 2 x 2 x 2 grid of blocks, 64 threads per block
    dim3 test_blocks(2, 2, 2);
    dim3 test_threads(64);

    test<<< test_blocks, test_threads >>>(d_t);
    cutilCheckMsg("Kernel execution failed");

    cutilSafeCall(cudaMemcpy(h_t, d_t, sizeof(float), cudaMemcpyDeviceToHost));

    printf("h_t=%f\n", h_t[0]);

    return 0;
}
//////////////////////////////////////

In the kernel, if I set

if (threadIdx.x == 0 && blockIdx.x==0 && blockIdx.y==0 && blockIdx.z==0)
d_t[0]=1.0f;

then I can get h_t=1; however, if I set

if (threadIdx.x == 0 && blockIdx.x==0 && blockIdx.y==0 && blockIdx.z==1)
d_t[0]=1.0f;

Here the only difference is blockIdx.z==1, and I get h_t=0.

It seems that the third dimension of a grid must be 1 (or that blockIdx.z is always 0), which contradicts the specification of compute capability 2.0, where the maximum size of the z dimension of a grid is 65535.
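For reference, these limits can also be read at run time from cudaDeviceProp; a minimal sketch (separate from my test program, querying device 0):

#include <stdio.h>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0); // device 0
    printf("Max grid size: %d x %d x %d\n",
           prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);
    return 0;
}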

Does anyone have ideas about this? Thanks in advance!

More info:

I ran the GPU code on CentOS 5.5, and the compile command I used is:
nvcc test3Dgrid.cu -o t3g_gpu -I /home/huangchao/NVIDIA_GPU_Computing_SDK/C/common/inc -lcutil_x86_64 -L /home/huangchao/NVIDIA_GPU_Computing_SDK/C/lib

Chao

Which CUDA version are you using?

Hi Tera, thanks for your reply. Here is what deviceQuery reports:

Device 0: "Tesla C2050"

CUDA Driver Version: 4.0

CUDA Runtime Version: 3.20

CUDA Capability Major/Minor version number: 2.0

Total amount of global memory: 2817720320 bytes

Multiprocessors x Cores/MP = Cores: 14 (MP) x 32 (Cores/MP) = 448 (Cores)

Total amount of constant memory: 65536 bytes

Total amount of shared memory per block: 49152 bytes

Total number of registers available per block: 32768

Warp size: 32

Maximum number of threads per block: 1024

Maximum sizes of each dimension of a block: 1024 x 1024 x 64

Maximum sizes of each dimension of a grid: 65535 x 65535 x 65535

Maximum memory pitch: 2147483647 bytes

Texture alignment: 512 bytes

Clock rate: 1.15 GHz

Concurrent copy and execution: Yes

Run time limit on kernels: No

Integrated: No

Support host page-locked memory mapping: Yes

Compute mode: Default (multiple host threads can use this device simultaneously)

Concurrent kernel execution: Yes

Device has ECC support enabled: Yes

Device is using TCC driver mode: No

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 4.0, CUDA Runtime Version = 3.20, NumDevs = 1, Device = Tesla C2050

Just to be sure, add -arch=sm_20 when you compile. nvcc compiles for compute capability 1.0 by default, which does not support 3D grids of blocks.
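With the command line you posted above, that would be something like:

nvcc -arch=sm_20 test3Dgrid.cu -o t3g_gpu -I /home/huangchao/NVIDIA_GPU_Computing_SDK/C/common/inc -lcutil_x86_64 -L /home/huangchao/NVIDIA_GPU_Computing_SDK/C/lib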

You also need to install the CUDA 4.0 runtime: your deviceQuery output shows runtime version 3.20, and 3D grids require CUDA 4.0.

After I upgraded my CUDA driver, toolkit and SDK to the latest version and added -arch=sm_20 to the compile line, it works.

Thank you guys for your help! I really appreciate it!