A question about 3D grids

Hi, everyone,

I found that I cannot use the third dimension of a grid on my Tesla C2050 GPU, which has compute capability 2.0 and should therefore support grids of up to 65535 x 65535 x 65535 blocks.
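One way to see which grid limits the installed runtime actually reports for the card is to query `cudaDeviceProp::maxGridSize`. The sketch below assumes the C2050 is device 0, and `gridZSupported` is a hypothetical helper name. I believe toolkits that do not support 3D grids report 1 as the z limit, but treat that as an assumption rather than documented behavior.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: a z limit greater than 1 means the runtime
// will accept grids with gridDim.z > 1 on this device.
static bool gridZSupported(const cudaDeviceProp& prop)
{
    return prop.maxGridSize[2] > 1;
}

int main()
{
    const int dev = 0;  // assumption: the Tesla C2050 is device 0
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, dev);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    std::printf("%s, compute capability %d.%d\n", prop.name, prop.major, prop.minor);
    std::printf("max grid size: %d x %d x %d\n",
                prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);
    std::printf("z dimension of the grid %s\n",
                gridZSupported(prop) ? "is usable" : "is limited to 1");
    return 0;
}
```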

I ran a simple test to see whether I can use the third dimension of a grid:

//////////////////////////////////////
#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include <string.h>

#include <cutil_inline.h>

__global__ void test(float* d_t)
{
    // Only thread 0 of block (0,0,0) writes the flag
    if (threadIdx.x == 0 && blockIdx.x == 0 && blockIdx.y == 0 && blockIdx.z == 0)
        d_t[0] = 1.0f;
}

int main(int argc, char** argv)
{
    // Use the device given with -device=N, otherwise the fastest one
    if (cutCheckCmdLineFlag(argc, (const char**)argv, "device"))
        cutilDeviceInit(argc, argv);
    else
        cudaSetDevice(cutGetMaxGflopsDeviceId());

    float* d_t = NULL;
    cutilSafeCall(cudaMalloc((void**)&d_t, sizeof(float)));

    // Clear the flag on the device before the launch
    float h_t[1];
    memset(h_t, 0, sizeof(float));
    cutilSafeCall(cudaMemcpy(d_t, h_t, sizeof(float), cudaMemcpyHostToDevice));

    dim3 test_blocks(2, 2, 2);  // 2 x 2 x 2 grid of blocks
    dim3 test_threads(64);      // 64 threads per block

    test<<<test_blocks, test_threads>>>(d_t);
    cutilCheckMsg("Kernel execution failed");

    cutilSafeCall(cudaMemcpy(h_t, d_t, sizeof(float), cudaMemcpyDeviceToHost));

    printf("h_t=%f\n", h_t[0]);

    cutilSafeCall(cudaFree(d_t));
    return 0;
}
//////////////////////////////////////

In the kernel, if I use

if (threadIdx.x == 0 && blockIdx.x==0 && blockIdx.y==0 && blockIdx.z==0)
    d_t[0]=1.0f;

then I get h_t=1. However, if I change it to

if (threadIdx.x == 0 && blockIdx.x==0 && blockIdx.y==0 && blockIdx.z==1)
    d_t[0]=1.0f;

(the only difference is blockIdx.z==1), then I get h_t=0.

It seems that the z dimension of the grid is effectively 1 (i.e. blockIdx.z is always 0), which contradicts the specification for compute capability 2.0, where the maximum z dimension of a grid is 65535.
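To check directly whether the z = 1 blocks run at all, rather than inferring it from a single flag, every block can mark its own slot in an array and block (0,0,0) can record the gridDim it sees on the device. This is a minimal sketch using plain runtime calls instead of cutil; `linearBlockIndex` and `recordBlocks` are hypothetical names. If the device reports gridDim.z as 1, or the z = 1 blocks are listed as "did not run", the z dimension is being dropped before the kernel executes.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Grid used by the original test: 2 x 2 x 2 blocks of 64 threads.
const unsigned GX = 2, GY = 2, GZ = 2;

// Hypothetical helper: flatten a 3D block index (x fastest, then y, then z).
__host__ __device__ inline unsigned linearBlockIndex(unsigned bx, unsigned by, unsigned bz,
                                                     unsigned gx, unsigned gy)
{
    return bx + by * gx + bz * gx * gy;
}

// Every block marks its own slot; block (0,0,0) also records gridDim as seen on the device.
__global__ void recordBlocks(int* hits, unsigned* seenGrid)
{
    if (threadIdx.x != 0) return;
    hits[linearBlockIndex(blockIdx.x, blockIdx.y, blockIdx.z, gridDim.x, gridDim.y)] = 1;
    if (blockIdx.x == 0 && blockIdx.y == 0 && blockIdx.z == 0) {
        seenGrid[0] = gridDim.x;
        seenGrid[1] = gridDim.y;
        seenGrid[2] = gridDim.z;
    }
}

int main()
{
    const unsigned nBlocks = GX * GY * GZ;
    int* d_hits = NULL;
    unsigned* d_grid = NULL;
    cudaMalloc((void**)&d_hits, nBlocks * sizeof(int));
    cudaMalloc((void**)&d_grid, 3 * sizeof(unsigned));
    cudaMemset(d_hits, 0, nBlocks * sizeof(int));
    cudaMemset(d_grid, 0, 3 * sizeof(unsigned));

    recordBlocks<<<dim3(GX, GY, GZ), dim3(64)>>>(d_hits, d_grid);
    // Returns any error from the calls above, including an invalid launch configuration
    cudaError_t err = cudaGetLastError();

    int hits[GX * GY * GZ] = {0};
    unsigned seen[3] = {0, 0, 0};
    if (err == cudaSuccess)  // the blocking copy also reports errors raised by the kernel
        err = cudaMemcpy(hits, d_hits, sizeof(hits), cudaMemcpyDeviceToHost);
    if (err == cudaSuccess)
        err = cudaMemcpy(seen, d_grid, sizeof(seen), cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    std::printf("gridDim seen on the device: %u x %u x %u\n", seen[0], seen[1], seen[2]);
    for (unsigned z = 0; z < GZ; ++z)
        for (unsigned y = 0; y < GY; ++y)
            for (unsigned x = 0; x < GX; ++x)
                std::printf("block (%u,%u,%u): %s\n", x, y, z,
                            hits[linearBlockIndex(x, y, z, GX, GY)] ? "ran" : "did not run");

    cudaFree(d_hits);
    cudaFree(d_grid);
    return 0;
}
```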

Does anyone have ideas about this? Thanks in advance!

BTW, I ran the GPU code on CentOS 5.5, and the compile command I used is:
nvcc test3Dgrid.cu -o t3g_gpu -I /home/huangchao/NVIDIA_GPU_Computing_SDK/C/common/inc -lcutil_x86_64 -L /home/huangchao/NVIDIA_GPU_Computing_SDK/C/lib

Chao

You need to add -arch sm_20 to your nvcc command line.
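A quick way to confirm that the flag actually took effect is to have a kernel report the `__CUDA_ARCH__` value its device code was compiled for (200 for sm_20). `reportArch` and `archSupports3DGrid` are hypothetical names, and the 200 threshold simply follows the -arch sm_20 advice rather than a documented rule.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: treat device code built for sm_20 or newer as able to use blockIdx.z.
static bool archSupports3DGrid(int cudaArch)
{
    return cudaArch >= 200;
}

// __CUDA_ARCH__ is defined only while nvcc compiles device code,
// so only the device pass emits the store.
__global__ void reportArch(int* out)
{
#ifdef __CUDA_ARCH__
    *out = __CUDA_ARCH__;
#endif
}

int main()
{
    int* d_arch = NULL;
    int arch = 0;
    cudaError_t err = cudaMalloc((void**)&d_arch, sizeof(int));
    if (err == cudaSuccess) {
        reportArch<<<1, 1>>>(d_arch);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(&arch, d_arch, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_arch);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("device code built for __CUDA_ARCH__ = %d (%s)\n", arch,
                archSupports3DGrid(arch) ? "sm_20 or newer" : "older than sm_20");
    return 0;
}
```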

I added -arch sm_20 to my compile line, which now looks like:

nvcc test3Dgrid.cu -o t3g_gpu -I /home/huangchao/NVIDIA_GPU_Computing_SDK/C/common/inc -lcutil_x86_64 -L /home/huangchao/NVIDIA_GPU_Computing_SDK/C/lib -arch=sm_20

But it still doesn't work: if I use

if (threadIdx.x == 0 && blockIdx.x==0 && blockIdx.y==0 && blockIdx.z==1)
    d_t[0]=1.0f;

I still get h_t=0.

Is there still something I am missing?

Which CUDA version are you using?

I upgraded my CUDA driver, toolkit and SDK to the latest version and added -arch sm_20 to the compile line, and now it works. Thank you very much for your help! I really appreciate it!
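Since the fix involved upgrading the driver and toolkit as well as adding -arch sm_20, it can help to print both versions before relying on a 3D grid. As far as I know, 3D grids on compute capability 2.x devices arrived with CUDA 4.0, so the sketch below warns below 4.0; that threshold is my assumption, not something stated in this thread, and `versionAtLeast` is a hypothetical helper. `cudaDriverGetVersion` and `cudaRuntimeGetVersion` both encode versions as 1000 * major + 10 * minor.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// CUDA reports versions as 1000 * major + 10 * minor (e.g. 3020 for CUDA 3.2).
static bool versionAtLeast(int version, int major, int minor)
{
    return version >= 1000 * major + 10 * minor;
}

int main()
{
    int driverVersion = 0, runtimeVersion = 0;
    cudaDriverGetVersion(&driverVersion);    // 0 if no driver is installed
    cudaRuntimeGetVersion(&runtimeVersion);

    std::printf("driver supports CUDA %d.%d, runtime is CUDA %d.%d\n",
                driverVersion / 1000, (driverVersion % 1000) / 10,
                runtimeVersion / 1000, (runtimeVersion % 1000) / 10);

    // Assumption: 3D grids need CUDA 4.0 or newer in both the driver and the runtime.
    if (!versionAtLeast(driverVersion, 4, 0) || !versionAtLeast(runtimeVersion, 4, 0)) {
        std::printf("warning: CUDA older than 4.0, blockIdx.z may stay 0\n");
        return 1;
    }
    return 0;
}
```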