Hi, everyone,

I found that I cannot use the third dimension of a grid on my Tesla C2050 GPU card. The card has Compute Capability 2.0, for which the maximum size of each dimension of a grid should be 65535, i.e. 65535 x 65535 x 65535.

I just ran a simple test to see if I can use the 3rd dimension of a grid, like this:

//////////////////////////////////////

using namespace std;

#include <stdio.h>
#include
#include <math.h>
#include <stdlib.h>
#include <string.h>
#include <cutil_inline.h>

// Only thread 0 of block (0,0,0) writes the flag.
__global__ void test(float* d_t)
{
    if (threadIdx.x == 0 && blockIdx.x==0 && blockIdx.y==0 && blockIdx.z==0)
        d_t[0]=1.0f;
}

int main( int argc, char** argv)
{
    // Select the device given on the command line, else the fastest one.
    if( cutCheckCmdLineFlag(argc, (const char**)argv, "device") )
        cutilDeviceInit(argc, argv);
    else
        cudaSetDevice( cutGetMaxGflopsDeviceId() );

    // One float on the device, initialised to zero from the host.
    float *d_t=NULL;
    cutilSafeCall( cudaMalloc( (void**) &d_t, sizeof(float)) );

    float h_t[1];
    bzero(h_t,sizeof(float));
    cutilSafeCall(cudaMemcpy(d_t, h_t, sizeof(float), cudaMemcpyHostToDevice) );

    // 2 x 2 x 2 grid of blocks, 64 threads per block.
    dim3 test_blocks(2,2,2);
    dim3 test_threads(64);

    test<<< test_blocks, test_threads >>>(d_t);
    cutilCheckMsg("Kernel execution failed");

    // Copy the flag back and print it.
    cutilSafeCall( cudaMemcpy( h_t, d_t, sizeof(float), cudaMemcpyDeviceToHost) );
    printf("h_t=%f\n",h_t[0]);
}

//////////////////////////////////////

In the kernel, if I set

if (threadIdx.x == 0 && blockIdx.x==0 && blockIdx.y==0 && blockIdx.z==0)

d_t[0]=1.0f;

then I get h_t=1. However, if I set

if (threadIdx.x == 0 && blockIdx.x==0 && blockIdx.y==0 && blockIdx.z==1)

d_t[0]=1.0f;

where the only difference is blockIdx.z==1, then I get h_t=0.

It seems that the third dimension of a grid has to be 1 (in other words, blockIdx.z is always 0). This contradicts the specification for Compute Capability 2.0, where the maximum size of the z dimension of a grid is 65535.
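To spell out what I expected, here is a minimal CPU-side sketch in plain C++ (not CUDA code, and not something I ran on the GPU). It loops over every block and thread index of a launch and applies the same condition as the kernel. `Dim3`, `emulate_launch` and `target_z` are hypothetical names made up for this illustration, and the sketch assumes that every block of the requested grid really runs.

```cpp
#include <cassert>

// Hypothetical stand-in for CUDA's dim3, for illustration only.
struct Dim3 {
    unsigned x, y, z;
};

// CPU emulation of the test kernel. Visits every (block, thread) index of
// the launch and applies the kernel's condition, with the blockIdx.z value
// to match passed in as target_z. Returns what d_t[0] would hold afterwards,
// assuming every block of the grid is actually executed.
float emulate_launch(Dim3 grid, unsigned threads_per_block, unsigned target_z)
{
    assert(grid.x > 0 && grid.y > 0 && grid.z > 0 && threads_per_block > 0);

    float d_t = 0.0f;  // mirrors the zero-initialised device buffer
    for (unsigned bz = 0; bz < grid.z; ++bz)
        for (unsigned by = 0; by < grid.y; ++by)
            for (unsigned bx = 0; bx < grid.x; ++bx)
                for (unsigned tx = 0; tx < threads_per_block; ++tx)
                    if (tx == 0 && bx == 0 && by == 0 && bz == target_z)
                        d_t = 1.0f;
    return d_t;
}
```

With these definitions, `emulate_launch(Dim3{2, 2, 2}, 64, 1)` returns 1.0f, which is what I expected from the GPU, while `emulate_launch(Dim3{2, 2, 1}, 64, 1)` returns 0.0f, which matches the h_t=0 I actually get.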

Does anyone have any ideas about this? Thanks in advance!