Hi, everyone,
I found that I cannot use the third dimension of a grid on my Tesla C2050 GPU, which has Compute Capability 2.0. According to the specification, the maximum size of each grid dimension for that compute capability is 65535 x 65535 x 65535.
I just ran a simple test to see if I can use the 3rd dimension of a grid, like this:
//////////////////////////////////////
using namespace std;
#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include <string.h>
#include <cutil_inline.h>
// Writes 1.0f from thread 0 of the block selected by the blockIdx test.
__global__ void test(float* d_t)
{
    if (threadIdx.x == 0 && blockIdx.x == 0 && blockIdx.y == 0 && blockIdx.z == 0)
        d_t[0] = 1.0f;
}

int main(int argc, char** argv)
{
    // Use the device given on the command line, otherwise the fastest one.
    if (cutCheckCmdLineFlag(argc, (const char**)argv, "device"))
        cutilDeviceInit(argc, argv);
    else
        cudaSetDevice(cutGetMaxGflopsDeviceId());

    float* d_t = NULL;
    cutilSafeCall(cudaMalloc((void**)&d_t, sizeof(float)));

    // Start from 0 on the device so a write that never happens shows up as 0.
    float h_t[1];
    bzero(h_t, sizeof(float));
    cutilSafeCall(cudaMemcpy(d_t, h_t, sizeof(float), cudaMemcpyHostToDevice));

    dim3 test_blocks(2, 2, 2);  // 2 x 2 x 2 grid, so blockIdx.z should be 0 or 1
    dim3 test_threads(64);
    test<<<test_blocks, test_threads>>>(d_t);
    cutilCheckMsg("Kernel execution failed");

    cutilSafeCall(cudaMemcpy(h_t, d_t, sizeof(float), cudaMemcpyDeviceToHost));
    printf("h_t=%f\n", h_t[0]);

    cutilSafeCall(cudaFree(d_t));
    return 0;
}
//////////////////////////////////////
In the kernel, if I set
if (threadIdx.x == 0 && blockIdx.x==0 && blockIdx.y==0 && blockIdx.z==0)
d_t[0]=1.0f;
then I get h_t=1. However, if I set
if (threadIdx.x == 0 && blockIdx.x==0 && blockIdx.y==0 && blockIdx.z==1)
d_t[0]=1.0f;
where the only difference is blockIdx.z==1, then I get h_t=0.
It seems that the third dimension of a grid has to be 1 (that is, blockIdx.z is always 0). This contradicts the specification of compute capability 2.0, where the maximum size of the z dimension of a grid is 65535.
Does anyone have any ideas about this? Thanks in advance!
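In case it helps to narrow this down, here is a minimal diagnostic sketch that avoids cutil. It prints the runtime and driver versions, the grid limits the runtime reports for the device, the status of a 2 x 2 x 2 launch, and which blockIdx.z values the kernel actually saw. The kernel name mark_block_z, the array d_seen, and the choice of device 0 are my own assumptions for illustration and are not part of the test above.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

#define GRID_Z 2  // z extent of the test grid (same as in the test above)

// Hypothetical helper kernel: block (0,0,z) marks slot z of d_seen.
__global__ void mark_block_z(int* d_seen)
{
    if (threadIdx.x == 0 && blockIdx.x == 0 && blockIdx.y == 0)
        d_seen[blockIdx.z] = 1;
}

int main(void)
{
    // Report which runtime and driver versions are in use.
    int runtime_version = 0, driver_version = 0;
    cudaRuntimeGetVersion(&runtime_version);
    cudaDriverGetVersion(&driver_version);
    printf("runtime version %d, driver version %d\n",
           runtime_version, driver_version);

    // Report the grid limits for device 0 (assumed to be the C2050).
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("%s: maxGridSize = %d x %d x %d\n", prop.name,
           prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);

    // Zero the marker array on the device.
    int h_seen[GRID_Z] = {0};
    int* d_seen = NULL;
    cudaMalloc((void**)&d_seen, sizeof(h_seen));
    cudaMemcpy(d_seen, h_seen, sizeof(h_seen), cudaMemcpyHostToDevice);

    // Launch with a 3D grid and check for a launch-configuration error.
    dim3 blocks(2, 2, GRID_Z);
    dim3 threads(64);
    mark_block_z<<<blocks, threads>>>(d_seen);
    cudaError_t err = cudaGetLastError();
    printf("launch status: %s\n", cudaGetErrorString(err));

    // The copy back waits for the kernel to finish before reading the markers.
    cudaMemcpy(h_seen, d_seen, sizeof(h_seen), cudaMemcpyDeviceToHost);
    for (int z = 0; z < GRID_Z; ++z)
        printf("blockIdx.z == %d seen: %d\n", z, h_seen[z]);

    cudaFree(d_seen);
    return 0;
}
```

If the third value of maxGridSize is reported as 1, or the launch status is not "no error", the limit would appear to come from the installed toolkit or driver rather than from the kernel code.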