My GPU can run 512 threads per block and reports block dimensions of 512 x 512 x 64. So I launch the kernel in one dimension like this:
<<<numblocks, threadsperblock>>>
So the maximum value of numblocks should be 512, right?
But my code runs with much higher values and still gives correct results. Why is this happening?
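For reference, with one thread per row the block count usually comes from a ceiling division. Here is a minimal host-side sketch of that; blocksFor is a hypothetical helper name, and the row counts in the note below are example values, not the ones from my runs:

```cpp
#include <cassert>

// Hypothetical helper: smallest block count such that
// numBlocks * threadsPerBlock >= totalThreads (ceiling division).
long long blocksFor(long long totalThreads, int threadsPerBlock)
{
    assert(totalThreads >= 0 && threadsPerBlock > 0);
    return (totalThreads + threadsPerBlock - 1) / threadsPerBlock;
}
```

With 4096 rows this gives 4096 blocks at one thread per block and 512 blocks at 8 threads per block. When the division is not exact, the last block has surplus threads, so the kernel would need a bounds check.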
Below is the kernel code, in case it helps. It writes each row number in binary into a 1D array (which represents a 2D matrix). I now run it with a few threads per block, so id is computed as:
int id = threadIdx.x + blockDim.x * blockIdx.x;
but at first (when I noticed the problem) I ran it with one thread per block, so id was:
int id = blockIdx.x;
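__global__ void writeCubeInBinaryForm(bool *dataArray, int n, long long int twoPowerN,
                                      long long int arraySize)
{
    // One thread per row; id is the row number (originally: blockIdx.x).
    int id = threadIdx.x + blockDim.x * blockIdx.x;
    // Guard for surplus threads, assuming twoPowerN is the number of rows (2^n).
    if (id >= twoPowerN) return;
    int bufor = id; // remaining bits of the row number
    for (int i = 0; i < n; i++)
    {
        // Row-major offset of bit i in row id; 64-bit to avoid int overflow.
        long long int idx = i + (long long int)id * n;
        dataArray[idx] = bufor % 2; // least significant bit first
        bufor /= 2;                 // integer halving (was: bufor *= 0.5)
    }
}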
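To check that the results really are correct for large block counts, the device output can be compared against a host-side reference. This is a minimal sketch of such a reference, not part of my actual code: referenceBinaryRows is a hypothetical name, it assumes twoPowerN equals 2^n, and it caps n only to keep the example small.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side reference for the kernel's output: row `id` holds the n low bits
// of id, least significant bit first, in a row-major array of 2^n * n entries.
// `referenceBinaryRows` is a hypothetical name used only for this sketch.
std::vector<bool> referenceBinaryRows(int n)
{
    assert(n >= 0 && n <= 24);        // arbitrary cap to keep the sketch small
    const long long rows = 1LL << n;  // assumed meaning of twoPowerN
    std::vector<bool> data(static_cast<std::size_t>(rows * n));
    for (long long id = 0; id < rows; ++id) {
        long long bufor = id;         // remaining bits of the row number
        for (int i = 0; i < n; ++i) {
            data[static_cast<std::size_t>(i + id * n)] = (bufor % 2) != 0;
            bufor /= 2;
        }
    }
    return data;
}
```

For n = 3, row 5 comes out as 1, 0, 1 (least significant bit first), which is what the kernel should write at indices 15 to 17 of dataArray.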