Maximum Number of Threads

What is the maximum number of threads per block on an NVIDIA GeForce 8800 GT? I am running a program, and it appears to accept and run 1024 threads per block. Can anyone explain this behavior?

You think you are running a kernel with 1024 threads per block, but you are not. Either the kernel never runs and you have no error checking to tell you the launch failed, or you have the block size and grid size arguments reversed in the kernel launch, so you are actually running 1024 blocks with a small number of threads per block rather than the other way around.
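A minimal sketch of the error checking described above, assuming a reasonably recent CUDA runtime (cudaDeviceSynchronize is available from CUDA 4.0 onward). The kernel body and the helper name launchAndCheck are hypothetical stand-ins, since the original kernel is not shown. Without checks like these, a rejected launch goes unnoticed.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in kernel: thread 0 of block 0 records the block size it ran with.
__global__ void mykernel(int *out)
{
    if (blockIdx.x == 0 && threadIdx.x == 0)
        *out = blockDim.x;
}

// Launches mykernel and returns true only if both the launch and the execution succeeded.
bool launchAndCheck(int numBlocks, int threadsPerBlock)
{
    int *d_out = NULL;
    cudaError_t err = cudaMalloc((void **)&d_out, sizeof(int));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return false;
    }

    // Grid size comes first, block size second.
    mykernel<<<numBlocks, threadsPerBlock>>>(d_out);

    err = cudaGetLastError();            // reports a rejected launch configuration
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();   // reports errors raised while the kernel runs

    if (err != cudaSuccess)
        fprintf(stderr, "%d threads per block: %s\n",
                threadsPerBlock, cudaGetErrorString(err));
    else
        printf("%d threads per block: kernel ran\n", threadsPerBlock);

    cudaFree(d_out);
    return err == cudaSuccess;
}

int main()
{
    launchAndCheck(1, 512);
    launchAndCheck(1, 1024);   // should be rejected on devices whose limit is 512
    return 0;
}
```

On a device limited to 512 threads per block, the second call should print an error string instead of silently doing nothing. Newer GPUs accept 1024, so both calls may succeed there.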

Here is my code:

#define BLOCK_SIZE 32

dim3 threads(BLOCK_SIZE*BLOCK_SIZE);

dim3 blocks(z+1,1);

mykernel<<<blocks,threads>>>(parameters);

That gives 32*32 = 1024 threads per block, and it runs with no error. But if I increase BLOCK_SIZE to 64, I get an error saying there is not enough shared memory.

The total number of threads per block is 1024, and it still works? That is strange.

Did you check the results (the output parameters) after the kernel finished executing?

Indeed, that should be impossible: the limit on this card is 512 threads per block. On my GTX 275, a kernel launched with more threads than that simply does not execute. The number of threads per block should be chosen carefully for each card so that its SMs work at their peak.
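Rather than relying on a remembered number, the limit can be read from the device at run time. This is a small sketch using cudaGetDeviceProperties from the CUDA runtime API; the helper name maxThreadsPerBlock is made up for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns the per-block thread limit of a device, or -1 on failure.
int maxThreadsPerBlock(int device)
{
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        fprintf(stderr, "device %d: %s\n", device, cudaGetErrorString(err));
        return -1;
    }
    return prop.maxThreadsPerBlock;
}

int main()
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess)
            continue;
        printf("Device %d: %s (compute capability %d.%d)\n",
               dev, prop.name, prop.major, prop.minor);
        printf("  max threads per block:   %d\n", maxThreadsPerBlock(dev));
        printf("  max block dimensions:    %d x %d x %d\n",
               prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2]);
        printf("  shared memory per block: %zu bytes\n", prop.sharedMemPerBlock);
    }
    return 0;
}
```

The reported shared memory per block is also worth checking here, since the BLOCK_SIZE of 64 case above fails with a shared memory error.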

I also recommend that you use this notation:

dim3 dimGrid

dim3 dimBlock

That is, name the variables after the grid and the block, which is the convention most CUDA programmers follow.
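As a rough illustration of that naming, here is a sketch that launches a 2D grid of 16 x 16 blocks (256 threads each, well under the 512 limit). The kernel fillIndex, the helper gridFor, and the array sizes are all assumptions made up for this example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread writes its own linear index into a width x height array.
__global__ void fillIndex(int *data, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)   // guard threads that fall outside the array
        data[y * width + x] = y * width + x;
}

// Hypothetical helper: number of blocks needed to cover width x height, rounded up.
dim3 gridFor(int width, int height, dim3 dimBlock)
{
    return dim3((width + dimBlock.x - 1) / dimBlock.x,
                (height + dimBlock.y - 1) / dimBlock.y);
}

int main()
{
    const int width = 100, height = 50;
    int *d_data = NULL;
    if (cudaMalloc((void **)&d_data, width * height * sizeof(int)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    dim3 dimBlock(16, 16);                            // threads per block
    dim3 dimGrid = gridFor(width, height, dimBlock);  // blocks per grid

    // Grid first, block second.
    fillIndex<<<dimGrid, dimBlock>>>(d_data, width, height);

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
    else
        printf("launched %u x %u blocks of %u x %u threads\n",
               dimGrid.x, dimGrid.y, dimBlock.x, dimBlock.y);

    cudaFree(d_data);
    return err == cudaSuccess ? 0 : 1;
}
```

With the names dimGrid and dimBlock it is harder to swap the two launch arguments by accident.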

Best,

Alex.