Understanding number of threads: problems with program working

Hello All,

My program works only if I have 256 threads. Otherwise it stops working, and I am unsure why. The length of my array is 839.

Maybe you have some ideas?

Several resource limits can prevent a kernel from running: register usage, shared memory usage, and threads per block. Appendix A of the programming guide describes the limits for the different generations of hardware that can run CUDA.


Thanks for the quick reply. I am running a Quadro FX 1600M, and the specification says that the number of threads per block is 512, so I increased the number of threads per block to 512.

const int extents = iextent * jextent * kextent;
dim3 dimGrid(extents);
When I invoke my kernel as myKernel<<<dimGrid,512>>>(parameters), the kernel does not get invoked. With myKernel<<<dimGrid,256>>>(parameters), it does.

Hoping to hear back from you.

Thank you!

It actually says the maximum number of threads per block is 512. It also says that your card has a limit of 8192 registers per multiprocessor. With 512 threads per block, that leaves 8192 / 512 = 16 registers per thread, so if your kernel uses more than 16 registers, it won't launch.

Since it does launch with 256 threads per block (8192 / 256 = 32 registers per thread), I guess your kernel is using somewhere between 17 and 32 registers. Section 5.2 of the programming guide explains how the execution configuration is calculated, how to get the amount of resources a kernel uses from the compiler, and how to force the compiler to limit the number of registers it selects during compilation.
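
As a rough sketch of that arithmetic in plain host-side C++: the function names blockFits and maxRegistersPerThread are made up for illustration, and the code assumes the simple model where registers per thread times threads per block must not exceed the 8192 figure. Real hardware allocates registers in coarser chunks, so treat the result as an estimate, not an exact limit.

```cpp
// Simplified model (an assumption, not the exact hardware rule):
// a block can launch only if
//   registersPerThread * threadsPerBlock <= registersPerMultiprocessor.
bool blockFits(int registersPerThread, int threadsPerBlock,
               int registersPerMultiprocessor) {
    return registersPerThread * threadsPerBlock <= registersPerMultiprocessor;
}

// Largest per-thread register count that still lets a block of the
// given size fit under the same simplified model (integer division).
int maxRegistersPerThread(int threadsPerBlock, int registersPerMultiprocessor) {
    return registersPerMultiprocessor / threadsPerBlock;
}
```

With the 8192 figure, maxRegistersPerThread(512, 8192) gives 16 and maxRegistersPerThread(256, 8192) gives 32. A hypothetical kernel using 20 registers per thread would fail blockFits(20, 512, 8192) but pass blockFits(20, 256, 8192), which matches the behaviour you are seeing. To find the real number, compiling with nvcc --ptxas-options=-v prints the registers used per kernel, and -maxrregcount=N caps it.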