I have a GeForce 8600 GTS and I am using CUDA. I have run the deviceQuery example and it says the following:
Max threads per block: 512
Max dimension of a block: 512 x 512 x 64
Max dimension of a grid: 65535 x 65535 x 1
These are the maximum values, and I have read that any configuration larger than the available hardware will have its execution pipelined. What I would like to know is: what are the dimensions of my board without any pipelining? I want to max out the board's parallelism without pipelining the execution, so that I can get theoretical maximum performance values.
Where can I read more about the architecture? For instance, it says on the website that my board has 12 multiprocessors. How many processors are inside each multiprocessor and how many hardware threads can each processor run?
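For reference, here is a minimal sketch of how I understand these limits can be read programmatically through the CUDA runtime, rather than copied from the deviceQuery output. It assumes a toolkit recent enough that `cudaDeviceProp` has the `maxThreadsPerMultiProcessor` field (older toolkits may not have it). The helper names `residentThreads` and `printDeviceLimits` are my own, not part of CUDA.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: upper bound on the number of threads that can be
// resident on the whole device at once (multiprocessors x threads per
// multiprocessor). Assumes maxThreadsPerMultiProcessor is available.
static int residentThreads(const cudaDeviceProp &p) {
    return p.multiProcessorCount * p.maxThreadsPerMultiProcessor;
}

// Hypothetical helper: print the limits reported for one device.
// Returns 0 on success, 1 if the properties could not be read.
static int printDeviceLimits(int device) {
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    printf("Device:                   %s\n", prop.name);
    printf("Multiprocessors:          %d\n", prop.multiProcessorCount);
    printf("Warp size:                %d\n", prop.warpSize);
    printf("Max threads per block:    %d\n", prop.maxThreadsPerBlock);
    printf("Max dimension of a block: %d x %d x %d\n",
           prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2]);
    printf("Max dimension of a grid:  %d x %d x %d\n",
           prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);
    printf("Max resident threads:     %d\n", residentThreads(prop));
    return 0;
}
```

Calling `printDeviceLimits(0)` from `main` and compiling with `nvcc` should print the figures for the first device. The last line is only an upper bound on how many threads can be resident at the same time; it does not tell me how many processors are inside each multiprocessor, which is the part I am still asking about.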