Does the CPU context-switching concept apply to GPUs as well? When the number of threads is greater than the number of cores, is context-switching overhead generated on the GPU?
We have a GTX 295 card, which has two GPU units. Each has 30 multiprocessors and 240 cores. According to the book "CUDA By Example", "the optimal performance is achieved when the number of blocks we launch is exactly twice the number of multiprocessors our GPU contains" (page 176).
Does that mean we need to configure the card to run the heavy calculation kernel function using 60 blocks with 4 threads per block on each GPU of the GTX 295?
Since there is also the concept of 32-thread warps involved, should we use a configuration of 60 blocks with 32 threads per block instead? Would this generate too much context-switching overhead?
In the 60-blocks, 4-threads configuration, will the GPU pad each block with 28 extra threads to fill out a full warp?
If there is no overhead from using more threads than cores, would a configuration of 60 blocks with 512 threads per block give better performance, according to the device query results below? Assume memory is sufficient.
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
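For reference, the figures above come from `cudaGetDeviceProperties`. A minimal sketch that prints the same limits for each GPU on the card (a GTX 295 shows up as two CUDA devices):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);  // a dual-GPU card reports two devices

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);

        printf("Device %d: %s\n", dev, prop.name);
        printf("  Multiprocessors:       %d\n", prop.multiProcessorCount);
        printf("  Warp size:             %d\n", prop.warpSize);
        printf("  Max threads per block: %d\n", prop.maxThreadsPerBlock);
        printf("  Max block dimensions:  %d x %d x %d\n",
               prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2]);
        printf("  Max grid dimensions:   %d x %d x %d\n",
               prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);
    }
    return 0;
}
```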
You would be much better served by reading Chapter 5 of the CUDA programming guide. Optimizing execution parameters requires both understanding of how the programming model works and benchmarking of your code.
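Since the best execution parameters are workload-dependent, the most direct approach is to time each candidate configuration with CUDA events. A minimal sketch, assuming a hypothetical element-wise `heavyKernel` as a stand-in for your actual kernel (the grid-stride loop makes each configuration process the same total work, so the timings are comparable):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the heavy calculation kernel in the question.
// The grid-stride loop lets any <<<blocks, threads>>> launch cover all n elements.
__global__ void heavyKernel(float *data, int n) {
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += blockDim.x * gridDim.x)
        data[i] = data[i] * data[i] + 1.0f;
}

int main() {
    const int n = 1 << 20;
    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));

    // The three launch configurations discussed above.
    int blocks[]  = {60, 60, 60};
    int threads[] = {4, 32, 512};

    for (int c = 0; c < 3; ++c) {
        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        cudaEventRecord(start);
        heavyKernel<<<blocks[c], threads[c]>>>(d_data, n);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);  // kernel launches are asynchronous

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        printf("%d blocks x %3d threads: %.3f ms\n", blocks[c], threads[c], ms);

        cudaEventDestroy(start);
        cudaEventDestroy(stop);
    }
    cudaFree(d_data);
    return 0;
}
```

On a real workload you would expect 60x4 to be far slower than the other two, since 4-thread blocks still occupy a full 32-thread warp per block and leave most lanes idle.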