I have gone through “CUDA by Example”, but I am still confused about some concepts.
I understand that I can add the elements of two matrices in parallel.
But can I do matrix addition and an entirely different operation, say image contrast enhancement, at the same time?
I have a feeling that if I have 2 cores, I can run two entirely different applications in parallel. Is that right?
I also have the impression that “a core contains several grids, a grid contains several blocks, and a block in turn contains several threads”. Is that right?
You don’t need to think about cores when you are doing CUDA programming. The number of threads actually executed in parallel equals the warp size. As of Compute Capability 2.0+, the warp size is 32. To take advantage of this, you should make your threads-per-block count a multiple of 32.
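As a minimal sketch of that advice, here is a hypothetical vector-add kernel (the names `vecAdd`, `d_a`, `d_b`, `d_c` are illustrative, not from the book) launched with a block size that is a multiple of the warp size:

```cuda
// Hypothetical element-wise add kernel; one thread per element.
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                 // guard: n need not be a multiple of the block size
        c[i] = a[i] + b[i];
}

// Host side: pick a block size that is a multiple of 32.
int n = 1 << 20;
int threadsPerBlock = 256;     // 256 = 8 warps per block
int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;  // round up
vecAdd<<<blocks, threadsPerBlock>>>(d_a, d_b, d_c, n);
```

With 256 threads per block, every block decomposes into whole warps, so no warp is launched partially filled.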
The number of threads per grid determines how many threads share the same kernel code. If you want your threads to do different things, you should separate them into different kernels.
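To connect this to the original question: the two operations would be two separate kernels, and you can launch them into different streams. On hardware that supports concurrent kernel execution (Compute 2.0+), they *may* overlap. This is only a sketch; `matAdd`, `enhanceContrast`, and all the device pointers and launch dimensions are assumed to exist:

```cuda
// Two unrelated kernels launched into different non-default streams.
// Whether they actually overlap depends on the device and on resource usage.
cudaStream_t s1, s2;
cudaStreamCreate(&s1);
cudaStreamCreate(&s2);

matAdd<<<gridA, blockA, 0, s1>>>(d_A, d_B, d_C, n);          // assumed kernel
enhanceContrast<<<gridI, blockI, 0, s2>>>(d_img, w, h);      // assumed kernel

cudaStreamSynchronize(s1);
cudaStreamSynchronize(s2);
cudaStreamDestroy(s1);
cudaStreamDestroy(s2);
```

Kernels launched into the same stream, by contrast, are serialized in launch order.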
This is not true. Each multiprocessor (SM) can execute many warps concurrently. On Fermi, there are two warp schedulers, each issuing two warps every two clocks (which is not the same as one warp per clock). On Kepler, there are 4 warp schedulers, each of which can issue two instructions from the same warp per clock.
The number of threads executing concurrently (for most definitions of “concurrently”) on a CUDA device is generally much larger than the number of CUDA cores.
In general, you want far more threads than CUDA cores to maximize throughput. On a GTX 580, for example, you generally want a grid with at least a couple of thousand threads, if not more.
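One common way to oversubscribe the cores while keeping the launch configuration fixed is a grid-stride loop. A sketch, with hypothetical names (`scale`, `d_x`):

```cuda
// Grid-stride loop: a fixed-size grid covers any n, and the scheduler
// still has thousands of threads available to hide memory latency.
__global__ void scale(float *x, float s, int n)
{
    for (int i = blockIdx.x * blockDim.x + threadIdx.x;
         i < n;
         i += blockDim.x * gridDim.x)   // each thread strides across the array
        x[i] *= s;
}

// e.g. 64 blocks x 256 threads = 16384 threads in flight
scale<<<64, 256>>>(d_x, 2.0f, n);
```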
My other advice is not to draw analogies between CUDA programming and multithreaded programming on the CPU. A CUDA core is nothing like a CPU core, and a CUDA thread is not the same as a CPU thread. Read the first few chapters of the CUDA C Programming Guide; they are a very good introduction to the basic concepts.