How can I find out how many GPU cores are currently in use? Is there an instruction or an example?
While you can’t tell the number of GPUs in use, you can query the number of GPUs available with the runtime routine “acc_get_num_devices” when using the PGI Accelerator Model, or with “cudaGetDeviceCount” in CUDA Fortran.
Hope this helps,
Thanks for your help. However, my question is about the number of processors (cores) currently in use, not the number of devices (GPUs). With shared-memory parallelism, we can set how many cores we want to use. Can I do the same thing while using a GPU?
I think I understand what you’re asking for, but it’s the wrong question to be asking, since SMP parallel programming doesn’t apply to a GPU. The question you should be asking is: how do I tell the “Occupancy” of my program? Occupancy is defined as “the ratio of active warps to the maximum number of warps supported on a multiprocessor of the GPU.” In other words, it’s how well your program is keeping all the cores busy. Low occupancy usually leads to lower performance, but high occupancy does not necessarily lead to high performance. A web search for “occupancy CUDA” will provide more details.
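To make the definition concrete, here is a minimal host-side sketch of the occupancy ratio in plain C++. It is only an illustration of the arithmetic, not a replacement for the tools mentioned in this thread. The function names and the example hardware limit are hypothetical; the number of resident blocks and the maximum warps per multiprocessor depend on your GPU and kernel, and you have to supply them yourself.

```cpp
#include <cassert>

// A warp is a group of 32 threads.
constexpr int kWarpSize = 32;

// Warps needed to hold one thread block. Rounded up, because a partially
// filled warp still takes a whole warp slot.
constexpr int warpsPerBlock(int threadsPerBlock) {
    return (threadsPerBlock + kWarpSize - 1) / kWarpSize;
}

// Occupancy = active warps / maximum warps supported on one multiprocessor.
// residentBlocksPerSM and maxWarpsPerSM are assumptions supplied by the caller;
// they are not queried from the device here.
double occupancy(int threadsPerBlock, int residentBlocksPerSM, int maxWarpsPerSM) {
    assert(threadsPerBlock > 0 && residentBlocksPerSM >= 0 && maxWarpsPerSM > 0);
    int activeWarps = warpsPerBlock(threadsPerBlock) * residentBlocksPerSM;
    if (activeWarps > maxWarpsPerSM) {
        activeWarps = maxWarpsPerSM;  // cannot exceed the hardware limit
    }
    return static_cast<double>(activeWarps) / maxWarpsPerSM;
}
```

For example, assuming a multiprocessor that supports 32 warps, a block size of 128 threads (4 warps) with 4 resident blocks gives occupancy(128, 4, 32) = 16/32 = 0.5.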
In CUDA programming, the user does not control the number of cores being used; rather, you control the number of threads created. What you want is to have enough threads in a thread block so that when one warp (a warp is a group of 32 threads) stalls due to a memory fetch or other operation, another warp can be swapped in. So having a block size of 64, 128, or even 512 is usually better. However, your block size can be limited by register and shared memory usage: the more memory used per thread, the fewer threads you can have. In addition to the block size, you want enough blocks to fully populate all the Streaming Multiprocessors.
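The sizing trade-off can be sketched with two small helper functions in plain C++. This is a simplified illustration under stated assumptions: the function names are hypothetical, the register count per multiprocessor is a value you supply for your own GPU, and real hardware allocates registers in coarser units and also limits blocks by shared memory, so treat the result as an upper bound only.

```cpp
#include <cassert>

// Blocks needed so that each of n elements gets one thread (ceiling division).
long long blocksForElements(long long n, int threadsPerBlock) {
    assert(n >= 0 && threadsPerBlock > 0);
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Upper bound on blocks that fit on one multiprocessor when registers are the
// only limiting resource. registersPerSM is an assumed hardware value supplied
// by the caller; allocation granularity and shared memory are ignored.
int blocksLimitedByRegisters(int registersPerSM, int registersPerThread,
                             int threadsPerBlock) {
    assert(registersPerSM > 0 && registersPerThread > 0 && threadsPerBlock > 0);
    return registersPerSM / (registersPerThread * threadsPerBlock);
}
```

For example, assuming 16384 registers per multiprocessor and a block size of 128, a kernel using 32 registers per thread fits 4 blocks, while one using 64 registers per thread fits only 2. Covering 1,000,000 elements with 128 threads per block takes blocksForElements(1000000, 128) = 7813 blocks.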
Michael Wolfe wrote a good concise article about the CUDA Data Parallel threading model which you might find helpful: http://www.pgroup.com/lit/articles/insider/v2n1a5.htm
So how do you tell the Occupancy? If you’re using the PGI Accelerator Model, the compiler will list the occupancy in the informational output (-Minfo).
For CUDA Fortran, you can use the CUDA Occupancy Calculator (http://news.developer.nvidia.com/2007/03/cuda_occupancy_.html) or the CUDA Profiler (i.e., set CUDA_PROFILE=1 in your environment, run the program, and review the resulting cuda_profile_0.log file). Note that the number of registers used can be found using the “-Mcuda=ptxinfo” flag.
Hope this helps,