My project contains two kernels and two host callback functions. I am running the kernels on a GeForce 930MX (integrated).
I have some questions about the profiler output.
- In the kernel latency section there is a pie chart of stall reasons. What does the “Other” category in the chart indicate? Here is my stall-reasons pie chart:
https://www.dropbox.com/s/h4u0nvt3xia2atf/pie%20chart.png?dl=0
- My execution configuration is the following:
Threads per Block = 512
Blocks per Grid = 33
Total number of SMs on my GPU = 3
The profiler reports the following:
The achieved active warps is 54 in my case and the device-imposed limit is 64. As far as I understand it, the value is 54 because making one more block active would exceed the 64-warp limit: each block has 512 / warp size = 16 warps, and 54 + 16 > 64. Here is the screenshot:
https://www.dropbox.com/s/rcotmvof7t4qyqm/tmp.png?dl=0
- Multiprocessor utilization shows that my SMs are about 90% utilized. Is that a good sign that I am close to the limit of my GPU's capability?
https://www.dropbox.com/s/282dbdn9xvilw8x/SM2.png?dl=0
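To make the warp arithmetic from my launch configuration explicit, here is a rough sketch in Python. It assumes a warp size of 32 and considers only the 64-warp-per-SM limit; register and shared-memory limits are ignored, and the function name `occupancy_estimate` is just something I made up for illustration, not a profiler or CUDA API.

```python
import math

WARP_SIZE = 32  # assumed warp size (threads per warp)


def occupancy_estimate(threads_per_block, blocks_per_grid, num_sms, max_warps_per_sm):
    """Estimate warp-limited occupancy figures (hypothetical helper).

    Assumption: only the per-SM warp limit is modelled; register and
    shared-memory limits are not taken into account.
    """
    # Warps needed by one block (rounded up for partial warps).
    warps_per_block = math.ceil(threads_per_block / WARP_SIZE)
    # Whole blocks that fit under the per-SM warp limit.
    resident_blocks_per_sm = max_warps_per_sm // warps_per_block
    # Warps active on one SM when it holds that many blocks.
    theoretical_warps = resident_blocks_per_sm * warps_per_block
    # Rounds of blocks needed to run the whole grid on all SMs.
    waves = math.ceil(blocks_per_grid / (resident_blocks_per_sm * num_sms))
    return {
        "warps_per_block": warps_per_block,
        "resident_blocks_per_sm": resident_blocks_per_sm,
        "theoretical_warps_per_sm": theoretical_warps,
        "waves": waves,
    }


if __name__ == "__main__":
    # Values from my configuration: 512 threads/block, 33 blocks, 3 SMs, limit 64.
    print(occupancy_estimate(512, 33, 3, 64))
```

Under these assumptions four blocks of 16 warps fit within the 64-warp limit, so the sketch gives a theoretical figure of 64 warps per SM rather than the measured 54, and the 33 blocks need three rounds to run on the 3 SMs.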
I am open to any suggestions.
P.S. Please excuse my English.