Visual profiler

Hi all,

I am trying to figure out the capabilities of NVIDIA's Compute Visual Profiler, and I have a few questions:

  1. Can we drill down into the kernel code, or into any other kind of code? For example, if a kernel takes 8000 GPU cycles to execute, does the tool support further analysis to find out which specific instruction of the kernel consumes the most GPU cycles?

  2. Does the profiler extract information from mixed code (host + device) or only from pure CUDA code?

  3. Does the profiler support analysis of pure host code? For example, if we call cudaMalloc, will the profiler show the internal system calls that are presumably executed on the host side?

All I’ve seen so far on the Internet and in the documentation is that the tool provides mostly numerical statistics and does not analyze the code in depth. For example, the user will learn that an allocation function takes, let’s say, 400 cycles, but cannot find out which specific instruction or system call inside that allocation function is the most time consuming!
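
To make the questions concrete, here is a minimal sketch of the kind of mixed host + device program I have in mind. The kernel name `scale` and the sizes are made up purely for illustration; only the CUDA runtime calls are real. The comments mark the spots my questions refer to.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, for illustration only: scales each element by a factor.
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Question 1: can the profiler tell which instruction in here
        // accounts for most of the kernel's GPU cycles?
        data[i] *= factor;
    }
}

int main()
{
    const int n = 1 << 20;  // arbitrary element count for this sketch
    float *d_data = NULL;

    // Question 3: this is a host-side runtime call. Can the profiler show
    // what happens inside it (e.g. system calls), or only its total time?
    cudaError_t err = cudaMalloc((void **)&d_data, n * sizeof(float));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaMemset(d_data, 0, n * sizeof(float));

    // Question 2: host code launching device code in the same program.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    scale<<<blocks, threads>>>(d_data, 2.0f, n);

    // Wait for the kernel to finish and report any launch/execution error.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
    }

    cudaFree(d_data);
    return err == cudaSuccess ? 0 : 1;
}
```

So far I only seem to get one number per call (one for cudaMalloc, one for the kernel), and I would like to know whether the tool can go deeper than that.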

Thanks in advance

P.S. Please, please I am new to CUDA and I need guidance! Please help me!

nobody???