Hello, I want to identify the lines of kernel code that take the most computation time.
I am using the Linux version of CUDA. The NVIDIA Technical Blog post "CUDA 7.5: Pinpoint Performance Problems with Instruction-Level Profiling" states that the Visual Profiler shows the instruction-level profiling view when you select “Kernel Profile – PC sampling”, but I can't find that option. Is it available in the Eclipse edition?
I also tried running the analysis from the command line:
nvprof --source-level-analysis instruction_execution --kernels kernel.cu -o ~/analysis.prof my_executable
It generates an analysis file, but when I open it with the Visual Profiler I still can’t find an option to view the source. Any ideas? Is this not possible on Linux?
PC sampling requires a GPU of compute capability 5.2 or higher.
That is referenced in the article you linked:
“This powerful new feature, available on Maxwell (GM200) and later GPUs, …”
What GPU are you running on?
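If you are not sure which compute capability your GPU has, a small shell check along these lines may help. This is only a sketch: cc_at_least is a hypothetical helper name of mine, and as far as I know the compute_cap query field is only available in reasonably recent versions of nvidia-smi. On older drivers, the deviceQuery CUDA sample reports the same information.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if compute capability $1
# (for example "6.1") is at least $2 (for example "5.2").
cc_at_least() {
    have_major=${1%%.*}; have_minor=${1#*.}
    need_major=${2%%.*}; need_minor=${2#*.}
    [ "$have_major" -gt "$need_major" ] && return 0
    [ "$have_major" -eq "$need_major" ] && [ "$have_minor" -ge "$need_minor" ]
}

# Assumption: this nvidia-smi supports the compute_cap query field.
# Nothing is printed when nvidia-smi is not installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=compute_cap --format=csv,noheader |
    while read -r cc; do
        case "$cc" in
            [0-9]*.[0-9]*) ;;  # looks like major.minor, carry on
            *) echo "could not parse compute capability: $cc"; continue ;;
        esac
        if cc_at_least "$cc" 5.2; then
            echo "cc $cc: PC sampling should be available"
        else
            echo "cc $cc: below 5.2, PC sampling is not expected to work"
        fi
    done
fi
```

The comparison is done on the major and minor numbers separately, so that a value such as 5.0 is correctly treated as lower than 5.2.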
I haven’t tried this with the profiler built into Nsight Eclipse Edition, but it should be identical to nvvp, the standalone visual profiler. I believe all the information you need to access the source-disassembly view is contained in the profiler documentation:
https://docs.nvidia.com/cuda/profiler-users-guide/index.html#source-assembly-view
Here’s what I did, on CUDA 9.0 on a GTX 960 (a compute capability 5.2 device) on Linux:
- Create a simple app for testing. I used this:
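$ cat t32.cu
// Minimal kernel: each thread increments one element.
__global__ void k(int *data){
  data[threadIdx.x]++;
}

int main(){
  const int n = 5;
  int *d;
  cudaMalloc(&d, n*sizeof(int));
  cudaMemset(d, 0, n*sizeof(int));
  k<<<1,n>>>(d);            // one block of n threads
  cudaDeviceSynchronize();  // wait for the kernel to finish before exiting
}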
- Compile that code; be sure to use the -lineinfo switch:
nvcc -lineinfo -arch=sm_52 -o t32 t32.cu
- Start the visual profiler by typing nvvp at the Linux command line (you’ll need a graphical environment or an X-forwarded session).
- In nvvp:
  - Select File…New Session.
  - In the dialog that opens, browse to your executable and select (double-click) it, then click Next.
  - You can probably leave these options as-is; however, I turned off UM profiling, OpenACC profiling, and concurrent kernel profiling. Then click Finish.
- You’ll now have the usual timeline view. Your guided analysis options are on the lower left. Click on “Examine GPU Usage”, then click on “Examine Individual Kernels”.
- The only kernel here is k. Click on the k kernel under Kernel Optimization Priorities.
- Now go back to the left-hand Analysis pane, and click on the icon just underneath “Analysis” in the toolbar to select “Unguided Analysis”:
  https://docs.nvidia.com/cuda/profiler-users-guide/index.html#unguided-analysis
- You should now see a set of buttons below. Click on Kernel Profile - PC Sampling. You’ll have to click on the graph icon to actually run this analysis.
- Now you have the sampling statistics display, which is the view depicted in figure 5 of the blog you linked:
  https://devblogs.nvidia.com/cuda-7-5-pinpoint-performance-problems-instruction-level-profiling/
  Above the pie chart, you’ll notice the kernel k indicated again. Click on it.
- Now you should see the source-disassembly view at the top of your window, the same view that is depicted here:
  https://docs.nvidia.com/cuda/profiler-users-guide/index.html#source-assembly-view
  and immediately after figure 5 in the blog. You should now be able to follow the blog to continue the steps indicated there.
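For anyone who would rather collect the samples from the command line, as attempted in the original question, here is a rough sketch. Treat it as an assumption rather than a verified recipe: profile_kernel is a hypothetical wrapper name, and the NVPROF variable is only there so the command can be swapped out or dry-run. As far as I know, --kernels expects a kernel name (k in the test app above) rather than a source file name, it scopes the options that come after it, and pc_sampling is one of the values accepted by --source-level-analysis.

```shell
#!/bin/sh
# Hypothetical wrapper: profile_kernel KERNEL OUTPUT EXECUTABLE [ARGS...]
# Runs nvprof (or whatever $NVPROF is set to) with PC sampling limited to
# one kernel name, and writes the result to OUTPUT.
profile_kernel() {
    kernel=$1
    output=$2
    shift 2
    # --kernels comes first so that it applies to the analysis option after it.
    "${NVPROF:-nvprof}" \
        --kernels "$kernel" \
        --source-level-analysis pc_sampling \
        -o "$output" \
        "$@"
}

# Example for the test app above (built with -lineinfo):
#   profile_kernel k ~/analysis.prof ./t32
# Dry run that only prints the arguments nvprof would receive:
#   NVPROF=echo profile_kernel k ~/analysis.prof ./t32
```

The resulting file can then be brought into nvvp through its import option instead of starting a new session. I have not checked how the source-level results are presented after an import, so the interactive steps above remain the path I actually tested.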
I was wondering the same. I found the referenced article and couldn’t get very far with it. The article doesn’t really explain how to get to the profiling view if you’re not an nvvp expert.
However, your above post does :) so I was able to get this working. It’s really helpful, thank you.
Paul
The problem was that after selecting “Unguided Analysis”, I clicked on the kernel name in the tree view on the left rather than on the kernel in the timeline, so I only saw the analysis options for “Application”. When I clicked on the kernel in the timeline, the “Kernel Profile - PC Sampling” option appeared in the list on the left.
Thank you for taking the time to write such a detailed answer; it is helpful!