Hello, I want to identify the lines of kernel code that take up most of the computation time.
I am using the Linux version of CUDA. The NVIDIA Technical Blog post "CUDA 7.5: Pinpoint Performance Problems with Instruction-Level Profiling" states that "The Visual Profiler shows the Instruction-level profiling view when you select “Kernel Profile – PC sampling”", but I can’t find that option. Is it available in the Eclipse edition?
I also tried to run the analysis from the command line:
nvprof --source-level-analysis instruction_execution --kernels kernel.cu -o ~/analysis.prof my_executable
It generates an analysis file, but when I open it with the Visual Profiler I still can’t find an option to view the source. Any ideas? Is this not possible on Linux?
It requires a GPU of compute capability 5.2 or higher.
That is referenced in the article you linked:
“This powerful new feature, available on Maxwell (GM200) and later GPUs, …”
What GPU are you running on?
I haven’t tried this with the profiler built into Nsight Eclipse Edition, but it should be identical to nvvp, the standalone visual profiler. I believe all the information you need to access the source-disassembly view is contained in the profiler documentation.
Here’s what I did, on CUDA 9.0 on a GTX 960 (a cc 5.2 device) on linux:
- Create a simple app for test. I used this:
$ cat t32.cu
__global__ void k(int *data){
int main(){
const int n = 5;
int *d;
cudaMalloc(&d, n*sizeof(int));
cudaMemset(d, 0, n*sizeof(int));
- compile that code, be sure to use -lineinfo switch:
nvcc -lineinfo -arch=sm_52 -o t32 t32.cu
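Note that the t32.cu listing above seems to have lost its kernel body and the end of main() when it was posted. A minimal complete version might look like the sketch below. The kernel body (a simple per-thread increment), the launch configuration, and the result check are my assumptions for illustration, not necessarily what the original file contained.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumed kernel body: each thread increments one element.
__global__ void k(int *data){
  data[threadIdx.x] += 1;
}

int main(){
  const int n = 5;
  int *d;
  cudaMalloc(&d, n*sizeof(int));
  cudaMemset(d, 0, n*sizeof(int));

  // Assumed launch configuration: one block of n threads.
  k<<<1, n>>>(d);
  cudaError_t err = cudaDeviceSynchronize();
  if (err != cudaSuccess){
    printf("CUDA error: %s\n", cudaGetErrorString(err));
    return 1;
  }

  // Copy the result back and check that every element was incremented once.
  int h[n];
  cudaMemcpy(h, d, n*sizeof(int), cudaMemcpyDeviceToHost);
  for (int i = 0; i < n; i++){
    if (h[i] != 1){
      printf("mismatch at %d: %d\n", i, h[i]);
      return 1;
    }
  }
  cudaFree(d);
  printf("ok\n");
  return 0;
}
```

It should build with the same nvcc command shown above. Any kernel will do for this purpose; the point is only to have something compiled with -lineinfo so the profiler can map samples back to source lines.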
Start the visual profiler by typing nvvp at the linux command line (you’ll need a graphical environment or an X-forwarded session)
In nvvp:
- select File…New Session
- in the dialog that opens, browse to your executable and select (double-click) it, click next
- You can probably leave these options as-is; however, I turned off UM profiling, OpenACC profiling, and concurrent kernel profiling. Then click Finish
You’ll now have the usual timeline view. Your guided analysis options are on the lower left. Click on “Examine GPU Usage”, then click on “Examine Individual Kernels”
The only kernel here is k. Click on the k kernel under Kernel Optimization Priorities
Now go back to the left hand Analysis pane, and click on the icon just underneath “Analysis” in the toolbar, to select “Unguided Analysis”
You should now see a set of buttons below. Click on Kernel Profile - PC Sampling. You’ll have to click on the graph icon to actually run this analysis.
Now you have the sampling statistics display, which is the view depicted in figure 5 of the blog you linked.
Above the pie chart, you’ll notice the kernel k indicated again. Click on it.
- Now you should see the source-disassembly view at the top of your window. It is the same view that is depicted immediately after figure 5 in the blog. You should now be able to follow the blog to continue the steps indicated there.
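If you would rather collect the data with nvprof and open it in nvvp afterwards, something along these lines may work. Treat it as an untested sketch: the option names are as I recall them from the nvprof documentation, so check nvprof --help for your CUDA version. The output file name t32_k.nvprof is just a placeholder. Note that --kernels takes a kernel name (k here), not a source file name.

```shell
# Hypothetical names: APP is the test executable built above, OUT is an arbitrary output file.
APP=./t32
OUT=t32_k.nvprof

if command -v nvprof >/dev/null 2>&1; then
  # Collect PC-sampling data for kernel k only and export it to a file for nvvp.
  nvprof --kernels k --source-level-analysis pc_sampling -o "$OUT" "$APP"
else
  echo "nvprof not found on PATH"
fi
```

The resulting file can then be loaded in nvvp through File > Import. The executable still has to be compiled with -lineinfo for the source view to be populated.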
I was wondering the same. I found the referenced article and couldn’t get very far with it. The article doesn’t really explain how to get to the profiling view if you’re not an nvvp expert.
However, your above post does :) so I was able to get this working. It’s really helpful, thank you.
The problem was that after I selected “Unguided Analysis”, I clicked on the kernel name in the tree view on the left, not on the kernel in the actual timeline, so I saw only the analysis options for “Application”. When I clicked on the kernel in the timeline, the “Kernel Profile - PC Sampling” option appeared in the list on the left.
Thank you for taking the time to write such a detailed answer. It is helpful!