solved: instruction level sampling in visual profiler (eclipse edition), not available?

Kestutis · November 3, 2018, 9:08am

Hello, I want to identify the lines of kernel code that take the most computation time.
I am using the Linux version of CUDA. The blog post CUDA 7.5: Pinpoint Performance Problems with Instruction-Level Profiling | NVIDIA Technical Blog states that "The Visual Profiler shows the Instruction-level profiling view when you select “Kernel Profile – PC sampling”", but I can't find that option. Is it available in the Eclipse edition?
I also tried running the analysis from the command line:

nvprof --source-level-analysis instruction_execution --kernels kernel.cu -o ~/analysis.prof my_executable

It generates an analysis file, but when I open it in the Visual Profiler I still can’t find an option to view the source. Any ideas? Is this not possible on Linux?

Robert_Crovella · November 3, 2018, 1:44pm

It requires a GPU of compute capability 5.2 or higher.

That is referenced in the article you linked:

“This powerful new feature, available on Maxwell (GM200) and later GPUs, …”

What GPU are you running on?

Kestutis · November 3, 2018, 5:57pm

I am running on a compute capability 5.2 GPU.

Robert_Crovella · November 3, 2018, 10:19pm

I haven’t tried this with the profiler built into Nsight Eclipse, but it should be identical to nvvp, the standalone visual profiler. I believe all the information you need to access the source-disassembly view is contained in the profiler documentation:

https://docs.nvidia.com/cuda/profiler-users-guide/index.html#source-assembly-view

Here’s what I did, on CUDA 9.0 on a GTX 960 (a cc 5.2 device) on Linux:

Create a simple test app. I used this:

$ cat t32.cu
__global__ void k(int *data){

  data[threadIdx.x]++;
}

int main(){
  const int n = 5;
  int *d;
  cudaMalloc(&d, n*sizeof(int));
  cudaMemset(d, 0, n*sizeof(int));
  k<<<1,n>>>(d);
  cudaDeviceSynchronize();
}

Compile that code, and be sure to use the -lineinfo switch:

nvcc -lineinfo -arch=sm_52 -o t32 t32.cu

Start the visual profiler by typing nvvp at the Linux command line (you’ll need a graphical environment or an X-forwarded session).
In nvvp:

Select File…New Session.
In the dialog that opens, browse to your executable and select (double-click) it, then click Next.
You can probably leave these options as-is; however, I turned off UM profiling, OpenACC profiling, and concurrent kernel profiling. Then click Finish.

You’ll now have the usual timeline view. Your guided analysis options are on the lower left. Click on “Examine GPU Usage”, then click on “Examine Individual Kernels”
The only kernel here is k. Click on the k kernel under Kernel Optimization Priorities
Now go back to the left-hand Analysis pane, and click on the icon just underneath “Analysis” in the toolbar to select “Unguided Analysis”:

https://docs.nvidia.com/cuda/profiler-users-guide/index.html#unguided-analysis

You should now see a set of buttons below. Click on Kernel Profile - PC Sampling. You’ll have to click on the graph icon to actually run this analysis.
Now you have the sampling statistics display, which is the view depicted in figure 5 of the blog you linked:

https://devblogs.nvidia.com/cuda-7-5-pinpoint-performance-problems-instruction-level-profiling/

Above the pie chart, you’ll notice the kernel k indicated again. Click on it.

Now you should see the source-disassembly view, at the top of your window, the same view that is depicted here:

https://docs.nvidia.com/cuda/profiler-users-guide/index.html#source-assembly-view

and is depicted immediately after figure 5 in the blog. You should now be able to follow the blog to continue the steps indicated there.
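
For reference, the build and collection steps above can also be sketched from the command line. This is only a rough sketch under some assumptions: the output file name t32_pc.nvvp is made up for illustration, and pc_sampling is the nvprof --source-level-analysis value that corresponds to the PC sampling analysis. Note that --kernels expects a kernel name filter (k here), not a source file name. Nothing in this thread confirms that a file collected this way shows the same source-disassembly view after File…Import in nvvp, so treat that part as untested.

```shell
#!/bin/sh
# Sketch only: file names are hypothetical; adjust to your project.
SRC=t32.cu            # test source from the post above
EXE=./t32             # executable name used in the post above
OUT=t32_pc.nvvp       # hypothetical output file to import into nvvp

# -lineinfo embeds source-line information so samples can map back to source lines.
BUILD="nvcc -lineinfo -arch=sm_52 -o $EXE $SRC"
# --kernels takes a kernel name filter (the kernel k), not a .cu file name.
# -f overwrites an existing output file; -o writes the profile to a file.
PROFILE="nvprof --kernels k --source-level-analysis pc_sampling -f -o $OUT $EXE"

if command -v nvcc >/dev/null 2>&1 && command -v nvprof >/dev/null 2>&1 && [ -f "$SRC" ]; then
  # Toolchain and source are present: build, then profile.
  $BUILD && $PROFILE
else
  # Otherwise only print the commands (dry run).
  echo "$BUILD"
  echo "$PROFILE"
fi
```

If the script only prints the two commands, either the CUDA toolchain is not on your PATH or t32.cu is not in the current directory.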

LordLucan · November 4, 2018, 9:25am

I was wondering the same. I found the referenced article and couldn’t get very far with it. The article doesn’t really explain how to get the profiling if you’re not an nvvp expert.

However, your above post does :) so I was able to get this working. It’s really helpful, thank you.

Paul

Kestutis · November 5, 2018, 6:07pm

The problem was that after I selected “Unguided Analysis”, I clicked on the kernel name in the tree view on the left, not on the actual timeline, so I saw only the analysis options for “Application”. When I clicked on the timeline, the “Kernel Profile - PC Sampling” option appeared in the list on the left.

Thank you for taking the time to write such a detailed answer; it is helpful!
