How can I get the exact CPU and GPU time in NSYS NVTX profiling?

I just ran a profile of LLAMA2 to identify potential bottlenecks. How do I get the exact CPU-only execution times and GPU-only execution times from the NSYS profiling results?

What is the difference between the NVTX ranges shown in the CUDA HW section and the ones inside the Python thread? Are they the same?

Is the duration for each NVTX range inclusive of the CPU and GPU execution time?

In the CUDA API section, does cudaMalloc, for example, take into account execution time on the GPU?
Does it wait for the GPU to finish?
Does this give the GPU execution time for the operation?

I’m going to try to answer as many of these as I can.

Firstly, there are statistical analysis scripts available in the GUI. If you go to the drop-down shown as “Events View” in your screenshot, you will have a statistical analysis option. Using the existing scripts there, you should be able to get what you want. If not, you can modify those scripts as well.

Secondly, NVTX is a CPU-side API. What you see on the thread row is the actual time that the NVTX range was open on the CPU. The NVTX ranges you see on the GPU timeline are a projection: they are a graphical representation of which GPU work was launched while that range was open. Therefore the duration is inclusive of CPU time on the CPU side and of GPU time on the GPU side (although the GPU projection is less precise).
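To make that concrete, here is a minimal sketch (assuming PyTorch and a CUDA device; the range name "matmul_step" is just an illustration). The push/pop calls run on the CPU, so the range you see on the Python thread row is the CPU wall time between them; on the CUDA HW row, Nsight Systems projects the range onto the GPU work that was launched while it was open.

```python
import torch

x = torch.randn(4096, 4096, device="cuda")

torch.cuda.nvtx.range_push("matmul_step")   # CPU-side marker: the range opens here
y = x @ x                                    # kernel is launched (asynchronously) inside the range
torch.cuda.nvtx.range_pop()                  # CPU-side marker: the range closes here

# The kernel may still be running on the GPU after range_pop() returns,
# which is why the projected range on the GPU timeline can extend past
# the CPU-side range.
torch.cuda.synchronize()
```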

The CUDA API ranges on the CPU show the time the CPU spent executing that call. Whether it waits for the GPU to finish depends on the API in question. The GPU time for the underlying kernels that get executed is better seen on the GPU timeline.
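As a rough illustration of why the CPU-side API range usually does not cover GPU execution, here is a sketch (again assuming PyTorch; the timings are illustrative only). A kernel launch returns to the CPU almost immediately, while an explicit synchronize blocks the CPU until the GPU has finished:

```python
import time
import torch

x = torch.randn(8192, 8192, device="cuda")
_ = x @ x                      # warm-up so cuBLAS init doesn't skew the numbers
torch.cuda.synchronize()

# Without synchronization: measures only the CPU cost of queueing the work,
# roughly what the launch range in the CUDA API row shows.
t0 = time.perf_counter()
y = x @ x
t1 = time.perf_counter()

# With synchronization: the CPU blocks until the GPU finishes, so the elapsed
# time now also covers GPU execution.
t2 = time.perf_counter()
y = x @ x
torch.cuda.synchronize()
t3 = time.perf_counter()

print(f"launch only: {(t1 - t0) * 1e3:.3f} ms, launch + GPU: {(t3 - t2) * 1e3:.3f} ms")
```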

Thanks for the quick reply. I have a few more questions as I will be using these for my dissertation.

To get the exact GPU time, can we trace back all the kernel functions called inside a specific NVTX range listed in “CUDA Kernel Launch & Exec Time Trace” and sum up “Kernel Dur” or “Total Dur” values?

Does this give us the exact GPU execution time for a specific NVTX range?
If my understanding is correct, all the remaining time indicates that the GPU was idle, i.e. GPU idle time?

Is the “Duration” mentioned in the ‘CUDA GPU Trace’ inclusive of the CPU and GPU or is it only GPU execution time?

Also, how do we find idle GPU times and CPU-only execution time? Does the ‘NVTX GPU Projection Trace’ help?

If you change the drop-down in the Events View to “Expert Systems”, you will find CPU and GPU starvation rules. These will allow you to find places where the relevant hardware is idle. You can use the settings to change the length of idle time that triggers a finding there.

The CUDA GPU trace knows nothing about CPU time.
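If you do want to total up the kernel time launched under a given NVTX range yourself, one approach is to export the report to SQLite (`nsys export --type sqlite report.nsys-rep`) and sum it there. The sketch below assumes the table and column names I have seen in recent Nsight Systems SQLite schemas (NVTX_EVENTS, CUPTI_ACTIVITY_KIND_RUNTIME, CUPTI_ACTIVITY_KIND_KERNEL); the range name and file path are placeholders, and older versions may use different names.

```python
import sqlite3

RANGE_NAME = "matmul_step"              # hypothetical NVTX range name
con = sqlite3.connect("report.sqlite")  # produced by: nsys export --type sqlite report.nsys-rep

# Join each kernel to the runtime launch call that issued it (correlationId),
# and keep only launches made on the same thread while the NVTX range was open.
# Note: NVTX_EVENTS.text can be NULL for registered strings, and kernels that
# overlap on different streams are summed, not unioned.
query = """
SELECT SUM(k.end - k.start) AS gpu_ns
FROM NVTX_EVENTS AS r
JOIN CUPTI_ACTIVITY_KIND_RUNTIME AS api
  ON api.globalTid = r.globalTid
 AND api.start BETWEEN r.start AND r.end
JOIN CUPTI_ACTIVITY_KIND_KERNEL AS k
  ON k.correlationId = api.correlationId
WHERE r.text = ?
"""
gpu_ns = con.execute(query, (RANGE_NAME,)).fetchone()[0] or 0
print(f"GPU kernel time inside '{RANGE_NAME}': {gpu_ns / 1e6:.3f} ms")  # timestamps are in ns
con.close()
```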

I think you might want to read this blog I wrote - https://developer.nvidia.com/blog/understanding-the-visualization-of-overhead-and-latency-in-nsight-systems/
