CUDA Pro Tip: Generate Custom Application Profile Timelines with NVTX

Originally published at: https://developer.nvidia.com/blog/cuda-pro-tip-generate-custom-application-profile-timelines-nvtx/

The last time you used the timeline feature in the NVIDIA Visual Profiler, Nsight VSE or the new Nsight Systems to analyze a complex application, you might have wished to see a bit more than just CUDA API calls and GPU kernels. In this post I will show you how you can use the NVIDIA…

Is it just me, or are all the colors in your example transparent?

Hi Elad, sorry for the long delay in responding. The colors were transparent because the alpha channels were set to 00 in the initial version of this post. The post and the code examples have been fixed. Thanks, Jiri
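To illustrate the alpha-channel point, here is a minimal sketch of how a 32-bit ARGB color is packed. It assumes the color is passed to NVTX with the color type NVTX_COLOR_ARGB; the helper names make_argb and alpha_of are hypothetical and are not part of NVTX or of the original post.

```cpp
#include <cstdint>

// Hypothetical helper: pack 8-bit alpha, red, green and blue channels into
// one 32-bit value in ARGB order (alpha in the most significant byte).
constexpr std::uint32_t make_argb(std::uint8_t a, std::uint8_t r,
                                  std::uint8_t g, std::uint8_t b) {
    return (std::uint32_t{a} << 24) | (std::uint32_t{r} << 16) |
           (std::uint32_t{g} << 8) | std::uint32_t{b};
}

// Hypothetical helper: extract the alpha channel (top byte) of an ARGB value.
constexpr std::uint8_t alpha_of(std::uint32_t argb) {
    return static_cast<std::uint8_t>(argb >> 24);
}

// Alpha 0x00: green with a fully transparent alpha channel, the situation
// described in the reply above.
constexpr std::uint32_t transparent_green = make_argb(0x00, 0x00, 0xFF, 0x00);

// Alpha 0xFF: the same green, fully opaque.
constexpr std::uint32_t opaque_green = make_argb(0xFF, 0x00, 0xFF, 0x00);
```

With this layout, a color literal written as 0x0000FF00 has alpha 00, while 0xFF00FF00 is the opaque variant.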

Hello,

Is there a way to find the duration of an NVTX range? I have a function that contains a mix of CPU and GPU activity. Nsight Systems gives me the runtime of just the kernels, so I was wondering whether the NVTX API has any functionality that lets me gather the duration of the NVTX range around this function.

Hi, Nsight Systems displays NVTX ranges. You might need to expand some rows to see them. In addition, you can get statistics for NVTX ranges with --stats (see https://docs.nvidia.com/nsi...). NVTX does not provide an API to query the runtime of a range that has already ended. Hope this helps, Jiri
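Since NVTX itself offers no way to query a range's runtime, one possible workaround is to measure host wall-clock time in the same scope that opens and closes the range. The following is only a sketch under assumptions: the class TimedRange, the function time_busy_work and the opt-in macro USE_NVTX are hypothetical names, and the NVTX header path may differ between toolkit versions.

```cpp
#include <chrono>

#ifdef USE_NVTX                  // hypothetical opt-in macro; without it the sketch builds without NVTX
#include <nvtx3/nvToolsExt.h>    // assumption: header path may differ in your toolkit
#endif

// Hypothetical RAII helper: pushes an NVTX range on construction, pops it on
// destruction, and writes the elapsed host time in milliseconds to *elapsed_ms_out.
class TimedRange {
public:
    TimedRange(const char* name, double* elapsed_ms_out)
        : out_(elapsed_ms_out), start_(std::chrono::steady_clock::now()) {
#ifdef USE_NVTX
        nvtxRangePushA(name);
#else
        (void)name;  // name is only used when NVTX is enabled
#endif
    }
    ~TimedRange() {
#ifdef USE_NVTX
        nvtxRangePop();
#endif
        const auto end = std::chrono::steady_clock::now();
        *out_ = std::chrono::duration<double, std::milli>(end - start_).count();
    }
    TimedRange(const TimedRange&) = delete;
    TimedRange& operator=(const TimedRange&) = delete;

private:
    double* out_;
    std::chrono::steady_clock::time_point start_;
};

// Stand-in for a function with mixed CPU and GPU activity. Kernel launches are
// asynchronous, so real GPU work would need a synchronization call (for
// example cudaDeviceSynchronize) before the range ends to be included.
double time_busy_work(int iterations) {
    double elapsed_ms = -1.0;
    {
        TimedRange range("busy_work", &elapsed_ms);
        volatile double sink = 0.0;
        for (int i = 0; i < iterations; ++i) {
            sink = sink + i * 0.5;  // placeholder CPU work
        }
    }  // range is popped and elapsed_ms is written here
    return elapsed_ms;
}
```

This measures host time only, so it complements rather than replaces the per-range statistics that Nsight Systems reports.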

Hi, I am trying to get NVTX ranges from my application automatically, as you explained in the post. However, ranges are not created for all functions. For example, my application has forward and backward FFT functions, but there are no NVTX ranges for the forward FFTs.
Thanks in advance for your help.

Hi Vahdaneh,
It is interesting that you see some NVTX ranges but not all of the expected ones. Given that you see some, I think your Nsight Systems and NVTX setup is probably correct, so I would look at the compiler instrumentation side of things. Specifically, I would check what happens with regard to function inlining. Some compilers disable function inlining when compiler instrumentation is used, and some don't instrument inlined functions. Often this depends on your compile flags.
Hope this helps
Jiri
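One way to test the inlining hypothesis is to forbid inlining for a function whose range is missing and see whether the range appears. The sketch below assumes GCC or Clang with -finstrument-functions, which makes the compiler call __cyg_profile_func_enter and __cyg_profile_func_exit around instrumented functions. The counters and the function forward_fft_stub are hypothetical stand-ins, and the hooks only count calls where a real setup would push and pop NVTX ranges.

```cpp
// Hooks called by the compiler when building with -finstrument-functions
// (GCC/Clang). They must not be instrumented themselves, otherwise they
// would recurse into each other.
extern "C" {
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void* this_fn, void* call_site);
__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void* this_fn, void* call_site);
}

// Hypothetical bookkeeping, used here in place of NVTX calls.
static int g_enter_count = 0;  // number of instrumented function entries seen
static int g_depth = 0;        // current nesting depth of instrumented calls

extern "C" void __cyg_profile_func_enter(void*, void*) {
    ++g_enter_count;
    ++g_depth;   // a real hook could push an NVTX range here
}

extern "C" void __cyg_profile_func_exit(void*, void*) {
    --g_depth;   // a real hook could pop the NVTX range here
}

// Hypothetical stand-in for a function whose range is missing. The noinline
// attribute keeps it a real call, which rules out inlining as the cause.
__attribute__((noinline)) int forward_fft_stub(int n) {
    return 2 * n;
}
```

If the range shows up once the function is marked noinline, inlining was the likely cause; whether and how inlined functions are instrumented depends on the compiler and flags, so check your compiler's documentation.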