One question about using the NVIDIA Visual Profiler

Hi,

I have a question about how to interpret the results shown by the NVIDIA Visual Profiler (nvvp), and I hope someone can help me figure it out!

My question concerns the timeline rows. When using nvvp, I found that a given call in the "OpenACC" row is neither aligned in position with the corresponding kernel in the "Compute" row nor the same length as that kernel. I have some screenshots that would explain the question more clearly, but I do not know how to upload images here.

Thank you very much for your help!

Best Regards,

Weicheng

Here are links to the two pictures:
https://drive.google.com/open?id=1_79meA9AE5e7hhlCpnWBrdXrJh-WjaEG

https://drive.google.com/file/d/1Wwzr9O2k9I3jZO6Lq8SmsumPV3b2HEWu/view?usp=sharing

The two figures above illustrate my question. "OpenACC_620.png" highlights the interval in the "OpenACC" row where a CPU thread calls the OpenACC directives starting at line 620, and "Compute_620.png" highlights the interval in the "Compute" row where the corresponding kernel executes on the GPU. The two are clearly not aligned on the global timeline (in fact they do not overlap at all), and their durations differ: the "OpenACC" interval lasts 921.963 μs while the "Compute" interval lasts 911.379 μs, although the two are close. Sorry for the awkward image size; you may need to zoom in. I would like to understand why the two intervals are not aligned at all and do not have exactly the same duration.

Thank you very much!

Best,

Weicheng

Hi Weicheng,

Sorry, there's not enough information here to give a good answer. Are you using "async", though? If so, the time at which the kernel is launched from the CPU and the time at which it actually executes on the device can differ, so the two rows will not necessarily line up.
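To illustrate the point about "async", here is a minimal sketch, assuming a C code base; the function name scale_async and the choice of async queue 1 are hypothetical and not taken from the thread. With async, the host thread only enqueues the transfer and kernel and moves on, so the host-side OpenACC interval and the device-side Compute interval need not coincide. Compiled without OpenACC support, the pragmas are ignored and the loop simply runs on the CPU.

```c
#include <stddef.h>

/* Hypothetical helper (not from the thread): multiply each element of a
   by factor using an asynchronous OpenACC kernel on queue 1. Built
   without OpenACC support, the pragmas are ignored and the loop runs
   serially on the CPU. */
void scale_async(float *restrict a, size_t n, float factor)
{
    /* async(1): the host thread only enqueues the data transfers and the
       kernel on queue 1, then continues without waiting. The OpenACC row
       in nvvp reflects this host-side activity, while the Compute row
       shows when the GPU actually runs the kernel, possibly later. */
    #pragma acc parallel loop async(1) copy(a[0:n])
    for (size_t i = 0; i < n; ++i) {
        a[i] *= factor;
    }

    /* Independent host work placed here could overlap with the kernel. */

    /* Block until all work queued on async queue 1 (kernel and copy-back)
       has completed, so a[] is valid on the host again. */
    #pragma acc wait(1)
}
```

Dropping async(1) and the wait makes the construct synchronous: the host then waits for the kernel, so its OpenACC interval would typically enclose the kernel's execution rather than sit apart from it.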

-Mat

Hi Mat,

Yes, I am using "async". That makes sense to me. Thank you very much!

Best,

Weicheng

It's typically a good idea to disable async when profiling, unless you are specifically looking at asynchronous behavior.

Try setting the environment variable "PGI_ACC_SYNCHRONOUS=1". This disables all asynchronous kernel launches and data transfers.

-Mat

Mat,

This is really a good idea. Thanks a lot for your help!

Best,

Weicheng