I have a question about interpreting the results shown by the NVIDIA Visual Profiler, and I hope someone can help me figure it out!
The question concerns the timeline rows. When using nvvp, I found that a given call in the “OpenACC” row is not aligned with the corresponding kernel in the “Compute” row, and its duration is not the same as that kernel's duration either. I have some screenshots that would explain the question more clearly, but I don't know how to upload images.
Thank you very much for your help!
Here are links to the two pictures:
I've attached two figures to illustrate my question. “OpenACC_620.png” highlights the interval in the “OpenACC” row where a CPU thread calls the OpenACC directives starting at line 620, and “Compute_620.png” highlights the interval in the “Compute” row where the corresponding kernel executes on the GPU. As you can see, the two are not aligned on the global timeline (in fact, they do not overlap at all), and their durations differ: the “OpenACC” interval lasts 921.963 µs while the “Compute” interval lasts 911.379 µs, although the two are close. Sorry that the images are not sized well; you may need to zoom in to see the details. I'm wondering why the two are not aligned at all and why their durations are not exactly the same.
Thank you very much!
Sorry, there's not enough information here to give a definitive answer. Are you using “async”, though? If so, the time at which the CPU launched the kernel can differ from the time the kernel actually executed on the device.
Yes, I am using “async”. That makes sense to me. Thank you very much!
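A minimal sketch of the effect described above, assuming C with OpenACC (the function names `scale_async` and `wait_queue1` are hypothetical and not from the original code): with `async(1)`, the host thread only enqueues the data copies and the kernel, so its interval in the “OpenACC” row can end before the kernel appears in the “Compute” row. The device work runs later, and the host only synchronizes when it reaches the `wait` directive.

```c
/* Enqueue y = s * x on async queue 1. With an OpenACC compiler, the host
   thread returns once the copies and the kernel are queued, so the
   host-side interval can finish before the kernel runs on the GPU.
   Without OpenACC, the pragma is ignored and the loop runs on the CPU. */
void scale_async(const float *restrict x, float *restrict y, int n, float s)
{
    #pragma acc parallel loop async(1) copyin(x[0:n]) copyout(y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = s * x[i];
}

/* Block the host until all work on queue 1 (kernel and copyout) is done.
   y must not be read before this returns. */
void wait_queue1(void)
{
    #pragma acc wait(1)
}
```

Any host code placed between `scale_async` and `wait_queue1` overlaps with the device work, which is why the host-side and device-side intervals on the timeline need not line up.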
Typically, it's a good idea to disable async when profiling, unless you are specifically looking at asynchronous behavior.
Try setting the environment variable “PGI_ACC_SYNCHRONOUS=1”. This will disable all asynchronous kernel and data calls.
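Besides an environment variable, the OpenACC specification also lets `async(acc_async_sync)` behave as if no `async` clause were given. The sketch below assumes C and a hypothetical `PROFILE_SYNC` environment variable of our own (not a PGI setting) that picks the queue at run time, so a profiling run can be made synchronous without editing each directive. The `-2` fallback is only a stand-in so the file builds without an OpenACC compiler.

```c
#include <stdlib.h>
#include <string.h>

#ifdef _OPENACC
#include <openacc.h>          /* provides acc_async_sync */
#else
#define acc_async_sync (-2)   /* stand-in when building without OpenACC */
#endif

/* Hypothetical helper: returns acc_async_sync when PROFILE_SYNC=1,
   otherwise the requested async queue. */
int pick_queue(int default_queue)
{
    const char *v = getenv("PROFILE_SYNC");
    if (v != NULL && strcmp(v, "1") == 0)
        return acc_async_sync;
    return default_queue;
}

/* y = s * x on queue q; with q == acc_async_sync this runs synchronously. */
void scale_on(const float *restrict x, float *restrict y, int n, float s, int q)
{
    #pragma acc parallel loop async(q) copyin(x[0:n]) copyout(y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = s * x[i];
}

/* Wait for all outstanding async work before reading results. */
void wait_all(void)
{
    #pragma acc wait
}
```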
This is really a good idea. Thanks a lot for your help!