Hi,
I’m trying to get profiling information for OpenCL kernels. I set OPENCL_PROFILE=1, made a config.txt file containing the performance counters that I want to see, and set OPENCL_PROFILE_CONFIG=config.txt.
I also set the build option “-cl-nv-verbose”.
But only a few items are actually profiled, such as occupancy, timestamp, and divergent_branch.
The following are not profiled. Could anyone tell me how to make them work in OpenCL (without having to use the Visual Profiler; I only want to see the results in text files)?
NV_Warning: Ignoring the invalid profiler config option: regperworkitem
NV_Warning: Ignoring the invalid profiler config option: workgroupsize
NV_Warning: Ignoring the invalid profiler config option: regperworkitem
NV_Warning: Can’t monitor multi bus-width signal branch in this run
NV_Warning: Signal branch can not be profiled in this run.
NV_Warning: Signal gld_request can not be profiled in this run.
NV_Warning: Signal gst_request can not be profiled in this run.
NV_Warning: Ignoring the invalid profiler config option: instructions
NV_Warning: Ignoring the invalid profiler config option: warp_serialize
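The warnings above fall into two kinds: keywords the profiler does not recognise at all, and valid signals that could not be collected in this particular run. A minimal Python sketch for sorting a saved copy of the messages follows. The function name and the sample text are illustrative, and the two patterns rely only on the message wording quoted in this post.

```python
import re

# Patterns taken from the warning wording quoted in this thread.
INVALID = re.compile(r"Ignoring the invalid profiler config option: (\w+)")
NOT_THIS_RUN = re.compile(r"Signal (\w+) can not be profiled in this run")

def classify_warnings(text):
    """Split NV_Warning lines into unknown keywords and skipped signals."""
    invalid, skipped = [], []
    for line in text.splitlines():
        m = INVALID.search(line)
        if m:
            if m.group(1) not in invalid:   # drop repeated warnings
                invalid.append(m.group(1))
            continue
        m = NOT_THIS_RUN.search(line)
        if m and m.group(1) not in skipped:
            skipped.append(m.group(1))
    return invalid, skipped

# Sample copied from the messages in this post.
sample = """NV_Warning: Ignoring the invalid profiler config option: regperworkitem
NV_Warning: Ignoring the invalid profiler config option: workgroupsize
NV_Warning: Ignoring the invalid profiler config option: regperworkitem
NV_Warning: Signal branch can not be profiled in this run.
NV_Warning: Signal gld_request can not be profiled in this run.
"""
invalid, skipped = classify_warnings(sample)
print("unknown keywords:", invalid)
print("valid but not collected in this run:", skipped)
```

Keywords in the first list need a different spelling in config.txt; signals in the second list are candidates for a separate run.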
Where did you get your keywords from? Unfortunately, the column titles in the Visual Profiler differ from the keywords in the configuration file, and even the documentation in /usr/local/cuda/doc/Compute_Profiler.txt has at least two errors. If unsure, you can use the Visual Profiler to export a CSV file of a run and look at the column titles there; those match the configuration file keywords.
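As a rough illustration of that check, the following Python sketch pulls the column titles out of an exported CSV. It assumes the export may begin with comment lines starting with `#` and that the first remaining line is the header row. The sample text is made up, so compare it against an actual export before relying on it.

```python
import csv
import io

def csv_column_titles(text):
    """Return the fields of the first non-blank, non-comment line."""
    data_lines = [
        line for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    if not data_lines:
        return []
    header = next(csv.reader(io.StringIO(data_lines[0])))
    return [title.strip() for title in header]

# Hypothetical sample standing in for an exported file.
sample = """# hypothetical header comment
method,gputime,cputime,occupancy,divergent_branch
kernel_a,12.3,15.0,0.667,4
"""
titles = csv_column_titles(sample)
print(titles)
```

To use it on a real export, read the file into a string first, for example with `open(path).read()`.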
Moreover, you cannot measure arbitrary combinations of events at the same time, so usually you will have to perform multiple runs of your application, measuring a limited set of events in each (that’s one of the reasons the Visual Profiler performs up to 12 runs!). The documentation has some comments on which events can be combined, but for compute capability 2.0 and higher it is especially vague (“The number of counters that can be profiled in a single run depends on the specific counters selected on GPUs with Compute Capability 2.0 or higher.”).
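One way to organise such multiple runs is sketched below in Python: split the counter list into small groups, write one config file per group, and build the environment for each run. The group size of 2 is an arbitrary assumption (the real limit depends on the card and on the counters chosen), the helper names are illustrative, and the config format of one keyword per line is assumed rather than confirmed here.

```python
import os
import tempfile

def split_into_runs(counters, per_run):
    """Chunk the counter list so each run measures at most per_run events."""
    return [counters[i:i + per_run] for i in range(0, len(counters), per_run)]

def write_config(path, options):
    """Write one profiler keyword per line (assumed config format)."""
    with open(path, "w") as f:
        f.write("\n".join(options) + "\n")

def profile_env(config_path, base_env=None):
    """Copy the environment and add the two variables used in this thread."""
    env = dict(os.environ if base_env is None else base_env)
    env["OPENCL_PROFILE"] = "1"
    env["OPENCL_PROFILE_CONFIG"] = config_path
    return env

# Counter names taken from the warnings earlier in the thread.
counters = ["branch", "divergent_branch", "gld_request", "gst_request"]
out_dir = tempfile.mkdtemp()
runs = []
for n, group in enumerate(split_into_runs(counters, 2)):
    path = os.path.join(out_dir, f"config_{n}.txt")
    write_config(path, group)
    runs.append((path, profile_env(path)))
    # To launch a run (application name is hypothetical):
    # subprocess.run(["./my_opencl_app"], env=runs[-1][1], check=True)
```

The sketch does not handle the profiler's output: the log written by each run would need to be saved or renamed before the next run starts.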
Finally, not all events are supported on all cards (support depends on the compute capability; see /usr/local/cuda/computeprof/doc/computeprof.html).
Regards,
Markus
Hi Philip,
Thanks for your reply. That really helps.
The Visual Profiler has some way of calculating more meaningful statistics from the raw performance counters, such as active warps and L1 cache misses. Is there any documentation that explains how to interpret the raw performance counters as this kind of high-level information?
By the way, are there real performance counters in the GPU, or is it just a simulation technique from Nvidia?
Thanks,
Tuan
Search for “Supported derived statistics” in “/usr/local/cuda/computeprof/doc/computeprof.html”. Most of the derived metrics are explained there.
Hello)
I am trying to optimize my OpenCL kernels, and all I have right now is the NVIDIA Visual Profiler, which seems rather limited. I would like to see a line-by-line profile of my kernels to better understand issues with coalescing, etc. Is there a way to get more thorough profiling data than what the Visual Profiler provides?