hardware performance counters


Is there any way to query hardware performance counters for OpenCL/CUDA? The NVPerfKit counters don’t seem to be relevant to GPU computing. I’ve come across NVIDIA Parallel Nsight. Will that come with an SDK to interact with the hardware for debugging, profiling…?

I don’t think there’s an API yet. Our own visual profilers use data that is written to a log file when the CUDA_PROFILE environment variable is set to 1.
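
Since the profiler data ends up in a plain-text log, one workaround is to parse that log yourself. Below is a minimal Python sketch. It assumes the log consists of `key=[ value ]` fields per line (for example `method`, `gputime`, `cputime`), which is how I remember the legacy profiler output; the exact layout may differ between CUDA versions. The sample log text and the helper names `parse_profile_log` and `total_gpu_time` are made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical sample of a profiler log. The field layout is an assumption;
# check it against a log produced by your own toolkit version.
SAMPLE_LOG = """\
# CUDA_PROFILE_LOG_VERSION 1.5
method,gputime,cputime,occupancy
method=[ memcpyHtoD ] gputime=[ 3.500 ] cputime=[ 9.000 ]
method=[ my_kernel ] gputime=[ 10.000 ] cputime=[ 20.000 ] occupancy=[ 0.500 ]
method=[ my_kernel ] gputime=[ 12.000 ] cputime=[ 21.000 ] occupancy=[ 0.500 ]
"""

# Matches one "name=[ value ]" field and strips the padding around the value.
FIELD = re.compile(r"(\w+)=\[\s*(.*?)\s*\]")


def parse_profile_log(text):
    """Return one dict per logged call; header and comment lines are skipped."""
    records = []
    for line in text.splitlines():
        fields = dict(FIELD.findall(line))
        if "method" in fields:
            records.append(fields)
    return records


def total_gpu_time(records):
    """Sum the gputime field per method name."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec["method"]] += float(rec.get("gputime", 0.0))
    return dict(totals)


if __name__ == "__main__":
    for name, gpu_time in total_gpu_time(parse_profile_log(SAMPLE_LOG)).items():
        print(name, gpu_time)
```

To get a real log, run the application with CUDA_PROFILE=1 in its environment (from Python, for instance, `subprocess.run(cmd, env={**os.environ, "CUDA_PROFILE": "1"})`) and pass the contents of the resulting file to `parse_profile_log`. The name and location of the log file depend on the toolkit version.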

The PTX specs have some details on our hardware performance counters, although you can only read these from inline assembly at the moment (which itself is not officially supported):

Thanks for the link. I noticed that there are breakpoint and trap instructions in PTX. I also ran across CUDA-GDB, which must be using these instructions. Is there a public API to execute the PTX code from the host and install callbacks for the debugging instructions? The host application must somehow be notified when a kernel executes a breakpoint, and there are probably instructions to step through the code as well.

We would like to add some simple debugging features to our OpenCL editor (at least for NVIDIA platforms), so any pointer in that direction would be really helpful.