CUDA profiling: Extract the number of clock cycles of a CUDA application execution

I’m working on a university project and I need to extract the number of clock cycles needed to execute a CUDA application (I need the total number of clock cycles, CPU + GPU). To retrieve this information, I instrumented the source code.

Basically, what I do is the following:

– clock cycles for CPU –

  • when the program starts, the clock() function is called (in order to retrieve the initial clock value)
  • before a CUDA kernel invocation, the clock() function is called again. By comparing the values returned by these two invocations of clock(), I can estimate the number of clock cycles used by the CPU up to this point.
  • after the CUDA kernel function has executed, I call cutilDeviceSynchronize() and then call clock() again, in order to count the CPU clock cycles until the next CUDA kernel function. This continues until the end of the program (I collect the number of clock cycles spent between CUDA kernel functions).
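
A minimal sketch of this CPU-side bookkeeping in host-side C++ could look like the following. The names HostTickCounter and ticks_to_cycles are hypothetical and only for illustration. Note that clock() reports processor time in ticks of 1/CLOCKS_PER_SEC rather than hardware cycles, so the conversion to cycles assumes a fixed, known CPU frequency (cpu_hz), which is a simplification.

```cpp
#include <cassert>
#include <ctime>

// Accumulates host processor time, in clock() ticks, over the code
// segments that run between kernel launches.
struct HostTickCounter {
    std::clock_t segment_start = 0;  // clock() value when the current segment began
    std::clock_t total_ticks = 0;    // sum of all finished segments
    bool running = false;

    // Call at program start and again right after each device synchronization.
    void begin_segment() {
        assert(!running);
        segment_start = std::clock();
        running = true;
    }

    // Call right before each kernel launch and once at program end.
    void end_segment() {
        assert(running);
        total_ticks += std::clock() - segment_start;
        running = false;
    }
};

// Converts clock() ticks to an estimated cycle count.
// Assumption: the CPU runs at a constant frequency of cpu_hz (in Hz).
double ticks_to_cycles(std::clock_t ticks, double cpu_hz) {
    return static_cast<double>(ticks) / CLOCKS_PER_SEC * cpu_hz;
}
```

The idea would be to call begin_segment() at startup and after every synchronization, end_segment() before every kernel launch and at exit, and convert total_ticks once at the end.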

– clock cycles for GPU –

  • when the kernel function starts, I call the clock() function and store the value returned
  • before the kernel function returns, I call the clock() function and store the value returned
  • then I take the smallest clock value (the start time of the first thread executed) and the largest clock value (the end time of the last thread to terminate), and from their difference I extract the number of cycles for the kernel function execution on the GPU.
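
The min/max step can be sketched on the host side as follows, assuming the per-thread start and end stamps were written to two arrays on the device and copied back. The vectors start and end and the function kernel_span_cycles are hypothetical names. One caveat: in device code clock() reads a per-multiprocessor counter, so stamps taken on different multiprocessors are not guaranteed to share a common time base, and the result is probably best treated as an estimate.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Returns (latest end stamp) - (earliest start stamp).
// Assumptions: start[i] and end[i] belong to the same thread, the arrays
// are non-empty, and the counter did not wrap around during the kernel.
std::uint64_t kernel_span_cycles(const std::vector<std::uint64_t>& start,
                                 const std::vector<std::uint64_t>& end) {
    assert(!start.empty() && start.size() == end.size());
    const std::uint64_t first = *std::min_element(start.begin(), start.end());
    const std::uint64_t last = *std::max_element(end.begin(), end.end());
    assert(last >= first);
    return last - first;
}
```

With start stamps {100, 90, 120} and end stamps {400, 450, 430}, this gives 450 - 90 = 360 cycles.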

At the end I sum the total clock cycles of CPU and GPU.

Is this a correct way to retrieve the information I need?

I also tried to use the Compute Visual Profiler tool. In particular, I was interested in the following two counters:

  • active cycles
  • SM activity

With these two values I can compute the elapsed clock cycles (according to the documentation) as: (100 * active_cycles)/SM_activity
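
As a small worked sketch of that formula, assuming SM activity is reported as a percentage in the range (0, 100] (the function name elapsed_cycles is hypothetical):

```cpp
#include <cassert>

// elapsed = (100 * active_cycles) / SM_activity
// Assumption: sm_activity_percent is a percentage in (0, 100].
double elapsed_cycles(double active_cycles, double sm_activity_percent) {
    assert(sm_activity_percent > 0.0 && sm_activity_percent <= 100.0);
    return 100.0 * active_cycles / sm_activity_percent;
}
```

For example, 8000 active cycles at 80% SM activity would correspond to 10000 elapsed cycles.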

This way I can compare the results collected with my method against the results provided by the visual profiler.

The problem is that I can’t retrieve the SM activity value. When I try to set the profiler counters in the session settings menu, there is no SM activity counter. The documentation says that I need a GPU with compute capability 2.0 or greater, and the GPU that my university provided is a GTX 470 (c.c. 2.0), so there should be no problem.

Any suggestions on how to retrieve the SM activity value with Compute Visual Profiler? Or on how to retrieve the elapsed clock cycles some other way?

Thank you!


Why don’t you try GPGPU-Sim?

I didn’t know about it. I’ll take a look.
Thank you.