How to collect the event value every time the kernel function is invoked?

When I run

nvprof --events tex0_cache_sector_queries --replay-mode kernel ./matrixMul

or

nvprof --events tex0_cache_sector_queries --replay-mode application ./matrixMul

the result is only an aggregate summary, such as:

==40013== Profiling application: ./matrixMul
==40013== Profiling result:
==40013== Event result:
"Device","Kernel","Invocations","Event Name","Min","Max","Avg","Total"
"Tesla K80 (0)","void matrixMulCUDA<int=32>(float*, float*, float*, int, int)",301,"tex0_cache_sector_queries",0,30,24,7224

The result above is only a summary. The kernel function matrixMulCUDA was invoked 301 times, but nvprof reports just the min, max, avg, and total of tex0_cache_sector_queries across those 301 invocations.
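As a quick sanity check on that summary row, here is a minimal Python sketch (not part of nvprof) that parses the two CSV lines shown above and confirms that Avg is simply Total divided by Invocations. The variable names are illustrative only. It shows why the 301 individual values cannot be recovered from this output alone.

```python
import csv
import io

# Header and data row copied from the nvprof summary output above.
summary_csv = '''"Device","Kernel","Invocations","Event Name","Min","Max","Avg","Total"
"Tesla K80 (0)","void matrixMulCUDA<int=32>(float*, float*, float*, int, int)",301,"tex0_cache_sector_queries",0,30,24,7224
'''

# csv handles the quoted kernel signature, which itself contains commas.
row = next(csv.DictReader(io.StringIO(summary_csv)))
invocations = int(row["Invocations"])
total = int(row["Total"])

# The reported average is just total / invocations; the individual
# per-invocation values are not present anywhere in this output.
computed_avg = total / invocations
print(row["Event Name"], invocations, computed_avg)
```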

I want to collect all 301 individual tex0_cache_sector_queries values, one for each invocation of matrixMulCUDA. In other words, every time the kernel function matrixMulCUDA is invoked, I want to record the tex0_cache_sector_queries event value for that invocation. How can I collect this?

Hi,

You can use the NVIDIA Visual Profiler to check the metric value for each kernel.
For how to use this tool, please refer to https://docs.nvidia.com/cuda/profiler-users-guide/index.html#visual

  1. Launch nvvp
  2. Create a new session for matrixMul
  3. Run->Collect Metrics and Events
  4. Select the metrics that you are interested in
  5. In the GPU Details (Summary) tab, press “Show data for each kernel, memcpy and memset instance”
  6. You’ll get the value for each kernel

Hope this helps.

Best Regards
VeraJ

Hi VeraJ,

Thank you for your reply.

I ran the workload with:

nvprof --pc-sampling-period 31 --print-gpu-trace --events tex0_cache_sector_queries --replay-mode application --export-profile application.prof ./matrixMul

Then I imported the application.prof file into the Visual Profiler, where I can get all 301 tex0_cache_sector_queries values, one for each invocation of matrixMulCUDA. I also posted the question at https://stackoverflow.com/questions/51100850/how-to-collect-the-event-value-every-time-the-cuda-kernel-function-been-invoked/51279362#51279362, and in my answer there I included a screenshot of the Visual Profiler.
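For anyone who prefers text output to the Visual Profiler: according to the nvprof help text, --print-gpu-trace in event profiling mode shows the events for each kernel invocation, and --csv together with --log-file can write that output to a file (for example, trace.csv, a file name chosen here for illustration). Below is a minimal Python sketch of pulling the per-invocation values out of such a file. The column layout is an assumption (a header row with a "Kernel" column and a column whose name starts with the event name), the sample rows and their values are made up, and per_invocation_values is a hypothetical helper, so please adjust it to what your nvprof version actually prints.

```python
import csv
import io

def per_invocation_values(lines, kernel_substr, event_name):
    """Return one event value per matching kernel row, in file order."""
    # Drop nvprof's "==PID==" status lines and blank lines.
    data = [ln for ln in lines if ln.strip() and not ln.startswith("==")]
    reader = csv.reader(data)
    header = next(reader)
    kernel_col = header.index("Kernel")
    # Assumption: the event column header starts with the event name.
    event_col = next(i for i, h in enumerate(header) if h.startswith(event_name))
    values = []
    for row in reader:
        if len(row) > event_col and kernel_substr in row[kernel_col]:
            try:
                values.append(int(row[event_col]))
            except ValueError:
                # Skip rows that carry no integer value (e.g. a units row).
                continue
    return values

# Made-up sample in the assumed layout; real output may differ.
sample = '''==40013== Profiling application: ./matrixMul
==40013== Profiling result:
"Device","Context","Stream","Kernel","tex0_cache_sector_queries (event)"
"Tesla K80 (0)","1","7","void matrixMulCUDA<int=32>(float*, float*, float*, int, int)",24
"Tesla K80 (0)","1","7","void matrixMulCUDA<int=32>(float*, float*, float*, int, int)",30
"Tesla K80 (0)","1","7","void matrixMulCUDA<int=32>(float*, float*, float*, int, int)",0
'''

values = per_invocation_values(
    io.StringIO(sample), "matrixMulCUDA", "tex0_cache_sector_queries"
)
print(values)
```

With a real run you would pass an opened log file instead of the sample text; for the matrixMul run above, the list should then contain 301 entries.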

Thanks for the update.

Hi,

I am using the above command while varying the number of cycles for PC sampling, say 2^5, 2^10, etc.

I am expecting to capture the event count at every 2^5, 2^10, etc. cycles of the matrixMul execution.

I am unable to locate these values.

I would appreciate any help.