When I run
nvprof --events tex0_cache_sector_queries --replay-mode kernel ./matrixMul
or
nvprof --events tex0_cache_sector_queries --replay-mode application ./matrixMul
the result is only an aggregate summary, such as:
==40013== Profiling application: ./matrixMul
==40013== Profiling result:
==40013== Event result:
"Device","Kernel","Invocations","Event Name","Min","Max","Avg","Total"
"Tesla K80 (0)","void matrixMulCUDA<int=32>(float*, float*, float*, int, int)",301,"tex0_cache_sector_queries",0,30,24,7224
The result above is only a summary. The tex0_cache_sector_queries event was collected over 301 invocations of the kernel function matrixMulCUDA, but the output reports only the min, max, avg, and total across those 301 invocations.
I want to collect all 301 tex0_cache_sector_queries values, one for each invocation of matrixMulCUDA. In other words, every time the kernel function matrixMulCUDA is invoked, I want to record the tex0_cache_sector_queries value for that invocation. How can I collect them?
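To make the limitation concrete, here is a minimal Python sketch that parses the summary row shown above. The helper name parse_summary is hypothetical; the data is copied from the nvprof output in this post. It shows that the row carries only aggregates, so the 301 individual values cannot be recovered from it.

```python
import csv
import io

# Summary rows copied verbatim from the nvprof output above.
SUMMARY = '''"Device","Kernel","Invocations","Event Name","Min","Max","Avg","Total"
"Tesla K80 (0)","void matrixMulCUDA<int=32>(float*, float*, float*, int, int)",301,"tex0_cache_sector_queries",0,30,24,7224
'''

def parse_summary(text):
    """Return one dict per summary row (hypothetical helper).

    csv.DictReader handles the commas inside the quoted kernel name.
    """
    return list(csv.DictReader(io.StringIO(text)))

rows = parse_summary(SUMMARY)
row = rows[0]
invocations = int(row["Invocations"])
total = int(row["Total"])

print(row["Kernel"])
# The only per-invocation information available is the mean.
print(invocations, total, total / invocations)
```

The reported Avg is just Total divided by Invocations, so anything beyond min, max, and mean has to come from a per-invocation trace instead.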
veraj
July 2, 2018, 2:39am
2
Hi,
You can use the NVIDIA Visual Profiler to check the metric value for each kernel.
For how to use this tool, refer to https://docs.nvidia.com/cuda/profiler-users-guide/index.html#visual
1. Launch nvvp.
2. Create a new session for matrixMul.
3. Choose Run -> Collect Metrics and Events.
4. Select the metrics that you are interested in.
5. In the GPU Details (Summary) tab, press “Show data for each kernel, memcpy and memset instance”.
6. You’ll get the value for each kernel.
Hope this helps.
Best Regards
VeraJ
Hi Veraj,
Thank you for your reply.
I ran the workload with:
nvprof --pc-sampling-period 31 --print-gpu-trace --events tex0_cache_sector_queries --replay-mode application --export-profile application.prof ./matrixMul
Then I imported the application.prof file into Visual Profiler, where I can see all 301 tex0_cache_sector_queries values, one for each invocation of matrixMulCUDA. I also posted the question at https://stackoverflow.com/questions/51100850/how-to-collect-the-event-value-every-time-the-cuda-kernel-function-been-invoked/51279362#51279362 , and my answer there includes a screenshot from Visual Profiler.
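For anyone who would rather script this than go through the GUI, here is a rough Python sketch that assembles a similar nvprof command line and writes the per-invocation trace to a CSV log. It assumes nvprof's --csv and --log-file options are available in your CUDA version; the helper name build_trace_command and the file name trace.csv are hypothetical. The column layout of the resulting file is not shown in this thread, so the sketch does not try to parse it.

```python
import shutil
import subprocess

def build_trace_command(event, app, log_file):
    """Build an nvprof argv that requests per-invocation event values as CSV.

    Hypothetical helper; the flags are standard nvprof options, but check
    `nvprof --help` for your CUDA version before relying on them.
    """
    return [
        "nvprof",
        "--print-gpu-trace",       # one row per kernel launch, not a summary
        "--events", event,
        "--csv",                   # comma-separated output
        "--log-file", log_file,    # write profiler output here, not to stderr
        app,
    ]

cmd = build_trace_command("tex0_cache_sector_queries", "./matrixMul", "trace.csv")
print(" ".join(cmd))

# Only run when nvprof is actually installed; otherwise just show the command.
if shutil.which("nvprof"):
    subprocess.run(cmd, check=False)
```

Building the argument list separately from running it makes it easy to inspect the exact command before profiling.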
ksp463
November 23, 2022, 5:16am
6
Hi,
I am using the above command while varying the number of cycles for PC sampling, e.g. 2^5, 2^10, etc.
I am expecting to capture the event count every 2^5, 2^10, ... cycles of matrixMul execution, but I am unable to locate these values.
I would appreciate any help.