Profile TX2 cache misses

Hello,

I’m trying to profile the GPU's last-level cache (LLC) misses generated by a CUDA kernel running on my Tegra X2.

Is there any documentation or tool for performing this test?

Thanks.

Hi,

You can find the nvprof documentation here: https://docs.nvidia.com/cuda/profiler-users-guide/index.html#nvprof-command-line-options
Please note, however, that not all profiling features are supported on the Jetson platform, owing to hardware design limitations.

Thanks.

Thanks for the answer.

I read the nvprof documentation, and I think the right metric for my experiment is l2_l1_read_hit_rate. However, when I list the available metrics ( nvprof --query-metrics ), l2_l1_read_hit_rate is not among them.
So I suppose it is not possible to profile L2 cache misses on the Jetson TX2, right?
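
For anyone repeating this check, here is a minimal Python sketch that filters the output of nvprof --query-metrics for L2-related metric names. It assumes each metric is printed on a line shaped like "metric_name:  description"; that layout is an assumption and may differ between CUDA versions. The helper names (parse_metrics, find_metrics, query_nvprof) are hypothetical and not part of any NVIDIA tool.

```python
import re
import shutil
import subprocess

# Assumed line shape in the nvprof listing: "   metric_name:  description".
# This is an assumption about the output layout, not a documented format.
METRIC_LINE = re.compile(r"^\s*([A-Za-z0-9_]+):\s+(.*\S)\s*$")


def parse_metrics(text):
    """Return {metric_name: description} for every line matching METRIC_LINE."""
    metrics = {}
    for line in text.splitlines():
        match = METRIC_LINE.match(line)
        if match:
            metrics[match.group(1)] = match.group(2)
    return metrics


def find_metrics(text, keyword):
    """Return sorted metric names containing keyword (case-insensitive)."""
    keyword = keyword.lower()
    return sorted(name for name in parse_metrics(text) if keyword in name.lower())


def query_nvprof():
    """Run nvprof --query-metrics; return its text, or None if nvprof is not on PATH."""
    if shutil.which("nvprof") is None:
        return None
    result = subprocess.run(
        ["nvprof", "--query-metrics"], capture_output=True, text=True
    )
    # Combine both streams, since it is not assumed which one nvprof writes to.
    return result.stdout + result.stderr


if __name__ == "__main__":
    output = query_nvprof()
    if output is None:
        print("nvprof not found on PATH")
    else:
        names = find_metrics(output, "l2")
        print("\n".join(names) if names else "no L2-related metrics reported")
        available = "l2_l1_read_hit_rate" in parse_metrics(output)
        print("l2_l1_read_hit_rate available:", available)
```

If the metric is absent from the listing, nvprof cannot collect it on that device; an empty result here only reflects what the tool itself reports.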

Hi,

Could you also check whether NVIDIA System Profiler meets your requirements?
https://docs.nvidia.com/nvidia-system-profiler/index.html
>> Sampling counters from ARM PMU (Performance Monitoring Unit). Information such as cache misses gets statistically correlated with function execution.

Thanks.