Hello,
I’m trying to profile the LLC cache misses of the GPU, generated by a CUDA kernel that runs on my Tegra X2.
Is there any documentation or a tool for performing this test?
thanks
Hi,
You can find the nvprof documentation here: Profiler :: CUDA Toolkit Documentation
Please note that not all profiling features are supported on the Jetson platform, because of hardware design limitations.
Thanks.
Thanks for the answer,
I read the nvprof documentation and I think the right metric for my experiment is l2_l1_read_hit_rate, but when I query the available metrics (nvprof --query-metrics), l2_l1_read_hit_rate is not in the list.
So I suppose it is not possible to profile L2 cache misses on the Jetson TX2, right?
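For reference, one quick way to see which L2-related metrics nvprof does report on the board is to filter the output of the query. This is only a minimal Python sketch, assuming nvprof is on the PATH. The helper names filter_metrics and query_metrics are my own (hypothetical), not part of any NVIDIA tool, and the exact layout of the query output may differ between CUDA versions:

```python
import shutil
import subprocess


def filter_metrics(query_output, keyword):
    """Return the lines of the query output that mention `keyword` (case-insensitive)."""
    keyword = keyword.lower()
    return [line.strip() for line in query_output.splitlines()
            if keyword in line.lower()]


def query_metrics():
    """Run `nvprof --query-metrics` and return its text, or None if nvprof is not found."""
    if shutil.which("nvprof") is None:
        return None
    result = subprocess.run(["nvprof", "--query-metrics"],
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT,
                            universal_newlines=True)
    return result.stdout


if __name__ == "__main__":
    output = query_metrics()
    if output is None:
        print("nvprof not found on PATH")
    else:
        # List every metric whose name or description mentions "l2".
        for line in filter_metrics(output, "l2"):
            print(line)
```

If the list printed for "l2" is empty, or does not contain a hit-rate or miss metric, then that counter is not exposed by nvprof on this device.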
Hi,
Could you also check whether NVIDIA System Profiler meets your requirement?
https://docs.nvidia.com/nvidia-system-profiler/index.html
>> Sampling counters from ARM PMU (Performance Monitoring Unit). Information such as cache misses gets statistically correlated with function execution.
Thanks.