Possible to improve the L1/TEX hit rate?

TrailingStop · February 6, 2024, 6:12am

Hi,

I have a kernel that does random accesses to global memory. I know the memory position somewhat ahead of the actual use of its content. Here is the Memory Workload Analysis of the unmodified kernel:

I then add:

prefetch.global.L1

to my kernel to let the system know which address I need next. The result is that the L2 hit rate improves to as much as 50%:

I also tried:

prefetch.global.L1

and got the same result. The overall performance of the kernel didn’t change. Is it possible to improve the L1 hit rate if the kernel does random read accesses to global memory?

Thanks a lot.
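
For readers who want to reproduce the experiment, here is a minimal sketch of how such a prefetch hint might be issued from CUDA C++ through inline PTX. The original kernel is not shown in the thread, so everything here is an assumption made for illustration: the helper `prefetch_global_l1`, the kernel `gather_sum`, the index table `idx`, and the one-iteration prefetch distance. The prefetch is only a hint, and whether it changes the L1/TEX hit rate depends on the GPU architecture.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper (not from the thread): hint that the cache line holding
// *p will be needed soon. The generic pointer is first converted to a
// global-space address, which is what prefetch.global expects.
__device__ __forceinline__ void prefetch_global_l1(const void* p)
{
    size_t gaddr = __cvta_generic_to_global(p);
    asm volatile("prefetch.global.L1 [%0];" :: "l"(gaddr));
}

// Illustrative random-gather kernel: each thread walks the index table with a
// grid stride and prefetches the element it will read in its next iteration.
__global__ void gather_sum(const unsigned* __restrict__ data,
                           const unsigned* __restrict__ idx,
                           unsigned long long* out, size_t n_idx)
{
    const size_t stride = (size_t)gridDim.x * blockDim.x;
    unsigned long long acc = 0;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
         i < n_idx; i += stride) {
        const size_t ahead = i + stride;      // assumed prefetch distance
        if (ahead < n_idx)
            prefetch_global_l1(&data[idx[ahead]]);
        acc += data[idx[i]];                  // the actual random read
    }
    atomicAdd(out, acc);
}

#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t e = (call);                                         \
        if (e != cudaSuccess) {                                         \
            std::printf("CUDA error: %s\n", cudaGetErrorString(e));     \
            return 1;                                                   \
        }                                                               \
    } while (0)

int main()
{
    const size_t n_data = 1u << 22, n_idx = 1u << 20;
    std::vector<unsigned> h_data(n_data), h_idx(n_idx);
    for (size_t i = 0; i < n_data; ++i) h_data[i] = (unsigned)(i % 7);

    // Pseudo-random indices from a simple LCG; host reference sum alongside.
    unsigned state = 12345u;
    unsigned long long expected = 0;
    for (size_t i = 0; i < n_idx; ++i) {
        state = state * 1664525u + 1013904223u;
        h_idx[i] = (unsigned)((state >> 8) % n_data);
        expected += h_data[h_idx[i]];
    }

    unsigned *d_data = nullptr, *d_idx = nullptr;
    unsigned long long *d_out = nullptr, h_out = 0;
    CHECK(cudaMalloc(&d_data, n_data * sizeof(unsigned)));
    CHECK(cudaMalloc(&d_idx, n_idx * sizeof(unsigned)));
    CHECK(cudaMalloc(&d_out, sizeof(unsigned long long)));
    CHECK(cudaMemcpy(d_data, h_data.data(), n_data * sizeof(unsigned),
                     cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(d_idx, h_idx.data(), n_idx * sizeof(unsigned),
                     cudaMemcpyHostToDevice));
    CHECK(cudaMemset(d_out, 0, sizeof(unsigned long long)));

    gather_sum<<<64, 256>>>(d_data, d_idx, d_out, n_idx);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpy(&h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost));

    std::printf("%s\n", h_out == expected ? "match" : "MISMATCH");
    cudaFree(d_data); cudaFree(d_idx); cudaFree(d_out);
    return h_out == expected ? 0 : 1;
}
```

Swapping `.L1` for `.L2` in the `asm` string gives the other prefetch level, so both variants can be compared in the same Memory Workload Analysis view. The host-side check only confirms that the hint does not alter results; it says nothing about hit rates, which have to be read from the profiler.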

Sanjiv.Satoor · February 6, 2024, 6:35am

In case you have not seen this, you can refer to https://developer.nvidia.com/blog/boosting-application-performance-with-gpu-memory-prefetching/

But this does not address your question regarding improving L1 hit rate.

TrailingStop · February 6, 2024, 6:41am

Hi, no, I had missed that. Thanks for the information.

veraj · February 20, 2024, 6:42am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
