Possible to improve the L1/TEX hit rate?

Hi,

I have a kernel that does random accesses to global memory. I know each memory address somewhat ahead of the point where its content is actually used. Here is the Memory Workload Analysis of the unmodified kernel:

I then add:

prefetch.global.L2

to my kernel to let the system know which address I need next. As a result, the L2 hit rate improves to as much as 50%:

I also tried:

prefetch.global.L1

and got the same result. The overall performance of the kernel didn’t change. Is it possible to improve the L1 hit rate if the kernel does random read accesses to global memory?
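For anyone who wants to reproduce this, here is a minimal sketch of how a prefetch like the one above can be issued from CUDA C++ through inline PTX. It is not my real kernel: the helper name `prefetch_l1`, the prefetch distance `DIST`, the kernel `gather_sum` and the index layout are all made up for illustration. The instruction is only a hint, so whether it changes the L1 hit rate depends on the architecture and on whether the line is evicted before it is used.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

#define DIST 4  // hypothetical prefetch distance, in loop iterations

// Issue the PTX prefetch hint. Assumes p points into global memory.
__device__ __forceinline__ void prefetch_l1(const void* p) {
    asm volatile("prefetch.global.L1 [%0];" :: "l"(p));
}

// Each thread sums per_thread randomly indexed elements of data.
// The indices are known in advance, so the address needed DIST
// iterations later can be hinted before the current load.
__global__ void gather_sum(const float* data, const int* idx, float* out,
                           int n, int per_thread) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= n) return;
    const int* my = idx + (size_t)t * per_thread;
    float acc = 0.0f;
    for (int k = 0; k < per_thread; ++k) {
        if (k + DIST < per_thread) prefetch_l1(&data[my[k + DIST]]);
        acc += data[my[k]];
    }
    out[t] = acc;
}

// Host driver: data is all ones, so every thread must return per_thread.
bool run_demo() {
    const int n = 1024, per_thread = 16, size = 1 << 20;
    std::vector<float> h_data(size, 1.0f), h_out(n, 0.0f);
    std::vector<int> h_idx((size_t)n * per_thread);
    srand(1234);
    for (size_t i = 0; i < h_idx.size(); ++i) h_idx[i] = rand() % size;

    float *d_data = nullptr, *d_out = nullptr;
    int* d_idx = nullptr;
    if (cudaMalloc(&d_data, size * sizeof(float)) != cudaSuccess) return false;
    if (cudaMalloc(&d_idx, h_idx.size() * sizeof(int)) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, n * sizeof(float)) != cudaSuccess) return false;
    cudaMemcpy(d_data, h_data.data(), size * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_idx, h_idx.data(), h_idx.size() * sizeof(int), cudaMemcpyHostToDevice);

    gather_sum<<<(n + 255) / 256, 256>>>(d_data, d_idx, d_out, n, per_thread);
    bool ok = (cudaDeviceSynchronize() == cudaSuccess);
    cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);

    for (int t = 0; ok && t < n; ++t) ok = (h_out[t] == (float)per_thread);
    cudaFree(d_data); cudaFree(d_idx); cudaFree(d_out);
    return ok;
}

int main() {
    printf("%s\n", run_demo() ? "ok" : "failed");
    return 0;
}
```

The prefetch does not change the result, so the check only confirms the kernel still computes the right sums. The hit rates have to be read from the profiler with and without the `prefetch_l1` call.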

Thanks a lot.

In case you have not seen this, you can refer to https://developer.nvidia.com/blog/boosting-application-performance-with-gpu-memory-prefetching/

But this does not address your question regarding improving L1 hit rate.

Hi, no, I had missed that. Thanks for the information.
