Hi,
I have a kernel that performs random accesses to global memory. I know each address some time before I actually use its contents. Here is the Memory Workload Analysis of the unmodified kernel:
I then add:
prefetch.global.L1
to my kernel to let the system know which address I need next. The result is that the L2 hit rate improves up to 50%:
I also tried:
prefetch.global.L1
and got the same result. The overall performance of the kernel didn’t change. Is it possible to improve the L1 hit rate if the kernel does random read accesses to global memory?
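For readers who want to reproduce the pattern, here is a minimal sketch of a gather loop that issues the prefetch hint ahead of the load. It is not the actual kernel from this post: the names `prefetch_global_l1`, `gather_sum`, `run_gather_with_prefetch`, the index table `idx`, the `lookahead` distance, and the launch configuration are all assumptions made for illustration.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: issue a PTX prefetch.global.L1 hint for the line holding p.
// The generic pointer is converted to a global-space address first.
__device__ __forceinline__ void prefetch_global_l1(const void* p) {
    asm volatile("{\n\t"
                 ".reg .u64 g;\n\t"
                 "cvta.to.global.u64 g, %0;\n\t"
                 "prefetch.global.L1 [g];\n\t"
                 "}" :: "l"(p));
}

// Hypothetical kernel: each thread sums data[idx[i]] over a grid-stride loop and
// hints the address it will need `lookahead` iterations later.
__global__ void gather_sum(const float* data, const unsigned* idx,
                           float* out, int n, int lookahead) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;
    float acc = 0.0f;
    for (int i = tid; i < n; i += stride) {
        int ahead = i + lookahead * stride;
        if (ahead < n) prefetch_global_l1(&data[idx[ahead]]);
        acc += data[idx[i]];
    }
    out[tid] = acc;
}

// Runs the kernel on pseudo-random indices and compares against a CPU reference.
// Returns true only if every CUDA call succeeded and all partial sums match.
bool run_gather_with_prefetch(int n, int lookahead) {
    const int blocks = 64, threads = 128, total = blocks * threads;
    std::vector<float> data(n);
    std::vector<unsigned> idx(n);
    unsigned state = 12345u;
    for (int i = 0; i < n; ++i) {
        data[i] = static_cast<float>(i % 7);   // small integers: sums are exact in float
        state = state * 1664525u + 1013904223u; // simple LCG for repeatable indices
        idx[i] = state % static_cast<unsigned>(n);
    }
    float *d_data = nullptr, *d_out = nullptr;
    unsigned* d_idx = nullptr;
    bool ok = cudaMalloc(&d_data, n * sizeof(float)) == cudaSuccess
           && cudaMalloc(&d_idx, n * sizeof(unsigned)) == cudaSuccess
           && cudaMalloc(&d_out, total * sizeof(float)) == cudaSuccess
           && cudaMemcpy(d_data, data.data(), n * sizeof(float),
                         cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(d_idx, idx.data(), n * sizeof(unsigned),
                         cudaMemcpyHostToDevice) == cudaSuccess;
    std::vector<float> out(total, -1.0f);
    if (ok) {
        gather_sum<<<blocks, threads>>>(d_data, d_idx, d_out, n, lookahead);
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(out.data(), d_out, total * sizeof(float),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_data);
    cudaFree(d_idx);
    cudaFree(d_out);
    if (!ok) return false;
    // CPU reference using the same per-thread summation order as the kernel.
    for (int tid = 0; tid < total; ++tid) {
        float ref = 0.0f;
        for (int i = tid; i < n; i += total) ref += data[idx[i]];
        if (ref != out[tid]) return false;
    }
    return true;
}

int main() {
    bool ok = run_gather_with_prefetch(1 << 16, 4);
    std::printf("result matches CPU reference: %s\n", ok ? "yes" : "no");
    return ok ? 0 : 1;
}
```

The sketch only checks that adding the hint leaves the result unchanged. It makes no claim about hit rates or run time; those would have to be measured in the profiler for a given GPU and a given `lookahead`.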
Thanks a lot.