L1 cache hit rate too low

YSAY · August 5, 2024, 9:52am

I am trying to prefetch data into the L1 cache with the PTX instruction prefetch.global.L1, but the Global Load Hit Rate reported by Nsight Compute does not improve; it stays at 0.07%. Meanwhile, the L2 cache hit rate rises from 1% to 37%. What could explain this?

I placed a __syncthreads() call between the prefetch and the load to make sure the prefetch had completed before the data is read.
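For reference, a minimal sketch of the pattern described above (prefetch, then __syncthreads(), then an ordinary global load). This is an illustration, not the original kernel: the names prefetch_l1, scale, and run_demo are hypothetical, and it assumes CUDA 11 or newer for __cvta_generic_to_global, which converts a generic pointer into the global-space address that the .global qualifier expects.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: issue the PTX prefetch.global.L1 hint for the line holding p.
// The generic pointer is converted to a global-space address first (CUDA 11+ intrinsic).
__device__ __forceinline__ void prefetch_l1(const void* p) {
    size_t gaddr = __cvta_generic_to_global(p);
    asm volatile("prefetch.global.L1 [%0];" :: "l"(gaddr) : "memory");
}

// Hypothetical kernel mirroring the post: prefetch, barrier, then load.
__global__ void scale(const float* __restrict__ in, float* __restrict__ out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) prefetch_l1(in + i);    // step 1: issue the prefetch hint
    __syncthreads();                   // step 2: block-wide barrier (reached by all threads)
    if (i < n) out[i] = 2.0f * in[i];  // step 3: ordinary global load
}

// Runs the kernel on n elements and returns the number of wrong results.
int run_demo(int n) {
    std::vector<float> h_in(n), h_out(n);
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i);

    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scale<<<blocks, threads>>>(d_in, d_out, n);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);

    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (h_out[i] != 2.0f * h_in[i]) ++mismatches;
    return mismatches;
}

int main() {
    int bad = run_demo(1 << 20);
    std::printf("mismatches: %d\n", bad);
    return bad == 0 ? 0 : 1;
}
```

Profiling a kernel like this with Nsight Compute, with and without the prefetch_l1 call, is one way to compare the L1 and L2 hit rates for the two variants.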