I have a project to compare the performance of shared memory and the L1 cache. However, I don’t know how to check whether my GPU is actually using the L1 cache during the computation. I found nothing in the PTX code mentioning ‘l1’ or ‘prefetch’ when I compiled the program with ‘nvcc ***.cu -arch=compute_20 -code=sm_20 -ptx’.
I can see the following PTX code, which shows that shared memory is in use:
ld.global.f64 %fd6, [%rd28+0]; ... st.shared.f64 [%rd30+728], %fd6;
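A minimal sketch of the kind of kernel that produces that ld.global.f64 / st.shared.f64 pair, assuming a 1-D 3-point stencil on doubles. The names stencil_shared, run_stencil_shared and TILE are hypothetical (not from the original project), and compiling it with -ptx should show the same load/store pattern, though register names and offsets will differ.

```cuda
#include <cassert>
#include <cuda_runtime.h>

#define TILE 256  // assumed threads per block for this sketch

// Hypothetical 3-point stencil: out[i] = in[i-1] + in[i] + in[i+1],
// treating out-of-range neighbours as 0. Each block first copies its
// slice of `in` (plus one halo value per side) into shared memory;
// that copy is what appears in PTX as ld.global.f64 / st.shared.f64.
__global__ void stencil_shared(const double* in, double* out, int n)
{
    assert(blockDim.x <= TILE);  // the tile below holds at most TILE + 2 values

    __shared__ double tile[TILE + 2];
    int g = blockIdx.x * blockDim.x + threadIdx.x;  // global index
    int l = threadIdx.x + 1;                        // index inside tile (slot 0 is the left halo)

    tile[l] = (g < n) ? in[g] : 0.0;
    if (threadIdx.x == 0)                           // left halo
        tile[0] = (g > 0 && g - 1 < n) ? in[g - 1] : 0.0;
    if (threadIdx.x == blockDim.x - 1)              // right halo
        tile[l + 1] = (g + 1 < n) ? in[g + 1] : 0.0;
    __syncthreads();  // every tile entry is written before any thread reads it

    if (g < n)
        out[g] = tile[l - 1] + tile[l] + tile[l + 1];
}

// Hypothetical host helper: copies h_in to the device, runs the kernel
// with TILE threads per block, and copies the result back into h_out.
void run_stencil_shared(const double* h_in, double* h_out, int n)
{
    double *d_in = nullptr, *d_out = nullptr;
    size_t bytes = n * sizeof(double);
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);
    int blocks = (n + TILE - 1) / TILE;
    stencil_shared<<<blocks, TILE>>>(d_in, d_out, n);
    cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
}
```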
But I can’t find PTX code like the following when I want to make use of the L1 cache:
ld.global.f64 .... st.l1.f64 ...
Or perhaps my understanding of the L1 cache is incorrect. I assumed that the compiler predicts when memory accesses will happen during kernel execution and then generates PTX code to “prefetch” the related data into the L1 cache. Is that right?
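For comparison, a sketch of the same stencil with no shared memory, assuming the same hypothetical setup (stencil_global and run_stencil_global are illustrative names). On Fermi (sm_20) the L1 is a hardware-managed cache filled on demand when a global load misses, so no extra PTX instruction appears: a plain ld.global.f64 uses the default .ca (“cache at all levels”) operator, and neighbouring threads that load overlapping elements can hit in L1. One way to get an L1-off baseline for the same kernel is to compile with ‘-Xptxas -dlcm=cg’, which makes global loads bypass L1 and cache only in L2.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical "L1 version" of the 3-point stencil:
// out[i] = in[i-1] + in[i] + in[i+1], out-of-range neighbours treated as 0.
// There is no explicit caching code: threads simply load overlapping
// elements from global memory, and on sm_20 those loads are cached in
// the hardware-managed L1 by default.
__global__ void stencil_global(const double* in, double* out, int n)
{
    int g = blockIdx.x * blockDim.x + threadIdx.x;
    if (g >= n)
        return;
    double left  = (g > 0)     ? in[g - 1] : 0.0;
    double right = (g + 1 < n) ? in[g + 1] : 0.0;
    out[g] = left + in[g] + right;
}

// Hypothetical host helper mirroring the shared-memory version.
void run_stencil_global(const double* h_in, double* h_out, int n)
{
    assert(n > 0);  // this sketch assumes a non-empty input

    // On Fermi the 64 KB of on-chip memory per SM can be split as
    // 48 KB L1 / 16 KB shared; this asks for the larger L1 (a preference).
    cudaFuncSetCacheConfig(stencil_global, cudaFuncCachePreferL1);

    double *d_in = nullptr, *d_out = nullptr;
    size_t bytes = n * sizeof(double);
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);
    const int threads = 256;
    stencil_global<<<(n + threads - 1) / threads, threads>>>(d_in, d_out, n);
    cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
}
```

Timing this kernel built with the default settings against the same kernel built with ‘-dlcm=cg’ isolates the L1 effect, while the shared-memory kernel gives the explicitly managed alternative.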
P.S. My GPU is an M2050.