How can I check and see if my GPU is using L1 cache

MichealHou · May 21, 2011, 3:04am

Hi, everyone,

I have a project to compare the performance of shared memory and the L1 cache. However, I don’t know how to check whether my GPU is using the L1 cache while computing. I found nothing in the PTX code regarding ‘l1’ or ‘prefetch’ when I compiled the program using ‘nvcc ***.cu -arch=compute_20 -code=sm_20 -ptx’.

I can get the following PTX code, which shows that shared memory is in use.

ld.global.f64 	%fd6, [%rd28+0];

...

st.shared.f64 	[%rd30+728], %fd6;

But I can’t get PTX code like the following when I want to make use of the L1 cache.

ld.global.f64 ....

st.l1.f64 ...

or

prefetch.global.l1 ....

Or is my understanding of the L1 cache incorrect? I think the compiler predicts when memory accesses will happen during kernel execution and then generates PTX code to “prefetch” the related data into the L1 cache, doesn’t it?

PS: my GPU is an M2050.

Thanks.

tera · May 21, 2011, 8:59am

L1 cache use is completely transparent (as on a CPU); no code change is needed to use it. And while there is a prefetch instruction in PTX, the compiler does not emit it.
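To illustrate that point, here is a minimal sketch, assuming the CUDA runtime API and a Fermi-class GPU such as the M2050 (current toolkits no longer target sm_20, and later architectures treat L1 caching of global loads differently). The kernel uses only ordinary global loads; on sm_20 these go through L1 by default (the `-Xptxas -dlcm=ca` setting), so nothing L1-specific appears in the source or the PTX. The only cache-related call is `cudaFuncSetCacheConfig`, which asks for the 48 KB L1 / 16 KB shared split; it is a preference, not a guarantee. The names `gather_kernel` and `run_gather` are hypothetical.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s failed: %s\n", #call,            \
                         cudaGetErrorString(err_));                   \
            std::abort();                                             \
        }                                                             \
    } while (0)

// Hypothetical kernel: plain global loads of neighbouring elements.
// Adjacent threads read overlapping data, so L1 can serve the reuse
// without any explicit code.
__global__ void gather_kernel(const double* in, double* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    double sum = in[i];
    if (i > 0)     sum += in[i - 1];
    if (i < n - 1) sum += in[i + 1];
    out[i] = sum;
}

// Host helper: runs gather_kernel with an L1-preferring configuration.
std::vector<double> run_gather(const std::vector<double>& h_in) {
    assert(!h_in.empty());
    int n = static_cast<int>(h_in.size());
    size_t bytes = h_in.size() * sizeof(double);
    double *d_in = nullptr, *d_out = nullptr;
    CUDA_CHECK(cudaMalloc(&d_in, bytes));
    CUDA_CHECK(cudaMalloc(&d_out, bytes));
    CUDA_CHECK(cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice));

    // Request the larger L1 split for this kernel (only a hint to the driver).
    CUDA_CHECK(cudaFuncSetCacheConfig(gather_kernel, cudaFuncCachePreferL1));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    gather_kernel<<<blocks, threads>>>(d_in, d_out, n);
    CUDA_CHECK(cudaGetLastError());

    std::vector<double> h_out(n);
    CUDA_CHECK(cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_in));
    CUDA_CHECK(cudaFree(d_out));
    return h_out;
}
```

Compiling this with `nvcc -ptx` should show only plain `ld.global.f64` loads. In PTX a plain `ld` already means "cache at all levels", and whether L1 is actually used is decided later by ptxas options and the hardware.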

keokeo22 · May 22, 2011, 9:08pm

Excuse me, so basically, can I or can I not use something like

ld.global.f64 ....

st.l1.f64 ...

to explicitly access the L1 cache?

hyqneuron · May 23, 2011, 7:06am

ld.global always goes to the L1 cache (I think this would be true even when L1 is disabled). When you have a hit, that’s it. When you have a miss, the L1 then goes to L2 and so on.
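PTX does, however, let you choose the caching behaviour of an individual load through cache operators: `.ca` (cache at all levels, the default for `ld`) and `.cg` (cache in L2 only, bypassing L1). There is also a `prefetch.global.L1` instruction, which the compiler does not emit on its own. A minimal sketch, assuming inline PTX on sm_20 or later and memory from `cudaMalloc` (whose generic addresses are used directly as global addresses here); the helpers `load_ca`, `load_cg`, `prefetch_l1`, `add_kernel` and `run_add` are hypothetical. None of this is a store into L1; it only changes how loads use it.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Load with the .ca operator: cache in L1 and L2 (the default policy).
__device__ double load_ca(const double* p) {
    double v;
    asm volatile("ld.global.ca.f64 %0, [%1];" : "=d"(v) : "l"(p));
    return v;
}

// Load with the .cg operator: cache in L2 only, bypassing L1.
// This is the per-load counterpart of compiling with -Xptxas -dlcm=cg.
__device__ double load_cg(const double* p) {
    double v;
    asm volatile("ld.global.cg.f64 %0, [%1];" : "=d"(v) : "l"(p));
    return v;
}

// Hint that the line containing p should be brought into L1.
__device__ void prefetch_l1(const double* p) {
    asm volatile("prefetch.global.L1 [%0];" :: "l"(p));
}

// Grid-stride loop: out[i] = a[i] + b[i], reading a via L1 and b via L2 only.
__global__ void add_kernel(const double* a, const double* b, double* out, int n) {
    int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        // This thread will read a[i + stride] next; the prefetch is only a hint.
        if (i + stride < n) prefetch_l1(a + i + stride);
        out[i] = load_ca(a + i) + load_cg(b + i);
    }
}

// Host helper (error checking reduced to one assert for brevity).
std::vector<double> run_add(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size() && !a.empty());
    int n = static_cast<int>(a.size());
    size_t bytes = a.size() * sizeof(double);
    double *d_a, *d_b, *d_out;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b.data(), bytes, cudaMemcpyHostToDevice);
    add_kernel<<<4, 128>>>(d_a, d_b, d_out, n);
    assert(cudaGetLastError() == cudaSuccess);
    std::vector<double> out(n);
    cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost);  // waits for the kernel
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_out);
    return out;
}
```

Because inline PTX is passed through to the `.ptx` output, `nvcc -ptx` on this file should show `ld.global.ca.f64`, `ld.global.cg.f64` and `prefetch.global.L1` explicitly.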

tera · May 23, 2011, 8:24am

There is no st.l1 instruction. If you are looking for an explicitly user-managed cache, use shared memory.
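For the shared-memory side of the comparison in the original question, a minimal sketch of such a user-managed cache, assuming a 1D 3-point sum as the workload: each block copies its tile plus one halo element on each side into shared memory once, synchronizes, and then every thread reads its neighbours from shared memory instead of global memory. `stencil_shared`, `run_stencil_shared` and the block size `BLOCK` are hypothetical choices for illustration.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

constexpr int BLOCK = 256;  // threads per block; the kernel assumes exactly this

// out[i] = in[i-1] + in[i] + in[i+1], with missing neighbours at the ends
// treated as 0. The tile in shared memory acts as an explicit cache.
__global__ void stencil_shared(const double* in, double* out, int n) {
    __shared__ double tile[BLOCK + 2];
    int i = blockIdx.x * BLOCK + threadIdx.x;  // global index
    int t = threadIdx.x + 1;                   // index inside the tile

    // Explicit "cache fill": one global load per element.
    tile[t] = (i < n) ? in[i] : 0.0;
    if (threadIdx.x == 0)                      // left halo (i < n for thread 0)
        tile[0] = (i > 0) ? in[i - 1] : 0.0;
    if (threadIdx.x == BLOCK - 1)              // right halo
        tile[BLOCK + 1] = (i + 1 < n) ? in[i + 1] : 0.0;
    __syncthreads();  // the whole tile must be loaded before anyone reads it

    if (i < n)
        out[i] = tile[t - 1] + tile[t] + tile[t + 1];
}

// Host helper (error checking reduced to one assert for brevity).
std::vector<double> run_stencil_shared(const std::vector<double>& h_in) {
    assert(!h_in.empty());
    int n = static_cast<int>(h_in.size());
    size_t bytes = h_in.size() * sizeof(double);
    double *d_in, *d_out;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);
    // Prefer the 48 KB shared / 16 KB L1 split for this kernel (a hint).
    cudaFuncSetCacheConfig(stencil_shared, cudaFuncCachePreferShared);
    stencil_shared<<<(n + BLOCK - 1) / BLOCK, BLOCK>>>(d_in, d_out, n);
    assert(cudaGetLastError() == cudaSuccess);
    std::vector<double> h_out(n);
    cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

Timing this against a version that reads `in[i - 1]`, `in[i]` and `in[i + 1]` directly from global memory (and relies on L1 for the reuse) is one straightforward way to set up the shared-memory versus L1 comparison.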

edisonying1984 · June 8, 2011, 3:05pm

Hi MichealHou,

I’m also struggling with disabling the L1 cache. I have a program consisting of two CUDA files (.cu) and a .cpp file, and I want to disable L1 by adding the flag “-Xptxas -dlcm=cg” in the makefile. This approach worked fine when I tested it on some applications from the CUDA SDK, but it has no effect on my code. Even when I set the flag, the performance of my code does not change, and the profiler reports almost identical values for counters such as l1_gld_hit and l1_gld_miss. As I understand it, both counters should be zero once L1 is disabled. I really have no idea what is going on here.

Do you have any clue about this? Or have you had a similar experience in your previous work?

Thanks

edisonying1984 · June 8, 2011, 3:12pm

Hi hyqneuron,

I’m also somewhat confused about the L1 cache. You say that global loads always go through the L1 cache. I believe this is the case when L1 is enabled. However, if we disable L1 with the flag “-Xptxas -dlcm=cg”, do global loads still go through L1? If the answer is yes, as you suggest, then what changes once L1 is disabled?

Thanks

hyqneuron · June 9, 2011, 1:10am

The cache line size becomes 32 bytes instead of 128 bytes. When your accesses have little locality, this saves a lot of global memory bandwidth.
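To put rough numbers on that, here is a minimal host-side sketch (plain C++ that compiles as part of a .cu file) counting the bytes a single warp-wide load pulls from L2/DRAM at 128-byte granularity (L1-cached loads, `-dlcm=ca`) versus 32-byte granularity (L2-only loads, `-dlcm=cg`). It is a simplified model: it assumes each distinct aligned segment touched by the warp is fetched exactly once and ignores any reuse across warps. `bytes_fetched` and `warp_addresses` are hypothetical helpers, not CUDA APIs.

```cuda
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Simplified model: bytes moved for one warp-wide load, given the byte
// address each thread reads, the size of each access, and the granularity.
std::size_t bytes_fetched(const std::vector<std::uint64_t>& addrs,
                          std::size_t access_size, std::size_t granularity) {
    assert(granularity > 0 && access_size > 0);
    std::set<std::uint64_t> segments;  // distinct aligned segments touched
    for (std::uint64_t a : addrs) {
        std::uint64_t first = a / granularity;
        std::uint64_t last = (a + access_size - 1) / granularity;
        for (std::uint64_t s = first; s <= last; ++s) segments.insert(s);
    }
    return segments.size() * granularity;
}

// Byte addresses for 32 threads reading 8-byte doubles at a given stride
// (in elements), starting at address 0.
std::vector<std::uint64_t> warp_addresses(std::uint64_t stride_elems) {
    std::vector<std::uint64_t> addrs;
    for (std::uint64_t t = 0; t < 32; ++t) addrs.push_back(t * stride_elems * 8);
    return addrs;
}
```

In this model a fully coalesced warp (stride 1) moves 256 bytes at either granularity. With a 16-element (128-byte) stride the warp still needs only 256 useful bytes, but the model fetches 4096 bytes at 128-byte granularity versus 1024 bytes at 32-byte granularity, which is the bandwidth saving described above.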
