disable L1 cache on Fermi GPU running OpenCL

edisonying1984 · March 7, 2011, 4:47pm

Hi everyone,

According to the CUDA programming guide, the L1 cache on a Fermi GPU (I’m using a GTX 580) can be disabled by passing the flag -Xptxas -dlcm=cg to nvcc. This works without problems when I compile CUDA code, but can I do the same thing with OpenCL? If so, how? I’m not sure whether the flag is only available for CUDA compilation at present.

Since I am a newbie to both OpenCL and Fermi, this question might be silly to you. Anyway, any suggestion or clarification is welcome!

Thanks,

David_Black · March 7, 2011, 9:21pm

This would be really useful for checking that performance doesn’t drop through the floor on non-Fermi devices, which have no cache :-)

David

weliad · March 8, 2011, 8:26pm

If there is such a compiler extension, it is not documented in the “OpenCL Compiler Extensions” text file, much like the -cl-nv-arch compiler flag.
Let me know if you find it :).

David, there are more reasons to disable the L1 cache on Fermi. The L1 cache supports reads with 128-bit alignment only. If your kernels’ accesses are 32- or 64-bit aligned, you might suffer from coalescing issues.

philipjfry · March 9, 2011, 10:03am

I’m afraid this is a misinterpretation. The L1 always performs naturally aligned 128-byte (not bit!) read accesses to L2, but it supports quite arbitrary accesses from the compute cores (the same bank conflict rules as for shared memory apply, because it’s the same piece of hardware). In contrast to L2, which has a cache line size of 32 B, L1 has a cache line size of 128 B, and every read miss fetches a whole 128 B line, from global memory if necessary. Bypassing L1 is therefore beneficial for memory accesses that are scattered or have a long stride, because each access then fetches only 32 B from device memory (allocated in L2) instead of 128 B (allocated in L1 and most probably in L2).

See also the CUDA C Programming Guide 3.2, section G.4.2.
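
To make the 32 B versus 128 B argument concrete, here is a rough back-of-the-envelope model in C. It is only a sketch under stated assumptions: a 32-thread warp reads aligned 4-byte elements at a fixed stride, every distinct aligned line touched is fetched exactly once, and reuse by later warps is ignored. The function name warp_bytes_fetched and its parameters are made up for illustration and are not part of any NVIDIA API.

```c
#include <assert.h>
#include <stddef.h>

#define WARP_SIZE  32  /* threads per warp on Fermi */
#define ELEM_BYTES 4   /* assumed element size (one 32-bit word per thread) */

/*
 * Simplified traffic model (an assumption, not a hardware simulator):
 * thread t of a warp reads ELEM_BYTES at byte offset
 * t * stride_elems * ELEM_BYTES. Every distinct aligned line of
 * line_bytes that the warp touches is counted once.
 * Returns the total number of bytes fetched for the whole warp.
 */
size_t warp_bytes_fetched(size_t stride_elems, size_t line_bytes)
{
    assert(line_bytes >= ELEM_BYTES && line_bytes % ELEM_BYTES == 0);

    size_t lines = 0;
    size_t prev_line = 0;

    for (size_t t = 0; t < WARP_SIZE; ++t) {
        size_t addr = t * stride_elems * ELEM_BYTES;
        size_t line = addr / line_bytes;

        /* Addresses never decrease with t, so comparing against the
           previously seen line is enough to count distinct lines. */
        if (t == 0 || line != prev_line) {
            ++lines;
            prev_line = line;
        }
    }
    return lines * line_bytes;
}
```

Under these assumptions a unit stride costs 128 bytes with either line size, while a stride of 32 elements costs 4096 bytes with 128 B lines (L1 on) against 1024 bytes with 32 B segments (L1 bypassed), in both cases for only 128 useful bytes.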

weliad · March 9, 2011, 4:45pm

Got it, thanks for the clarification.

thanh_tuan · April 15, 2011, 7:47am

Could you clarify a little bit more for me?

I understand that when memory accesses are scattered and L1 is on, a 128 B block of memory is fetched into L1. But where that data is fetched from depends on whether it is already cached in L2, right? So in both cases, what are the penalties for having L1 on? Is it just the overhead of fetching data that is unlikely to be reused later? Anything else?

By the way, sorry if this question is off topic: do you know what cache algorithm is used for L1 and L2 on Fermi? Are they K-way associative caches? Is it possible to find out K?

Thanks.

laughingrice · May 14, 2011, 2:34pm

The L1 cache line size is 128 bytes, so an L1 cache miss that can’t get the full 128 bytes from L2 will fetch 128 bytes from global memory. An L2 cache miss has the same access patterns as GT200 coalescing (32, 64 or 128 bytes). So if you get a lot of L1 misses that also miss at the L2 level, you could be better off turning off L1.

I don’t know of a flag that does that, but rumor has it that marking the variable as volatile will cause it to skip L1 and stay at L2 level (at least under CUDA).

Mangpo · August 3, 2011, 7:44pm

I couldn’t find it. Anyone? I would like to try this too.

Sarnath · September 3, 2011, 7:51am

1

laughingrice · September 4, 2011, 5:19am

There is no such compiler extension at the moment. The only thing you can do is mark the pointer as volatile. I did some tests and it does seem to work.
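
For anyone who wants to try it, the volatile workaround might look like the sketch below, written as a C string in the way kernel source is usually handed to clCreateProgramWithSource. The kernel name gather and its arguments are hypothetical. Whether NVIDIA’s OpenCL compiler really turns the volatile load into one that skips L1 is only what was observed in this thread; it is not documented behaviour.

```c
#include <string.h>

/*
 * OpenCL C source for a hypothetical gather kernel, kept as a string so a
 * host program can pass it to clCreateProgramWithSource().
 *
 * The input pointer is qualified volatile. Assumption (from the thread,
 * not from documentation): on Fermi this makes loads through `in` bypass
 * L1 and be served from L2 instead.
 */
static const char *const gather_kernel_src =
    "__kernel void gather(__global const volatile float *in,\n"
    "                     __global const int *idx,\n"
    "                     __global float *out)\n"
    "{\n"
    "    size_t i = get_global_id(0);\n"
    "    /* scattered read: the access pattern where skipping L1 may help */\n"
    "    out[i] = in[idx[i]];\n"
    "}\n";

/* Length of the source, as needed for the `lengths` argument of
   clCreateProgramWithSource(). */
size_t gather_kernel_src_len(void)
{
    return strlen(gather_kernel_src);
}
```

Note that volatile also stops the compiler from keeping loaded values in registers across repeated reads, so it is worth timing the kernel both with and without the qualifier.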
