We are using Fermi architecture cards with 700 KB of L2 cache.
We would like to declare certain areas of the on-card memory as non-cacheable.
It would be sufficient to have just a single contiguous non-cacheable region,
but being able to define multiple regions would be even better.
You can declare variables as volatile to avoid caching.
Using PTX, you have finer control: you can add cache operators to individual load and store instructions.
The “volatile” keyword in C is a modifier that informs the compiler that an object may be modified asynchronously. In practical terms, this mostly serves to restrict certain optimizations that the compiler may otherwise apply. It does not control cacheability (at any level of the cache hierarchy) of that object.
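To illustrate the distinction, here is a minimal sketch (kernel and variable names are hypothetical). The `volatile` qualifier only stops the compiler from keeping the value in a register across iterations; the load it forces can still hit in L1 or L2, so it is not a way to make memory non-cacheable:

```cuda
// Sketch: "volatile" constrains compiler optimization, not cacheability.
__device__ volatile int flag = 0;  // may be written by another block/host

__global__ void wait_then_copy(int *out, const int *data)
{
    // Without "volatile" on "flag", the compiler could legally hoist the
    // load out of the loop and spin forever on a stale register value.
    // With "volatile", a load is re-issued each iteration -- but that load
    // may still be serviced from cache.
    while (flag == 0)
        ;
    *out = *data;
}
```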
Disable the L1 cache for the entire kernel with an option to nvcc (see Appendix G of the CUDA C Programming Guide).
Write PTX and use the cache operators in the load and store instructions to control whether L1 is bypassed or not. You cannot skip the L2 cache completely, but you can mark a read or write as “streaming” in PTX, which indicates that the request should be evicted first, as it is unlikely to be reused. There is also a cache operator that marks a read request as volatile, forcing the cache line to be flushed and reloaded before servicing the current request.
(See the ptx_isa_2.2.pdf that comes with the CUDA 3.2 Toolkit for more details.)
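As a sketch of the second option, the PTX cache operators (`.cg`, `.cs`, `.cv`) can be attached to individual loads via inline PTX from CUDA C; the wrapper function names below are made up for illustration:

```cuda
// Sketch (hypothetical helper names): per-instruction cache operators
// via inline PTX, targeting Fermi (sm_20 or later).

// ld.global.cg: cache at global level (L2) only, bypassing L1.
__device__ int load_bypass_l1(const int *p)
{
    int v;
    asm volatile("ld.global.cg.s32 %0, [%1];" : "=r"(v) : "l"(p));
    return v;
}

// ld.global.cs: streaming hint -- the line is marked evict-first,
// since the data is unlikely to be reused.
__device__ int load_streaming(const int *p)
{
    int v;
    asm volatile("ld.global.cs.s32 %0, [%1];" : "=r"(v) : "l"(p));
    return v;
}

// ld.global.cv: volatile read -- don't cache; refetch from memory
// before servicing this request.
__device__ int load_volatile(const int *p)
{
    int v;
    asm volatile("ld.global.cv.s32 %0, [%1];" : "=r"(v) : "l"(p));
    return v;
}
```

Note there is no operator that bypasses L2 entirely, consistent with the answer above; `.cg` and `.cs` only control behavior at the L1 level and eviction priority.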