Declare an area of on-card memory as non-cacheable?

We are using Fermi architecture cards with 700 KB of L2 cache.
We would like to declare certain areas of the on-card memory as non-cacheable.
It would be sufficient to have just one contiguous non-cacheable region,
but being able to define multiple regions would be even better.

Does anyone know if this is possible?

Thanks

Charles

You can declare variables as volatile to avoid caching.
Using PTX, you have finer control: you can add cache operators to individual load and store instructions.
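
For example (just a sketch, assuming inline PTX in a 64-bit build; check the PTX ISA manual for the exact operator you need), the .cg operator makes a particular access bypass L1 while still being cached in L2:

    __global__ void add_one_l2_only(float *data, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;

        float v;
        // ld.global.cg / st.global.cg: cache in L2 only, bypass L1
        asm volatile("ld.global.cg.f32 %0, [%1];" : "=f"(v) : "l"(data + i));
        v += 1.0f;
        asm volatile("st.global.cg.f32 [%0], %1;" : : "l"(data + i), "f"(v) : "memory");
    }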

The “volatile” keyword in C is a modifier that informs the compiler that an object may be modified asynchronously. In practical terms, this mostly serves to restrict certain optimizations that the compiler may otherwise apply. It does not control cacheability (at any level of the cache hierarchy) of that object.
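
To make that concrete (an illustrative sketch only; the flag variable is made up): volatile keeps the compiler from holding the value in a register, but the load itself still goes through whatever hardware caches apply.

    __device__ volatile int flag;   // hypothetical flag written by another block or the host

    __global__ void spin_until_set(void)
    {
        // "volatile" forces the compiler to re-issue the load on every iteration
        // instead of hoisting it out of the loop; it does not make the access
        // skip L1 or L2.
        while (flag == 0)
            ;
    }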

So is it impossible? What is PTX?

Thanks for your help.

PTX is the assembler-like intermediate representation that nvcc compiles the code into.
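
You can look at it yourself: nvcc emits the PTX for a .cu file if you pass -ptx (for instance "nvcc -ptx kernel.cu -o kernel.ptx", file name made up here), or you can pass --keep to keep all intermediate files, including the .ptx.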

If I were you, I would open up the card and take away all 700K of cash… ;-)

On a serious note, I don't think you can control L2 cacheability… but as someone pointed out, maybe PTX gives you that control.

Right, the current options in CUDA are:

  • Disable the L1 cache for the entire kernel with an option to nvcc (see the CUDA programming guide appendix G)

  • Write PTX and use the cache operators in the load and store instructions to control whether L1 is bypassed or not. You cannot skip the L2 cache completely, but you can mark a read or write as “streaming” in PTX, which indicates that the request should be evicted first, as it is unlikely to be reused. There is also a cache operator that marks a read request as volatile, forcing the cache line to be flushed and reloaded before servicing the current request.

(See the ptx_isa_2.2.pdf that comes with the CUDA 3.2 Toolkit for more details; a rough sketch of both approaches follows.)
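
Roughly, and treat this as a sketch rather than gospel (operator names taken from ptx_isa_2.2, 64-bit build assumed): the whole-kernel switch is a ptxas option, -Xptxas -dlcm=cg if I remember the spelling right, while the per-access control looks something like this in inline PTX:

    // Streaming access: .cs marks the line "evict first" in L2. The data still
    // passes through L2; it is just first in line to be thrown out.
    __global__ void stream_copy(const float *src, float *dst, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;

        float v;
        asm volatile("ld.global.cs.f32 %0, [%1];" : "=f"(v) : "l"(src + i));
        asm volatile("st.global.cs.f32 [%0], %1;" : : "l"(dst + i), "f"(v) : "memory");
        // ld.global.cv would instead force the line to be refetched rather than
        // served from a cached copy ("cache as volatile").
    }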
