We are using Fermi architecture cards with 700 KB of L2 cache.
We would like to declare certain areas of the on-card memory as non-cacheable.
It would be sufficient to have just a single contiguous non-cacheable region,
but being able to define multiple regions would be even better.
You can declare variables as volatile to avoid caching.
Using PTX, you have finer control: you can add cache operators to individual load and store instructions.
The “volatile” keyword in C is a modifier that informs the compiler that an object may be modified asynchronously. In practical terms, this mostly serves to restrict certain optimizations that the compiler may otherwise apply. It does not control cacheability (at any level of the cache hierarchy) of that object.
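To illustrate the distinction, here is a minimal sketch (kernel and variable names are hypothetical). The `volatile` qualifier only stops the compiler from keeping the value in a register across iterations; the load it forces can still hit in L1 or L2, so it is not a way to make memory non-cacheable:

```cuda
// Sketch: "volatile" constrains compiler optimization, not cacheability.
__device__ volatile int flag = 0;  // may be written by another block/host

__global__ void wait_then_copy(int *out, const int *data)
{
    // Without "volatile" on "flag", the compiler could legally hoist the
    // load out of the loop and spin forever on a stale register value.
    // With "volatile", a load is re-issued each iteration -- but that load
    // may still be serviced from cache.
    while (flag == 0)
        ;
    *out = *data;
}
```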
Disable the L1 cache for the entire kernel with an option to nvcc (see Appendix G of the CUDA C Programming Guide).
Write PTX and use the cache operators in the load and store instructions to control whether L1 is bypassed or not. You cannot skip the L2 cache completely, but you can mark a read or write as “streaming” in PTX, which indicates that the request should be evicted first, as it is unlikely to be reused. There is also a cache operator that marks a read request as volatile, forcing the cache line to be flushed and reloaded before servicing the current request.
(See the ptx_isa_2.2.pdf that comes with the CUDA 3.2 Toolkit for more details.)
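As a sketch of the second option, the PTX cache operators (`.cg`, `.cs`, `.cv`) can be attached to individual loads via inline PTX from CUDA C; the wrapper function names below are made up for illustration:

```cuda
// Sketch (hypothetical helper names): per-instruction cache operators
// via inline PTX, targeting Fermi (sm_20 or later).

// ld.global.cg: cache at global level (L2) only, bypassing L1.
__device__ int load_bypass_l1(const int *p)
{
    int v;
    asm volatile("ld.global.cg.s32 %0, [%1];" : "=r"(v) : "l"(p));
    return v;
}

// ld.global.cs: streaming hint -- the line is marked evict-first,
// since the data is unlikely to be reused.
__device__ int load_streaming(const int *p)
{
    int v;
    asm volatile("ld.global.cs.s32 %0, [%1];" : "=r"(v) : "l"(p));
    return v;
}

// ld.global.cv: volatile read -- don't cache; refetch from memory
// before servicing this request.
__device__ int load_volatile(const int *p)
{
    int v;
    asm volatile("ld.global.cv.s32 %0, [%1];" : "=r"(v) : "l"(p));
    return v;
}
```

Note there is no operator that bypasses L2 entirely, consistent with the answer above; `.cg` and `.cs` only control behavior at the L1 level and eviction priority.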