How to use the createpolicy PTX instruction well in CUDA? Are there any practical examples for reference?

jimmy.hj · March 27, 2023, 8:56am

Hello everyone,

I noticed that libcudacxx implements cuda::access_property as an extension, and that the corresponding PTX instructions have been defined as well.
Users can create L2 eviction policies and apply them to memory access instructions such as ld, st, and cp.async to get finer-grained control over cache behavior.
The policies come in three modes: range-based, fraction-based, and a mode compatible with the L2 persistence cache. The granularity of access control for memory blocks can range from 512KB to 4GB.
However, I have not yet seen any use cases for this flexibility in real programs. In which applications can this kind of usage have a large performance impact? I assume specific workloads were considered when the GPU architecture and the PTX ISA were designed.
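
For anyone who wants something concrete to experiment with, here is a minimal sketch of the two routes described above: the libcudacxx wrappers and raw inline PTX. The kernel names, the helper load_evict_last, and run_demo are made up for illustration, and the workload (a small table w reused by every thread, a large array x read only once) is an assumption, not a measured case. It assumes a GPU with compute capability 8.0 or newer and a build along the lines of nvcc -arch=sm_80.

```cuda
#include <cstdint>
#include <cuda/annotated_ptr>
#include <cuda_runtime.h>

// Route 1 (libcudacxx): tag pointers with a static access property.
// w is small and reused, so hint "persisting"; x is read once, so hint "streaming".
__global__ void scale_annotated(const float* w, int n_w, const float* x, float* y, int n) {
    cuda::annotated_ptr<const float, cuda::access_property::persisting> w_p{w};
    cuda::annotated_ptr<const float, cuda::access_property::streaming> x_s{x};
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += gridDim.x * blockDim.x)
        y[i] = w_p[i % n_w] * x_s[i];
}

// Route 2 (inline PTX): build a fractional evict_last policy covering all
// accesses (fraction 1.0) and pass it to a load as an L2 cache hint.
__device__ __forceinline__ float load_evict_last(const float* p) {
    uint64_t policy;
    asm volatile("createpolicy.fractional.L2::evict_last.b64 %0, %1;"
                 : "=l"(policy) : "f"(1.0f));
    float v;
    asm volatile("ld.global.L2::cache_hint.f32 %0, [%1], %2;"
                 : "=f"(v)
                 : "l"(__cvta_generic_to_global(p)), "l"(policy)
                 : "memory");
    return v;
}

__global__ void scale_ptx(const float* w, int n_w, const float* x, float* y, int n) {
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += gridDim.x * blockDim.x)
        y[i] = load_evict_last(&w[i % n_w]) * x[i];
}

// Hypothetical driver: runs both kernels and checks y[i] == 2 * x[i].
bool run_demo() {
    const int n = 1 << 20, n_w = 1024;
    float *w = nullptr, *x = nullptr, *y = nullptr;
    if (cudaMallocManaged(&w, n_w * sizeof(float)) != cudaSuccess ||
        cudaMallocManaged(&x, n * sizeof(float)) != cudaSuccess ||
        cudaMallocManaged(&y, n * sizeof(float)) != cudaSuccess) return false;
    for (int i = 0; i < n_w; ++i) w[i] = 2.0f;
    for (int i = 0; i < n; ++i) x[i] = float(i);

    bool ok = true;
    for (int variant = 0; variant < 2 && ok; ++variant) {
        for (int i = 0; i < n; ++i) y[i] = 0.0f;
        if (variant == 0) scale_annotated<<<256, 256>>>(w, n_w, x, y, n);
        else              scale_ptx<<<256, 256>>>(w, n_w, x, y, n);
        ok = (cudaDeviceSynchronize() == cudaSuccess);
        for (int i = 0; ok && i < n; ++i) ok = (y[i] == 2.0f * x[i]);
    }
    cudaFree(w); cudaFree(x); cudaFree(y);
    return ok;
}
```

Calling run_demo() from main only checks that both variants produce correct results. The policies are hints, so whether they change performance depends on how the working set compares with the L2 size, and that has to be measured with a profiler on the target GPU.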

Thank you!

Robert_Crovella · March 27, 2023, 1:36pm

The A100 whitepaper indicates possible use cases (on pp. 40-41):

For example, for DL inferencing workloads, ping-pong buffers can be persistently cached in the
L2 for faster data access, while also avoiding writebacks to DRAM. For producer-consumer
chains, such as those found in DL training, L2 cache controls can optimize caching across the
write-to-read data dependencies. In LSTM networks, recurrent weights that are shared across
multiple GEMM operations can be preferentially cached and reused in L2.

On page 66 there is a performance comparison for a simple test case (histogramming).
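
For the persistent-caching use cases in that quote, the runtime API exposes L2 persistence through a per-stream access policy window. Below is a minimal sketch under some assumptions: pin_in_l2 and unpin_l2 are hypothetical helper names, the choice of streaming for misses is only one option, and the buffer is assumed to be a device allocation that kernels on the given stream reuse heavily.

```cuda
#include <algorithm>
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical helper: hint that [ptr, ptr + bytes) should persist in L2
// for kernels launched on `stream`. Returns false if unsupported or on error.
bool pin_in_l2(cudaStream_t stream, void* ptr, size_t bytes) {
    if (bytes == 0) return false;
    int device = 0;
    cudaDeviceProp prop{};
    if (cudaGetDevice(&device) != cudaSuccess) return false;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return false;
    if (prop.persistingL2CacheMaxSize <= 0) return false;  // no L2 persistence on this GPU

    // Set aside part of L2 for persisting accesses (capped by the device limit).
    size_t carve = std::min<size_t>(bytes, prop.persistingL2CacheMaxSize);
    if (cudaDeviceSetLimit(cudaLimitPersistingL2CacheSize, carve) != cudaSuccess) return false;

    size_t window = std::min<size_t>(bytes, prop.accessPolicyMaxWindowSize);
    cudaStreamAttrValue attr{};
    attr.accessPolicyWindow.base_ptr  = ptr;
    attr.accessPolicyWindow.num_bytes = window;
    // If the window is larger than the carve-out, only mark a matching fraction
    // of accesses as persisting to reduce thrashing of the set-aside region.
    attr.accessPolicyWindow.hitRatio  = std::min(1.0f, float(carve) / float(window));
    attr.accessPolicyWindow.hitProp   = cudaAccessPropertyPersisting;
    attr.accessPolicyWindow.missProp  = cudaAccessPropertyStreaming;
    return cudaStreamSetAttribute(stream, cudaStreamAttributeAccessPolicyWindow,
                                  &attr) == cudaSuccess;
}

// Hypothetical helper: disable the window and return persisting lines to normal.
bool unpin_l2(cudaStream_t stream) {
    cudaStreamAttrValue attr{};
    attr.accessPolicyWindow.num_bytes = 0;  // a zero-sized window disables the policy
    if (cudaStreamSetAttribute(stream, cudaStreamAttributeAccessPolicyWindow,
                               &attr) != cudaSuccess) return false;
    return cudaCtxResetPersistingL2Cache() == cudaSuccess;
}
```

A typical pattern would be to call pin_in_l2 on the reused buffer (for example a histogram or a set of shared weights) before launching the kernels that touch it, and unpin_l2 afterwards so that later kernels get the full L2 back.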
