How do L2 persistent slices combine with the shared activation memory buffer?

Hi! I am reading a paper published by NVIDIA: AUTOSCRATCH: ML-OPTIMIZED CACHE MANAGEMENT FOR INFERENCE-ORIENTED GPUS

Like here: [figure from the paper]

I am wondering how this is implemented. In particular, does anyone know how the shared activation memory buffer is implemented in TensorRT? Thank you!
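For context on what I have found so far: CUDA (11+, on Ampere-class GPUs) exposes L2 persistence through the access policy window API, so I assume the paper's L2 persistent slices are configured with something like the sketch below. The function name `pin_activations_to_l2` and the buffer/size arguments are just my placeholders, not names from the paper or from TensorRT:

```cpp
#include <algorithm>
#include <cuda_runtime.h>

// Sketch (my assumption, not from the paper): pin a region of a shared
// activation buffer into the persisting portion of L2 cache.
// "activation_buf" and "window_bytes" are hypothetical names.
void pin_activations_to_l2(cudaStream_t stream, void* activation_buf,
                           size_t window_bytes) {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    // Set aside a portion of L2 for persisting accesses.
    size_t set_aside = std::min<size_t>(prop.l2CacheSize,
                                        prop.persistingL2CacheMaxSize);
    cudaDeviceSetLimit(cudaLimitPersistingL2CacheSize, set_aside);

    // Mark accesses to [activation_buf, activation_buf + window_bytes)
    // on this stream as persisting; other accesses stream through L2.
    cudaStreamAttrValue attr = {};
    attr.accessPolicyWindow.base_ptr  = activation_buf;
    attr.accessPolicyWindow.num_bytes =
        std::min<size_t>(window_bytes, prop.accessPolicyMaxWindowSize);
    attr.accessPolicyWindow.hitRatio  = 1.0f;  // treat the whole window as persisting
    attr.accessPolicyWindow.hitProp   = cudaAccessPropertyPersisting;
    attr.accessPolicyWindow.missProp  = cudaAccessPropertyStreaming;
    cudaStreamSetAttribute(stream, cudaStreamAttributeAccessPolicyWindow, &attr);
}
```

My guess is that TensorRT would point such an access policy window at (part of) its shared activation buffer, but I have not found where or how that happens, which is why I am asking.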
