Hi! I am reading a paper published by NVIDIA: "AutoScratch: ML-Optimized Cache Management for Inference-Oriented GPUs".
I am wondering how this is implemented. Does anyone know how the shared activation memory buffer is implemented in TensorRT? Thank you!
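From what I can tell, TensorRT lets you do this via `ICudaEngine.create_execution_context_without_device_memory()` together with assigning `context.device_memory`: each engine reports its activation-memory need (`engine.device_memory_size`), and you can hand one shared buffer to several contexts that run one at a time. I am not sure this is exactly what the paper means, so here is only a toy sketch of that sharing pattern in plain Python (the `FakeEngine`/`FakeContext` classes are stand-ins I made up, not real TensorRT objects):

```python
# Toy sketch (no real TensorRT): several "engines" each report how much
# scratch (activation) memory their execution needs; instead of giving each
# context its own allocation, we allocate one buffer sized for the largest
# consumer and attach the same buffer to every context before it runs.

class FakeEngine:
    """Stand-in for a built engine; only knows its activation-memory need."""
    def __init__(self, name, device_memory_size):
        self.name = name
        self.device_memory_size = device_memory_size  # bytes of scratch needed

class FakeContext:
    """Stand-in for an execution context that borrows external scratch memory."""
    def __init__(self, engine):
        self.engine = engine
        self.device_memory = None  # must be attached before execute()

    def execute(self):
        assert self.device_memory is not None, "no scratch buffer attached"
        assert len(self.device_memory) >= self.engine.device_memory_size
        return f"{self.engine.name}: ran using shared scratch"

engines = [FakeEngine("detector", 64), FakeEngine("classifier", 128)]

# One shared buffer sized for the largest consumer (a bytearray stands in
# for a single cudaMalloc'd device allocation).
shared = bytearray(max(e.device_memory_size for e in engines))

results = []
for e in engines:
    ctx = FakeContext(e)           # context created *without* its own memory
    ctx.device_memory = shared     # attach the shared activation buffer
    results.append(ctx.execute())  # contexts run one at a time, so sharing is safe

print(results)
```

Is this roughly how the shared activation buffer works under the hood, or does TensorRT do something smarter?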

