I saw this feature here: https://developer.nvidia.com/blog/dynamic-memory-compression/
Is there a way to implement this more generally, outside of AI/LLM frameworks?
I have an application that I think might benefit from this, since it's heavily bound by memory bandwidth/transfers, but I can't see a way to use it without the customized LLMs that already have it built in. My application is scientific compute, not anything involving AI/LLMs.