Is there any way to diagnose GPU memory fragmentation & defrag NVIDIA GPU RAM?

Is there any way to diagnose GPU memory fragmentation and defrag NVIDIA GPU RAM?
Thanks!

I am a bit confused. Why would you need to manually defragment Random Access Memory?

A server program using CUDA or OpenGL, or both, especially when stress-tested with a very large loop count, e.g. creating and destroying a window and other resources inside each iteration.
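In simplified form the pattern looks roughly like this (a made-up sketch with illustrative sizes, not the real server code): short-lived large scratch buffers interleaved with small long-lived objects, allocated with plain cudaMalloc/cudaFree every iteration.

```cpp
// Simplified sketch of the allocation churn (illustrative sizes, not real code):
// large scratch buffers are freed right away while small objects stay alive,
// which over many iterations can scatter the free VRAM into small pieces.
#include <cuda_runtime.h>
#include <vector>

int main() {
    std::vector<void*> longLived;                          // stays allocated
    for (int i = 0; i < 10000; ++i) {
        void* scratch = nullptr;
        cudaMalloc(&scratch, (size_t)(i % 64 + 1) << 20);  // 1..64 MiB, varying size
        void* obj = nullptr;
        cudaMalloc(&obj, 64 << 10);                        // small object kept alive
        longLived.push_back(obj);
        cudaFree(scratch);                                 // freed immediately, leaving gaps
    }
    for (void* p : longLived) cudaFree(p);
    return 0;
}
```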
I have also found that many AI programs need to deal with this; there is, for example, a lot of discussion about it on the PyTorch forums.
Thanks!

Right, that is application- and API-level memory management: how Python does garbage collection, what the best practices for memory allocation and reuse are, and so on.
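For example, the usual best practice on the CUDA side is to allocate once up front and reuse the buffer across iterations instead of allocating and freeing inside the hot loop. A minimal sketch, assuming a known worst-case buffer size:

```cpp
// Sketch of the "allocate once, reuse" pattern: size the buffer for the
// worst case up front and reuse it every iteration instead of calling
// cudaMalloc/cudaFree inside the loop.
#include <cuda_runtime.h>

int main() {
    const size_t maxBytes = (size_t)64 << 20;  // assumed worst-case size
    void* scratch = nullptr;
    cudaMalloc(&scratch, maxBytes);            // one allocation for the whole run

    for (int i = 0; i < 10000; ++i) {
        // ... launch kernels that use up to maxBytes of `scratch` ...
    }

    cudaFree(scratch);
    return 0;
}
```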

There is no tool you could call to defragment VRAM if the application's memory management is inefficient. The GPU hardware will automatically try to adjust caching and cache access to physical memory to help with fragmented access patterns, but in the end it falls to the application to optimize.

A good tool to help with that is Nsight Compute, which includes extensive memory statistics and guided optimization for CUDA applications (Nsight Graphics covers the OpenGL side).
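For a quick programmatic sanity check before reaching for a profiler, one rough approach (a sketch, not an official diagnostic) is to compare the free byte count reported by cudaMemGetInfo against the largest single cudaMalloc that still succeeds; a large gap between the two suggests the free space is broken into smaller pieces.

```cpp
// Rough fragmentation check: binary-search the largest single allocation that
// still succeeds and compare it with the free memory reported by cudaMemGetInfo.
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    size_t freeBytes = 0, totalBytes = 0;
    cudaMemGetInfo(&freeBytes, &totalBytes);

    size_t lo = 0, hi = freeBytes;
    while (lo + (1 << 20) < hi) {            // stop at 1 MiB resolution
        size_t mid = lo + (hi - lo) / 2;
        void* p = nullptr;
        if (cudaMalloc(&p, mid) == cudaSuccess) {
            cudaFree(p);
            lo = mid;                        // mid bytes still fit in one block
        } else {
            cudaGetLastError();              // clear the allocation error
            hi = mid;
        }
    }
    printf("reported free: %zu MiB, largest single allocation: %zu MiB\n",
           freeBytes >> 20, lo >> 20);
    return 0;
}
```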


Crash and got cudaErrorMemoryAllocation BUT there is enough contiguous GPU memory…any other clue? - CUDA / CUDA Programming and Performance - NVIDIA Developer Forums

Thanks! And is there any article on avoiding VRAM fragmentation in either CUDA or OpenGL? For example, OpenGL wraps memory management internally, so the application cannot manage it the way it can with Vulkan.
As for CUDA, is there a good library or tool for this, e.g. a memory pool?
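One option along those lines (a sketch based on the public CUDA runtime API, not specific guidance from this thread) is the stream-ordered allocator available since CUDA 11.2: cudaMallocAsync/cudaFreeAsync serve allocations from a driver-managed pool, so freed blocks are recycled for later allocations instead of being handed back each time. The release-threshold value below is just an illustrative choice.

```cpp
// Sketch of the CUDA stream-ordered allocator with a cached default memory pool.
#include <cuda_runtime.h>
#include <cstdint>

int main() {
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Let the default pool keep freed memory cached instead of releasing it
    // back to the driver after every synchronization.
    cudaMemPool_t pool;
    cudaDeviceGetDefaultMemPool(&pool, /*device=*/0);
    uint64_t threshold = UINT64_MAX;       // keep everything cached (illustrative)
    cudaMemPoolSetAttribute(pool, cudaMemPoolAttrReleaseThreshold, &threshold);

    for (int i = 0; i < 1000; ++i) {
        void* buf = nullptr;
        cudaMallocAsync(&buf, (size_t)(i % 64 + 1) << 20, stream);  // varying sizes
        // ... launch kernels that use buf on this stream ...
        cudaFreeAsync(buf, stream);        // returns to the pool, ready for reuse
    }
    cudaStreamSynchronize(stream);
    cudaStreamDestroy(stream);
    return 0;
}
```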
