Since the file I plan to read is very large (20 GB), I use multiple streams and asynchronous memory copies to process the data in batches.
I suspect the issue is related to the CUDA memory pool, which would explain why calling cudaFreeAsync
for each batch does not release memory back to the OS.
From the Nsight Systems timeline, I can see that memory usage never decreases.
How can I disable the memory pool, or is there a smarter way to reclaim memory?
Thanks!
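For reference, a minimal sketch of the pattern I'm describing (stream count, batch size, and loop bounds are illustrative, not my real values):

```cuda
#include <cuda_runtime.h>

int main() {
    const int    kStreams    = 4;
    const size_t kBatchBytes = 256u << 20;  // 256 MiB per batch (illustrative)
    cudaStream_t streams[kStreams];
    for (int i = 0; i < kStreams; ++i) cudaStreamCreate(&streams[i]);

    // Each batch gets a fresh stream-ordered allocation from the pool.
    for (int batch = 0; batch < 80; ++batch) {
        int   s     = batch % kStreams;
        void* d_buf = nullptr;
        cudaMallocAsync(&d_buf, kBatchBytes, streams[s]);
        // ... cudaMemcpyAsync H2D, kernel launches, cudaMemcpyAsync D2H ...
        cudaFreeAsync(d_buf, streams[s]);  // returns memory to the pool, not the OS
    }

    for (int i = 0; i < kStreams; ++i) {
        cudaStreamSynchronize(streams[i]);
        cudaStreamDestroy(streams[i]);
    }
    return 0;
}
```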
If you process the data in batches, then just reuse the memory instead of allocating new memory for each batch.
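For example (sketch; stream count and buffer size are placeholders), allocate one buffer per stream up front, reuse it for every batch submitted to that stream, and free only at the end:

```cuda
#include <cuda_runtime.h>

void processAllBatches(cudaStream_t* streams, int kStreams,
                       void** h_src, int numBatches, size_t kBatchBytes) {
    // Allocate once per stream, reuse across batches, free at the end.
    void* d_buf[8];  // assumes kStreams <= 8 in this sketch
    for (int i = 0; i < kStreams; ++i)
        cudaMalloc(&d_buf[i], kBatchBytes);

    for (int batch = 0; batch < numBatches; ++batch) {
        int s = batch % kStreams;
        // Stream ordering guarantees the previous batch on stream s
        // has finished with d_buf[s] before this copy starts.
        cudaMemcpyAsync(d_buf[s], h_src[batch], kBatchBytes,
                        cudaMemcpyHostToDevice, streams[s]);
        // ... launch kernel on streams[s] ...
    }

    for (int i = 0; i < kStreams; ++i) {
        cudaStreamSynchronize(streams[i]);
        cudaFree(d_buf[i]);
    }
}
```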
If I use multiple streams, I guess I still need to allocate memory for each stream.
I also tried the cudaMemPoolTrimTo API, but it made no difference.
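The trim call I tried looks roughly like this (sketch). One thing worth noting: cudaFreeAsync is stream-ordered, so the memory is only returned to the pool once the free actually executes; trimming before synchronizing finds nothing to release.

```cuda
#include <cuda_runtime.h>

void trimDefaultPool(int device) {
    cudaMemPool_t pool;
    cudaDeviceGetDefaultMemPool(&pool, device);

    // Ensure all pending cudaFreeAsync calls have executed; until then
    // the freed memory has not been returned to the pool.
    cudaDeviceSynchronize();

    // Release all currently unused pool memory back to the OS.
    cudaMemPoolTrimTo(pool, 0);
}
```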
Not sure why you would not want to allocate memory for each stream. Streams typically run in parallel, and each needs its own memory.
So normally you would free the memory only at the end of the overall data processing.
You could probably also allocate one large block once and index into different regions by stream, but I do not see why that would be better.
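Something like this (sketch; kStreams and kBatchBytes are placeholders as before):

```cuda
#include <cuda_runtime.h>

void partitionByStream(int kStreams, size_t kBatchBytes) {
    // One large allocation, carved into per-stream regions.
    void* d_base = nullptr;
    cudaMalloc(&d_base, kStreams * kBatchBytes);

    for (int i = 0; i < kStreams; ++i) {
        void* region = static_cast<char*>(d_base) + i * kBatchBytes;
        // use 'region' only for work submitted to stream i
        (void)region;
    }

    cudaFree(d_base);  // one free at the end of processing
}
```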