Originally published at: https://developer.nvidia.com/blog/using-the-nvidia-cuda-stream-ordered-memory-allocator-part-1/
Most CUDA developers are familiar with the cudaMalloc and cudaFree API functions to allocate GPU-accessible memory. However, there has long been an obstacle with these API functions: they aren't stream-ordered. In this post, we introduce new API functions, cudaMallocAsync and cudaFreeAsync, that enable memory allocation and deallocation to be stream-ordered operations. In part…
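As a minimal sketch of the pattern (not from the original post; error checking is omitted and myKernel stands in for any kernel), an allocation, its use, and its free can all be enqueued on the same stream without intervening synchronization:

cudaStream_t stream;
cudaStreamCreate(&stream);

void* ptr;
// The allocation becomes valid, in stream order, for work enqueued after it.
cudaMallocAsync(&ptr, 1 << 20, stream);
// myKernel is a placeholder; any kernel launched on the same stream may use ptr.
myKernel<<<256, 256, 0, stream>>>(ptr);
// The free is also stream-ordered, so it is safe to enqueue right after the kernel.
cudaFreeAsync(ptr, stream);

cudaStreamSynchronize(stream);
cudaStreamDestroy(stream);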
Is it possible to make cudaMallocAsync work even when there is not enough free memory left? Instead of returning an allocation error, could it simply "wait" (block the stream) until enough memory becomes available?
In the scenario below, the amount of memory could be an issue. But I can guarantee that the memory will eventually become available, so cudaMallocAsync could simply "wait" for some of the previous cudaFreeAsync calls to complete.
for (int i = 0; i < 100; i++) {
    void* ptr;
    // Each iteration allocates a large buffer on its own stream...
    cudaMallocAsync(&ptr, a_lot_of_memory[i], streams[i]);
    kernel<<<..., streams[i]>>>(ptr, ...);
    // ...and returns it to the pool in stream order once the kernel finishes.
    cudaFreeAsync(ptr, streams[i]);
}
I understand that this could introduce deadlock issues, but given some rules, would it be possible?
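For what it's worth, here is a minimal sketch of the workaround I would otherwise write by hand (assuming the failure surfaces as cudaErrorMemoryAllocation, which is my reading of the documentation): catch the error, synchronize so the pending cudaFreeAsync calls actually return memory to the pool, then retry.

void* ptr = nullptr;
cudaError_t err = cudaMallocAsync(&ptr, a_lot_of_memory[i], streams[i]);
if (err == cudaErrorMemoryAllocation) {
    // Wait for the pending cudaFreeAsync calls on all streams to return
    // their memory to the pool, then retry the allocation once.
    cudaDeviceSynchronize();
    err = cudaMallocAsync(&ptr, a_lot_of_memory[i], streams[i]);
}

Having the allocator block the stream itself would avoid this device-wide synchronization.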