Originally published at: https://developer.nvidia.com/blog/using-the-nvidia-cuda-stream-ordered-memory-allocator-part-1/
Most CUDA developers are familiar with the cudaMalloc and cudaFree API functions to allocate GPU-accessible memory. However, there has long been an obstacle with these API functions: they aren't stream-ordered. In this post, we introduce new API functions, cudaMallocAsync and cudaFreeAsync, that enable memory allocation and deallocation to be stream-ordered operations. In part…
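As a minimal sketch of the pattern (not from the original post; error checking is omitted and myKernel stands in for any kernel), an allocation, its use, and its free can all be enqueued on the same stream without intervening synchronization:

cudaStream_t stream;
cudaStreamCreate(&stream);

void* ptr;
// The allocation becomes valid, in stream order, for work enqueued after it.
cudaMallocAsync(&ptr, 1 << 20, stream);
// myKernel is a placeholder; any kernel launched on the same stream may use ptr.
myKernel<<<256, 256, 0, stream>>>(ptr);
// The free is also stream-ordered, so it is safe to enqueue right after the kernel.
cudaFreeAsync(ptr, stream);

cudaStreamSynchronize(stream);
cudaStreamDestroy(stream);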
Is it possible to make cudaMallocAsync work even when there is not enough free memory left? Instead of returning an allocation error, could it simply "wait" (block the stream) until enough memory becomes available?
In the scenario below, the amount of memory could be an issue. But I can guarantee that the memory will eventually become available, so cudaMallocAsync could simply "wait" for some of the previous cudaFreeAsync calls to complete.
for (int i = 0; i < 100; i++) {
    void* ptr;
    // Each iteration allocates a large buffer on its own stream...
    cudaMallocAsync(&ptr, a_lot_of_memory[i], streams[i]);
    kernel<<<..., streams[i]>>>(ptr, ...);
    // ...and returns it to the pool in stream order once the kernel finishes.
    cudaFreeAsync(ptr, streams[i]);
}
I understand that this could introduce deadlock issues, but given some rules, would it be possible?
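For what it's worth, here is a minimal sketch of the workaround I would otherwise write by hand (assuming the failure surfaces as cudaErrorMemoryAllocation, which is my reading of the documentation): catch the error, synchronize so the pending cudaFreeAsync calls actually return memory to the pool, then retry.

void* ptr = nullptr;
cudaError_t err = cudaMallocAsync(&ptr, a_lot_of_memory[i], streams[i]);
if (err == cudaErrorMemoryAllocation) {
    // Wait for the pending cudaFreeAsync calls on all streams to return
    // their memory to the pool, then retry the allocation once.
    cudaDeviceSynchronize();
    err = cudaMallocAsync(&ptr, a_lot_of_memory[i], streams[i]);
}

Having the allocator block the stream itself would avoid this device-wide synchronization.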