Ambiguity in the description of cudaFree API?

xd_cuda · December 8, 2023, 4:15am

In the description of “cudaFree” API there is a note:

Note - This API will not perform any implicit synchronization when the pointer was allocated with cudaMallocAsync or cudaMallocFromPoolAsync. Callers must ensure that all accesses to the pointer have completed before invoking cudaFree. For best performance and memory reuse, users should use cudaFreeAsync to free memory allocated via the stream ordered memory allocator.

The note only says that no implicit synchronization is performed for pointers allocated with cudaMallocAsync or cudaMallocFromPoolAsync; it says nothing explicit about pointers allocated with other APIs such as cudaMalloc. It is also unclear whether the requirement in the following sentence applies to every cudaFree call or only to calls on pointers allocated with cudaMallocAsync or cudaMallocFromPoolAsync.
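To make the two release paths in the quoted note concrete, here is a minimal sketch. It assumes CUDA 11.2 or later (for the stream-ordered allocator); the kernel `fill` and the helper `use_and_release` are illustrative names, not part of the CUDA documentation.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel (hypothetical): writes `value` into p[0..n).
__global__ void fill(int *p, int n, int value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) p[i] = value;
}

// Hypothetical helper: allocates from the stream-ordered allocator, uses the
// buffer, then releases it either with cudaFreeAsync or with cudaFree.
cudaError_t use_and_release(cudaStream_t stream, bool free_in_stream_order) {
    const int n = 1 << 20;
    int *buf = nullptr;
    cudaError_t err = cudaMallocAsync((void **)&buf, n * sizeof(int), stream);
    if (err != cudaSuccess) return err;

    fill<<<(n + 255) / 256, 256, 0, stream>>>(buf, n, 1);
    err = cudaGetLastError();
    if (err != cudaSuccess) {
        cudaFreeAsync(buf, stream);
        return err;
    }

    if (free_in_stream_order) {
        // The free is ordered after the kernel in `stream`,
        // so no host-side wait is needed here.
        return cudaFreeAsync(buf, stream);
    }

    // Per the note, cudaFree does not synchronize for this kind of pointer,
    // so the caller waits for all outstanding accesses first.
    err = cudaStreamSynchronize(stream);
    if (err != cudaSuccess) return err;
    return cudaFree(buf);
}

int main() {
    cudaStream_t stream;
    if (cudaStreamCreate(&stream) != cudaSuccess) return 1;

    cudaError_t a = use_and_release(stream, true);
    cudaError_t b = use_and_release(stream, false);

    cudaStreamSynchronize(stream);
    cudaStreamDestroy(stream);

    std::printf("cudaFreeAsync path: %s\ncudaFree path: %s\n",
                cudaGetErrorString(a), cudaGetErrorString(b));
    return (a == cudaSuccess && b == cudaSuccess) ? 0 : 1;
}
```

The sketch only covers pointers from cudaMallocAsync. It says nothing about what cudaFree does for cudaMalloc pointers, which is exactly the part the note leaves open.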

The result I got from a simple test case suggests that cudaFree does synchronize implicitly for pointers allocated with cudaMalloc. Even so, it would be much better for the documentation to describe the API's behavior precisely, especially for such a frequently used API.
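The test case itself is not shown in this post. One way such a check could be written is sketched below: start a long-running kernel, then measure how long the host spends inside cudaFree on an unrelated cudaMalloc buffer. The kernel `spin`, the helper `time_cuda_free_ms`, and the cycle count are assumptions made for illustration.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel (hypothetical): busy-waits for about `cycles` GPU clock ticks.
__global__ void spin(long long cycles) {
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

// Hypothetical helper: returns the host-side milliseconds spent inside
// cudaFree while `spin` is in flight, or a negative value on error.
double time_cuda_free_ms(long long cycles) {
    void *unrelated = nullptr;  // buffer the kernel never touches
    if (cudaMalloc(&unrelated, 1 << 20) != cudaSuccess) return -1.0;

    cudaStream_t stream;
    if (cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking) != cudaSuccess) {
        cudaFree(unrelated);
        return -1.0;
    }

    spin<<<1, 1, 0, stream>>>(cycles);  // the launch returns to the host right away

    auto t0 = std::chrono::steady_clock::now();
    cudaError_t err = cudaFree(unrelated);
    auto t1 = std::chrono::steady_clock::now();

    cudaStreamSynchronize(stream);
    cudaStreamDestroy(stream);
    if (err != cudaSuccess) return -1.0;
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main() {
    cudaFree(0);  // force context creation so it is not included in the timings

    double idle = time_cuda_free_ms(0);            // kernel finishes almost at once
    double busy = time_cuda_free_ms(500000000LL);  // kernel runs for a noticeable time

    std::printf("cudaFree with idle GPU: %.3f ms\n", idle);
    std::printf("cudaFree with busy GPU: %.3f ms\n", busy);
    return (idle >= 0.0 && busy >= 0.0) ? 0 : 1;
}
```

If the busy-GPU time comes out close to the kernel's run time, that would be consistent with cudaFree waiting for outstanding work. This is only an observation on one driver and GPU, not a documented guarantee, which is why the wording in the documentation matters.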

Robert_Crovella · December 8, 2023, 7:39pm

You may wish to file a bug.

Yuki_Ni · April 1, 2024, 8:26am

[Public] Hi xiaoping.duan,

We are glad to let you know that we will add the sentence
"For all other pointers, this API may perform implicit synchronization."
to the cudaFree API description in a future release (probably the second 12.x release after the latest, 12.4).

Thanks again for filing a bug ticket and for your patience.

Best,
Yuki

