The device_ptr returned by cudaExternalMemoryGetMappedBuffer needs to be released using cudaFree

xu.jun36 · February 21, 2024, 8:02am

The device_ptr returned by cudaExternalMemoryGetMappedBuffer needs to be released using cudaFree, which may impact program runtime performance. However, the cudaFreeAsync operation is not supported for this pointer. Is there a good way to solve this problem?

AastaLLL · February 21, 2024, 9:06am

Hi,

cudaExternalMemoryGetMappedBuffer maps a pre-allocated buffer.
~~So cudaFree or cudaFreeAsync should depend on the way the buffer is allocated.~~

Do you observe something different?

Thanks.

xu.jun36 · February 21, 2024, 9:39am

cudaExternalMemory_t ext_mem;

cuda_err = cudaImportExternalMemory(&ext_mem, &memHandleDesc);
CHK_CUDA_STATUS_AND_RETURN("cudaImportExternalMemory", cuda_err);

cudaExternalMemoryBufferDesc bufferDesc;
memset(&bufferDesc, 0, sizeof(bufferDesc));
bufferDesc.size = size;
bufferDesc.offset = 0;

void* device_ptr_for_map;
cuda_err = cudaExternalMemoryGetMappedBuffer(&device_ptr_for_map, ext_mem, &bufferDesc);
CHK_CUDA_STATUS_AND_RETURN("cudaExternalMemoryGetMappedBuffer", cuda_err);
cuda_err = cudaDestroyExternalMemory(ext_mem);
CHK_CUDA_STATUS_AND_RETURN("cudaDestroyExternalMemory", cuda_err);
cudaStream_t stream;
cudaStreamCreate(&stream);
// cuda_err = cudaFree(device_ptr_for_map);
cuda_err = cudaFreeAsync(device_ptr_for_map, stream);
CHK_CUDA_STATUS_AND_RETURN("cudaFreeAsync", cuda_err);

cudaStreamSynchronize(stream);

cudaStreamDestroy(stream);

The above is my code snippet. If I use cudaFree, the call is affected by GPU work running in other threads and takes 30 ms. If I use cudaFreeAsync, it returns error 801 (cudaErrorNotSupported).

xu.jun36 · February 26, 2024, 8:44am

The returned pointer from cudaExternalMemoryGetMappedBuffer must be explicitly deallocated using cudaFree, as it is not user-allocated.

AastaLLL · February 27, 2024, 6:43am

Hi,

Yes, sorry for the incorrect message before.
The pointer needs to be released with cudaFree.

You can find this limitation in our documentation as well:

https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__EXTRES__INTEROP.html#group__CUDART__EXTRES__INTEROP_1g6f1cd4a939374a83267bb580a0ea07ae

Description
…
The returned pointer devPtr must be freed using cudaFree.

Thanks.

