When multiple devices are used in an application, cudaFreeHost(ptr) does not release the allocated CUDA host memory. The host memory is allocated with the cudaHostAllocPortable flag.
For example: I am using two GPU devices. If I set device 0 and allocate some pinned host memory, then switch to device 1 with cudaSetDevice(1) and create a stream, host memory, or device memory, and then switch back to device 0 and try to free the allocated host memory, the memory is not released (I am checking this in Task Manager -> Performance tab -> Memory section).
The problem also occurs if I allocate host memory while on device 0, then create a separate thread, call cudaSetDevice(1) in that thread, and create a stream or memory for device 1 there. The host memory allocated in the main thread is not released (I did call cudaSetDevice(0) before freeing it).
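For reference, here is a minimal sketch of the first scenario (the allocation size and variable names are illustrative, and error checking is reduced to the one call in question):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Sketch: allocate portable pinned host memory while device 0 is current,
// perform some activity on device 1, switch back, then free the host memory.
int main(void) {
    const size_t bytes = 256u * 1024u * 1024u;  // illustrative size
    float *h_data = NULL;

    cudaSetDevice(0);
    cudaHostAlloc((void **)&h_data, bytes, cudaHostAllocPortable);

    cudaSetDevice(1);
    cudaStream_t stream;
    cudaStreamCreate(&stream);  // any activity that initializes device 1

    cudaSetDevice(0);
    cudaError_t err = cudaFreeHost(h_data);  // returns cudaSuccess, but the
                                             // process working set does not
                                             // shrink in Task Manager
    printf("cudaFreeHost: %s\n", cudaGetErrorString(err));

    cudaSetDevice(1);
    cudaStreamDestroy(stream);
    return 0;
}
```

Without the cudaSetDevice(1)/cudaStreamCreate() step, the same cudaFreeHost() call releases the memory as expected.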
The function cudaFreeHost() returns cudaSuccess, and cudaGetLastError() also returns cudaSuccess.
The CUDA sample application “simpleMultiGPU” behaves the same way: the call cudaFreeHost(plan[i].h_Data) does not seem to release the memory.
I also tried cuda-memcheck; it does not report any errors.
Am I missing anything in multi-GPU programming?