When multiple devices are used in an application, cudaFreeHost(ptr) does not release the allocated CUDA host memory. The host memory is allocated with the cudaHostAllocPortable flag.
For example: I am using two GPU devices. If I set device 0 and allocate some pinned host memory, then switch to device 1 with cudaSetDevice(1) and create a stream, host memory, or device memory, and then switch back to device 0 and try to free the allocated host memory, the memory is not released (I am checking this in Task Manager -> Performance tab -> Memory section).
The problem also occurs if I allocate host memory while on device 0, then create a separate thread, call cudaSetDevice(1) in that thread, and create a stream or memory for device 1 there. The host memory allocated in the main thread is not released (I did call cudaSetDevice(0) before freeing it).
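For reference, here is a minimal sketch of the first scenario (the allocation size and variable names are illustrative, and error checking is reduced to the one call in question):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Sketch: allocate portable pinned host memory while device 0 is current,
// perform some activity on device 1, switch back, then free the host memory.
int main(void) {
    const size_t bytes = 256u * 1024u * 1024u;  // illustrative size
    float *h_data = NULL;

    cudaSetDevice(0);
    cudaHostAlloc((void **)&h_data, bytes, cudaHostAllocPortable);

    cudaSetDevice(1);
    cudaStream_t stream;
    cudaStreamCreate(&stream);  // any activity that initializes device 1

    cudaSetDevice(0);
    cudaError_t err = cudaFreeHost(h_data);  // returns cudaSuccess, but the
                                             // process working set does not
                                             // shrink in Task Manager
    printf("cudaFreeHost: %s\n", cudaGetErrorString(err));

    cudaSetDevice(1);
    cudaStreamDestroy(stream);
    return 0;
}
```

Without the cudaSetDevice(1)/cudaStreamCreate() step, the same cudaFreeHost() call releases the memory as expected.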
The function cudaFreeHost() returns cudaSuccess, and cudaGetLastError() also returns cudaSuccess.
The CUDA sample application “simpleMultiGPU” behaves the same way: the call cudaFreeHost(plan[i].h_Data) does not seem to release the memory.
I also tried cuda-memcheck; it does not report any errors.
Am I missing anything in multi-GPU programming?