Unified Memory (cudaMallocManaged) hangs the system when the process is terminated


I am using Unified Memory, allocating 8 GB with cudaMallocManaged. The allocation itself is fast, but when I kill the process while a long-running kernel is executing, the whole system hangs for several minutes (e.g. 3-4 minutes) while the memory is deallocated.

To avoid this problem during development, I switched to cudaMalloc and cudaMemcpy. With this “old” approach, the app terminates almost immediately.
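
For reference, the two allocation patterns compared above look roughly like the sketch below. It is a simplified illustration, not the actual application: `busy_kernel`, `run_managed`, `run_explicit`, the `ok` helper, the launch configuration, and the element counts are all placeholder assumptions.

```cuda
#include <cstddef>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for the long-running kernel: a grid-stride loop in
// which every element goes through `iters` rounds of arithmetic.
__global__ void busy_kernel(float* data, size_t n, int iters)
{
    size_t stride = (size_t)gridDim.x * blockDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        float v = data[i];
        for (int k = 0; k < iters; ++k) v = v * 1.0000001f + 1.0f;
        data[i] = v;
    }
}

// Reports a failed CUDA call on stderr; returns true on cudaSuccess.
static bool ok(cudaError_t e, const char* what)
{
    if (e != cudaSuccess) {
        fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(e));
        return false;
    }
    return true;
}

// Variant A: Unified Memory. One pointer is used by both host and device.
bool run_managed(size_t n, int iters)
{
    float* data = nullptr;
    if (!ok(cudaMallocManaged(&data, n * sizeof(float)), "cudaMallocManaged")) return false;
    for (size_t i = 0; i < n; ++i) data[i] = 0.0f;  // host touches the pages first
    busy_kernel<<<256, 256>>>(data, n, iters);
    bool good = ok(cudaGetLastError(), "kernel launch") &&
                ok(cudaDeviceSynchronize(), "cudaDeviceSynchronize");
    cudaFree(data);
    return good;
}

// Variant B: explicit device allocation plus copies (the "old" way).
bool run_explicit(size_t n, int iters)
{
    std::vector<float> host(n, 0.0f);
    float* dev = nullptr;
    if (!ok(cudaMalloc(&dev, n * sizeof(float)), "cudaMalloc")) return false;
    bool good = ok(cudaMemcpy(dev, host.data(), n * sizeof(float),
                              cudaMemcpyHostToDevice), "cudaMemcpy H2D");
    if (good) {
        busy_kernel<<<256, 256>>>(dev, n, iters);
        good = ok(cudaGetLastError(), "kernel launch") &&
               ok(cudaDeviceSynchronize(), "cudaDeviceSynchronize") &&
               ok(cudaMemcpy(host.data(), dev, n * sizeof(float),
                             cudaMemcpyDeviceToHost), "cudaMemcpy D2H");
    }
    cudaFree(dev);
    return good;
}
```

Calling `run_managed(2ull << 30, iters)` would correspond to the 8 GB case (2^31 floats of 4 bytes each), with `iters` chosen large enough that the kernel is still running when the process is killed; `run_explicit` with the same arguments is the variant that terminates quickly for me.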

Do you know why this happens?

Thank you

My system:
CUDA 11.6 Toolkit with driver 511.65
RTX 3060 12GB
Intel NUC, 8th-gen Core i5, 16 GB DDR4 RAM
Visual Studio 2022 Community
Windows 10 version 10.0.19044.1741