nvidia-smi shows 111 MB of GPU memory in use after cudaFreeHost releases the memory allocated by cudaHostAlloc

Device: Tesla P4
OS: CentOS 7.5
CUDA version: 9.2

First, I call cudaHostAlloc(void** pHost, size_t size, unsigned int flags) with the flag cudaHostAllocDefault.
The size can be anything, e.g. 8K, 128M, or 512M; it makes no difference.

Then I call cudaFreeHost() to release the memory allocated by cudaHostAlloc.
Before the process exits, nvidia-smi shows 111 MB of GPU memory in use.
After the process exits, no GPU memory is in use.

Why is 111 MB of GPU memory still in use even after I call cudaFreeHost to free the page-locked memory?

Thanks!

The CUDA runtime and the CUDA context established by the process use some GPU memory as overhead. That memory is not released until the context is destroyed, which normally does not happen until the CPU process terminates. This is the reason for the extra 111 MB of GPU memory usage; it is unrelated to the page-locked host allocation that cudaFreeHost released.
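If you want to see this for yourself, here is a minimal sketch (not a definitive test). It assumes a single GPU that is otherwise idle; the helper name demo_context_overhead and the pause length are made up for illustration. It allocates and frees pinned host memory, prints the device-wide usage reported by cudaMemGetInfo, then calls cudaDeviceReset() to destroy the context explicitly. The pauses are there so you can check nvidia-smi before and after the reset.

```cuda
#include <chrono>
#include <cstdio>
#include <thread>
#include <cuda_runtime.h>

// Hypothetical helper for illustration: returns 0 on success, 1 on any CUDA error.
int demo_context_overhead(size_t bytes, int pause_seconds) {
    void* host = nullptr;

    // Allocating pinned host memory makes the runtime initialize a context
    // on the current device.
    cudaError_t err = cudaHostAlloc(&host, bytes, cudaHostAllocDefault);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaHostAlloc: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // This releases the page-locked host buffer, not the context.
    err = cudaFreeHost(host);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaFreeHost: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // cudaMemGetInfo reports device-wide numbers, so on a shared GPU this
    // also includes memory used by other processes.
    size_t free_bytes = 0, total_bytes = 0;
    err = cudaMemGetInfo(&free_bytes, &total_bytes);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMemGetInfo: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("In use after cudaFreeHost: %zu MiB (device-wide)\n",
                (total_bytes - free_bytes) >> 20);

    // Window to run nvidia-smi: the process should still be listed.
    std::this_thread::sleep_for(std::chrono::seconds(pause_seconds));

    // Destroys this process's primary context on the current device.
    err = cudaDeviceReset();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaDeviceReset: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Window to run nvidia-smi again. No CUDA calls are made here on purpose,
    // because any further runtime call would create a new context.
    std::this_thread::sleep_for(std::chrono::seconds(pause_seconds));
    return 0;
}

int main() {
    // 8 KiB, the smallest size mentioned in the question; 10 s pauses.
    return demo_context_overhead(8 * 1024, 10);
}
```

Build it with something like nvcc demo.cu -o demo. The exact overhead depends on the GPU, driver, and CUDA version, so the number will not necessarily be 111 MB.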