How to force NVIDIA OpenCL to release GPU context to avoid a memory leak

I am debugging a memory leak in my OpenCL program mmc (https://github.com/fangq/mmc). After extensive tests, it looks like the leak does not happen with the Intel, AMD, or open-source POCL drivers; it only appears with the NVIDIA driver.

Please see the detailed tests, benchmarks, and example scripts in these two Stack Overflow questions:

https://stackoverflow.com/questions/61163373/how-to-force-nvidia-opencl-to-release-gpu-context-to-avoid-memory-leak
https://stackoverflow.com/questions/61091039/opencl-clcreatecontextfromtype-function-results-in-memory-leaks

The memory profiling results can be found in this plot:

I am wondering if there is any API in NVIDIA's OpenCL that can force the driver to completely release the memory associated with a device after the simulation is completed. I've already called clFinish and the relevant clRelease* functions, but about 300-400 MB of memory is lost in every simulation cycle.

The commands to reproduce this issue are:

git clone https://github.com/fangq/mmc.git
cd mmc/src
sed -i -e 's/mmc_init_from_cmd/for(int i=0;i<5;i++){\nmmc_init_from_cmd/g' mmc.c
sed -i -e 's/return/getchar();}\nreturn/g' mmc.c
make clean
make all
cd ../examples/validation
../../src/bin/mmc -f cube2.inp -G 1 -s cube2 -n 1e4 -b 0 -D TP -M G -F bin
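The sed commands above wrap the simulation in a 5-iteration loop and pause with getchar() so that memory can be checked from an external tool. As an alternative, the loop could log its own resident memory after each cycle. This is a minimal sketch that assumes Linux, because it parses the VmRSS field of /proc/self/status. The function name current_rss_kb is hypothetical and is not part of mmc.

```c
#include <stdio.h>
#include <string.h>

/* Return this process's resident set size in kB, or -1 on failure.
 * Linux-only: parses the "VmRSS:" line of /proc/self/status. */
long current_rss_kb(void) {
    FILE *fp = fopen("/proc/self/status", "r");
    char line[256];
    long rss = -1;

    if (!fp)
        return -1;
    while (fgets(line, sizeof(line), fp)) {
        if (strncmp(line, "VmRSS:", 6) == 0) {
            /* %ld skips the tab/spaces before the number; the unit is kB. */
            if (sscanf(line + 6, "%ld", &rss) != 1)
                rss = -1;
            break;
        }
    }
    fclose(fp);
    return rss;
}
```

Print current_rss_kb() at the end of each iteration of the loop inserted by the first sed command. The difference between consecutive values then shows the per-cycle growth directly, without pausing the program.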

Your help is appreciated!