Understanding the meaning of localMemoryPoolUtilizedSize in CUDA with cudaMallocAsync and cudaFreeAsync

I am trying to optimize my code using cudaMallocAsync and cudaFreeAsync.
After profiling with Nsight Systems, it appears that these operations use the local memory pool.
While cudaMemPoolTrimTo is able to reduce localMemoryPoolSize, the localMemoryPoolUtilizedSize keeps increasing over time.

My question is: what exactly does localMemoryPoolUtilizedSize represent?

As an experiment, I also watched GPU memory in the resource monitor of the task manager, but I did not observe any continuous growth in memory usage comparable to the increase of localMemoryPoolUtilizedSize in the profiling results.
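For reference, here is a minimal sketch of the kind of loop I am profiling. It also queries the pool attributes that I assume the Nsight Systems counters are derived from (cudaMemPoolAttrReservedMemCurrent for the pool size, cudaMemPoolAttrUsedMemCurrent for the utilized size); whether that mapping is correct is part of my question. Note that the runtime also exposes high-watermark variants (e.g. cudaMemPoolAttrUsedMemHigh) that only increase until explicitly reset, which might explain a monotonically growing counter:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Get the default memory pool that cudaMallocAsync draws from.
    cudaMemPool_t pool;
    cudaDeviceGetDefaultMemPool(&pool, /*device=*/0);

    for (int i = 0; i < 10; ++i) {
        void *ptr = nullptr;
        cudaMallocAsync(&ptr, 1 << 20, stream);  // 1 MiB from the pool
        // ... kernel work on ptr would go here ...
        cudaFreeAsync(ptr, stream);              // returns memory to the pool
    }
    cudaStreamSynchronize(stream);

    // Trim the pool back as far as possible (keep 0 bytes reserved).
    cudaMemPoolTrimTo(pool, 0);

    // Query the pool counters I am trying to match against Nsight Systems.
    cuuint64_t reserved = 0, used = 0, usedHigh = 0;
    cudaMemPoolGetAttribute(pool, cudaMemPoolAttrReservedMemCurrent, &reserved);
    cudaMemPoolGetAttribute(pool, cudaMemPoolAttrUsedMemCurrent, &used);
    cudaMemPoolGetAttribute(pool, cudaMemPoolAttrUsedMemHigh, &usedHigh);
    printf("reserved=%llu used=%llu usedHigh=%llu\n",
           (unsigned long long)reserved,
           (unsigned long long)used,
           (unsigned long long)usedHigh);

    cudaStreamDestroy(stream);
    return 0;
}
```

In my runs, the current counters drop back down after the frees and the trim, so if the profiler's localMemoryPoolUtilizedSize only ever grows, my guess is that it reflects a high-watermark rather than the current usage.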