I am trying to optimize my code using `cudaMallocAsync` and `cudaFreeAsync`.
After profiling with Nsight Systems, it appears that these operations are using the local memory pool.
While `cudaMemPoolTrimTo` does release the memory reported as `localMemoryPoolSize`, the `localMemoryPoolUtilizedSize` continues to increase.
My question is: what does `localMemoryPoolUtilizedSize` represent?
As an experiment, I also watched GPU memory in the Task Manager's resource monitor, but I did not observe any continuous growth in memory usage comparable to the increase in `localMemoryPoolUtilizedSize` shown in the profiling results.
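To cross-check what Nsight Systems reports, the runtime itself exposes the two quantities in question: the pool's *reserved* size (physical memory the pool holds on to, which `cudaMemPoolTrimTo` can shrink) and its *used* size (bytes currently backing live `cudaMallocAsync` allocations, which trimming does not touch). A minimal sketch, assuming device 0 and its default memory pool:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // The pool that cudaMallocAsync draws from by default on device 0.
    cudaMemPool_t pool;
    cudaDeviceGetDefaultMemPool(&pool, 0);

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Allocate and free 64 MiB through the pool.
    void *p = nullptr;
    cudaMallocAsync(&p, 64u << 20, stream);
    cudaFreeAsync(p, stream);
    cudaStreamSynchronize(stream);

    // Reserved: memory the pool holds from the driver (can stay high
    //           after frees, since the pool caches it for reuse).
    // Used:     bytes handed out to allocations that are still live.
    cuuint64_t reserved = 0, used = 0;
    cudaMemPoolGetAttribute(pool, cudaMemPoolAttrReservedMemCurrent, &reserved);
    cudaMemPoolGetAttribute(pool, cudaMemPoolAttrUsedMemCurrent, &used);
    printf("reserved=%llu bytes, used=%llu bytes\n",
           (unsigned long long)reserved, (unsigned long long)used);

    // Trimming returns unused reserved memory to the driver; it has no
    // effect on the "used" count, which only falls when allocations are freed.
    cudaMemPoolTrimTo(pool, 0);

    cudaStreamDestroy(stream);
    return 0;
}
```

If the used counter (and, presumably, the profiler's utilized size) keeps climbing while reserved stays flat after trimming, that would point to allocations that are never freed rather than to pool caching.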