Distributed processing: checking GPU memory

Hi. We are investigating a case where allocation fails with a GPU out-of-memory error during cudaMalloc.
The error reproduces only when running across multiple servers using MessageQueue.
At the time of the error, we would like to identify what is consuming GPU memory.
How should we debug a situation like this?
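For context, one approach we have considered is polling `nvidia-smi` on each server around the time of the failure to see which processes are holding GPU memory. A minimal sketch is below; the query flags are real `nvidia-smi` options, but the helper names and sampling approach are only illustrative, not our actual setup:

```python
import subprocess

def parse_compute_apps(csv_text):
    """Parse output of `nvidia-smi --query-compute-apps=pid,process_name,used_memory
    --format=csv,noheader` into (pid, process_name, used_mib) tuples."""
    apps = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        pid, name, used = [field.strip() for field in line.split(",")]
        # used_memory is reported like "1024 MiB"; keep just the number
        apps.append((int(pid), name, int(used.split()[0])))
    return apps

def snapshot_gpu_processes():
    """Query nvidia-smi for the processes currently holding GPU memory.
    Intended to be called periodically (or right before the failing
    cudaMalloc) on every server, so the state at error time is logged."""
    out = subprocess.check_output(
        ["nvidia-smi",
         "--query-compute-apps=pid,process_name,used_memory",
         "--format=csv,noheader"],
        text=True,
    )
    return parse_compute_apps(out)
```

Inside the CUDA code itself, `cudaMemGetInfo` can also be called just before the failing `cudaMalloc` to log free and total device memory, which would narrow down whether the memory is held by our own process or by others on the same GPU.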