Hi GPU experts!
Recently I tried to profile my machine learning training code written in PyTorch to see:
1. The peak GPU memory consumed during training.
2. Information about the GPU buffers allocated during training (i.e., via `cudaMalloc`/`cudaMallocAsync`), including their starting address, buffer size, etc.
3. Which of these GPU buffers are modified by CUDA kernels, and which remain unmodified.
For 1, I think NVIDIA's `nsys` provides a very convenient way to get such information.
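For completeness, here is roughly how I collect the memory-usage data with `nsys` — a sketch assuming a recent Nsight Systems release (older versions may lack `--cuda-memory-usage`); `train.py` stands in for the actual training script:

```shell
# Sketch: profile a PyTorch training run with Nsight Systems.
# Assumes a recent nsys release; "train.py" is a placeholder script name.
#   --trace=cuda,cudart       capture CUDA kernel and runtime-API activity
#   --cuda-memory-usage=true  record GPU memory allocation events on the timeline
#   -o train_report           write the report to train_report.nsys-rep
nsys profile --trace=cuda,cudart --cuda-memory-usage=true \
    -o train_report python train.py
```

The resulting `.nsys-rep` file can then be opened in the Nsight Systems GUI to inspect the GPU memory-usage graph over the run.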
For 2 and 3, I currently can't figure out a good way to do so. My question is: is there any tool like `nsys` that can collect such information without modifying the PyTorch source code? (Or maybe `nsys` is able to, but how?)
Thanks!