Hi GPU experts!
Recently I tried to profile my machine learning training code, written in PyTorch, to see:
- The peak GPU memory consumed over the whole training run.
- Information about the GPU buffers allocated during training (e.g., via
cudaMallocAsync), including each buffer's starting address, size, etc.
- Which of these GPU buffers are modified by CUDA kernels, and which remain unmodified.
For 1, I think NVIDIA’s
nsys provides a very convenient way to get this information.
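For context, here is roughly the nsys invocation I mean for 1 (`train.py` is just a placeholder for my training script):

```shell
# Trace CUDA API calls and collect GPU memory usage over the run.
# --cuda-memory-usage=true makes nsys track allocations/frees so the
# report shows memory consumption over time.
nsys profile --trace=cuda --cuda-memory-usage=true -o report python train.py

# Then inspect the generated report, e.g.:
nsys stats report.nsys-rep
```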
For 2 and 3, I currently can’t figure out a good way to do this. My question is:
Is there any tool like
nsys that can collect this information without modifying the PyTorch source code? (Or maybe
nsys itself can, but how?)
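For completeness, the closest thing I know of on the PyTorch side is the caching allocator’s snapshot API (note these are private, underscore-prefixed functions, so they may change between releases). It gives allocation addresses and sizes for 2, but it sees PyTorch’s allocator rather than raw cudaMallocAsync calls, and it says nothing about 3 (which buffers kernels actually write to):

```python
import torch

# Start recording allocator events (private API in recent PyTorch releases).
torch.cuda.memory._record_memory_history(max_entries=100_000)

# ... run a few training steps here ...

# Peak memory held by tensors, as seen by PyTorch's allocator (item 1):
print(torch.cuda.max_memory_allocated())

# Dump recorded allocation events (addresses, sizes, Python stack traces)
# for offline inspection, e.g. with the viewer at pytorch.org/memory_viz:
torch.cuda.memory._dump_snapshot("snapshot.pickle")
```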