How to find out which GPU buffers are modified by CUDA kernels

Hi GPU experts!

Recently I tried to profile my machine learning code, written in PyTorch, to see:

  1. The peak GPU memory consumed during training.
  2. Information about the GPU buffers allocated during training (e.g., via cudaMalloc/cudaMallocAsync), including their starting addresses, sizes, etc.
  3. Which of these GPU buffers are modified by CUDA kernels, and which remain unmodified.

For 1, I think NVIDIA’s nsys provides a very convenient way to get such information.
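For reference, here is roughly how I collect that with nsys (a sketch; `train.py` is a placeholder for my actual training script, and the stats report name may vary by nsys version):

```shell
# Trace CUDA API calls and record GPU memory usage over time
# (train.py stands in for the real training entry point).
nsys profile --trace=cuda --cuda-memory-usage=true -o train_report python train.py

# Summarize GPU memory operations (allocation sizes, etc.)
# from the generated report file.
nsys stats --report cuda_gpu_mem_size_sum train_report.nsys-rep
```

The resulting timeline in the Nsight Systems GUI also shows a memory-usage graph, which is where I read off the peak.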

For 2 and 3, I currently can't figure out a good way to do this, so my question is:

Is there any tool like nsys that can provide this information without modifying the PyTorch source code? (Or maybe nsys already can, but how?)