Profile a PyTorch model using NCU (Nsight Compute)


I have built a PyTorch model and want to profile each layer or operator during inference to get memory usage, PCIe bandwidth, GPU utilization, and so on. How can I do that with Nsight Compute, or are there other available methods?





Please refer to the following docs:

If you still need further assistance, we will move this post to the Nsight-related forum.

Thank you.
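A minimal sketch of one way to run Nsight Compute on a PyTorch inference script. It assumes the script (the hypothetical `infer.py`) wraps each layer or operator in NVTX ranges, for example with `torch.cuda.nvtx.range_push`/`range_pop` or the `torch.autograd.profiler.emit_nvtx()` context manager. The helper only builds the `ncu` command line. The report name, the NVTX range string `layer1/`, and the launch count are illustrative assumptions, and the exact `--nvtx-include` expression syntax (push/pop vs. start/end ranges) should be checked against the Nsight Compute CLI documentation.

```python
import shlex
from typing import List, Optional


def build_ncu_command(script: str,
                      report: str = "model_profile",
                      nvtx_range: Optional[str] = None,
                      launch_count: Optional[int] = None,
                      metric_set: str = "full",
                      python: str = "python") -> List[str]:
    """Return an argv list that runs `python <script>` under Nsight Compute (ncu)."""
    cmd = [
        "ncu",
        "--target-processes", "all",  # also profile kernels from child processes
        "--set", metric_set,          # section set to collect, e.g. "full"
        "-o", report,                 # ncu writes <report>.ncu-rep
    ]
    if nvtx_range is not None:
        # Enable NVTX support and profile only kernels launched inside this range.
        cmd += ["--nvtx", "--nvtx-include", nvtx_range]
    if launch_count is not None:
        # Limit the number of collected profile results to bound run time.
        cmd += ["--launch-count", str(launch_count)]
    cmd += [python, script]
    return cmd


if __name__ == "__main__":
    # Hypothetical script and NVTX range name; adjust to your own model.
    cmd = build_ncu_command("infer.py", nvtx_range="layer1/", launch_count=20)
    print(shlex.join(cmd))
```

The resulting `.ncu-rep` file can be opened in the Nsight Compute GUI. Keeping `--launch-count` small is usually worthwhile, because Nsight Compute replays each profiled kernel to collect its metrics, which makes a full model run slow.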