Nsys or nsight-cu-cli: how to get metrics?


I would like to profile the CPU and GPU usage of a Python MPI application.
I need to measure at least:

  • the initialization time
  • the CPU-to-GPU transfer time
  • the time needed to allocate memory on the GPU
  • the overlap between transfer and compute (i.e., whether computation starts before all the data has been sent to the GPU)
  • the time needed to free memory at the end of the execution
  • the memory used by the CPU and the GPU
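
To make the overlap metric concrete: given the (start, end) timestamps of each transfer and each kernel, as a profiler timeline records them, the overlap is the total time during which at least one transfer and at least one kernel are active at the same time. The following is a minimal pure-Python sketch of that interval arithmetic; the helper names and the timeline values are hypothetical and do not come from any profiler output.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end) timestamps, e.g. in nanoseconds


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping intervals so each instant is counted only once."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlap_time(transfers: List[Interval], kernels: List[Interval]) -> int:
    """Total time during which a transfer and a kernel run concurrently."""
    a, b = merge(transfers), merge(kernels)
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


if __name__ == "__main__":
    # Hypothetical timeline: two host-to-device copies and one kernel.
    copies = [(0, 100), (150, 300)]
    kernels = [(80, 200)]
    print(overlap_time(copies, kernels))  # 20 + 50 = 70
```

Merging first means that two copies running on different streams at the same time are not double-counted.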

From what I have read, nsys seems to be the best tool, but I cannot figure out how to extract these metrics.
Can you tell me what I should use?


Sorry for the delay. What you want to do is use the “nsys stats” command, which extracts statistics from an SQLite export of the report data. You will need Nsys 2020.2 (or 2020.3, once it is released); please check “nsys stats --help” for details.
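
If the built-in reports do not cover a metric, the SQLite file can also be queried directly. The following is a minimal sketch under several assumptions that you should check against your own .sqlite file (for example, with the sqlite3 shell's ".schema" command), because the schema can change between Nsys versions. It assumes there are tables named CUPTI_ACTIVITY_KIND_MEMCPY (with start/end timestamps in nanoseconds plus bytes and copyKind columns), CUPTI_ACTIVITY_KIND_RUNTIME (with a nameId column) and StringIds (id, value). It also assumes that copyKind follows CUPTI's memcpy-kind enum (1 = HtoD, 2 = DtoH, 8 = DtoD) and that runtime API names are stored as, for example, "cudaMalloc_v3020". The helpers memcpy_summary and api_call_time are hypothetical.

```python
import sqlite3
import sys
from contextlib import closing
from typing import Dict, Tuple

# Assumption: values of CUPTI's memcpy-kind enum as stored in copyKind.
COPY_KIND_NAMES = {1: "HtoD", 2: "DtoH", 8: "DtoD"}


def memcpy_summary(conn: sqlite3.Connection) -> Dict[str, Tuple[int, int, int]]:
    """Return {direction: (count, total_ns, total_bytes)} for CUDA memcpys.

    Assumes a CUPTI_ACTIVITY_KIND_MEMCPY table with start/"end" timestamps
    in nanoseconds and bytes/copyKind columns.
    """
    rows = conn.execute(
        'SELECT copyKind, COUNT(*), SUM("end" - start), SUM(bytes) '
        "FROM CUPTI_ACTIVITY_KIND_MEMCPY GROUP BY copyKind"
    )
    return {
        COPY_KIND_NAMES.get(kind, f"kind{kind}"): (count, total_ns, total_bytes)
        for kind, count, total_ns, total_bytes in rows
    }


def api_call_time(conn: sqlite3.Connection, name: str) -> Tuple[int, int]:
    """Return (count, total_ns) for runtime API calls named `name`.

    Matches the bare name and versioned names such as "cudaMalloc_v3020",
    but not other functions sharing the prefix (e.g. "cudaMallocHost").
    """
    count, total_ns = conn.execute(
        'SELECT COUNT(*), COALESCE(SUM(r."end" - r.start), 0) '
        "FROM CUPTI_ACTIVITY_KIND_RUNTIME AS r "
        "JOIN StringIds AS s ON r.nameId = s.id "
        "WHERE s.value = ? OR s.value GLOB ?",
        (name, name + "_v*"),
    ).fetchone()
    return count, total_ns


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize.py report.sqlite  (hypothetical file names)
    with closing(sqlite3.connect(sys.argv[1])) as conn:
        for direction, (count, ns, nbytes) in memcpy_summary(conn).items():
            print(f"{direction}: {count} copies, {ns / 1e6:.3f} ms, {nbytes} bytes")
        for api in ("cudaMalloc", "cudaFree"):
            count, ns = api_call_time(conn, api)
            print(f"{api}: {count} calls, {ns / 1e6:.3f} ms")
```

This covers the CPU-to-GPU transfer time and the cudaMalloc/cudaFree time. Note that Python GPU libraries with memory pools may call cudaMalloc far less often than the application allocates arrays.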