Why are there min, max, and average values for the flop_count_sp and flop_count_hp metrics?



I’m wondering why the flop_count_xx metrics also have separate min, max, and average values when using nvprof. The number of floating-point operations should be the same each time the kernel is called, no? I can understand that the kernel runtime may vary between calls, but I don’t see why the number of operations would vary.

Could you please clarify this?
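
To make the question concrete, here is a minimal sketch of how such a summary could end up with different min, max, and average values. It assumes the profiler aggregates the metric over every invocation of a kernel with the same name, and that the invocations do not all process the same amount of data. The launch sizes and the `saxpy_flops` helper (2 flops per element) are hypothetical and only for illustration; they are not taken from an actual nvprof run.

```python
from statistics import mean


def saxpy_flops(n: int) -> int:
    """Hypothetical cost model: one multiply and one add per element."""
    return 2 * n


# Hypothetical problem sizes for four invocations of the same kernel.
launch_sizes = [1024, 4096, 1024, 16384]

# One flop count per invocation, as a per-invocation metric would report it.
counts = [saxpy_flops(n) for n in launch_sizes]

# Aggregate over invocations, mirroring the Min / Max / Avg summary columns.
summary = {
    "invocations": len(counts),
    "min": min(counts),
    "max": max(counts),
    "avg": mean(counts),
}

print(summary)
```

If every invocation did exactly the same work, the three values would be identical, so a spread between them would suggest the per-call workload differs (different launch sizes or data-dependent branches, for example).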



TensorRT Version: 7.1.3
GPU Type:
Nvidia Driver Version: 460.73.01
CUDA Version: 11.2
CUDNN Version:
Operating System + Version: ubuntu 18.04
Python Version (if applicable): python 3.6
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):


Could you please post your question on the Nsight Systems - NVIDIA Developer Forums to get better help?

Thank you.