I’m wondering why the flop_count_xx metrics reported by nvprof have separate min, max, and average values. Shouldn’t the number of floating-point operations be the same each time the kernel is called? I can understand that the kernel runtime may vary between calls, but I don’t see why the number of operations would vary.
Could you please clarify this?
TensorRT Version: 7.1.3
Nvidia Driver Version: 460.73.01
CUDA Version: 11.2
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6