This might be a dumb question, but I want to confirm. It's about the setup needed to reproduce the numbers given on Nano's homepage (https://devblogs.nvidia.com/wp-content/uploads/2019/03/imageLikeEmbed.png).
Should the highest performance mode (nvpmodel MAXN) be set and jetson_clocks.sh be run?
The 'layer network time' line from the built-in profiler gives the sum of the time taken by all network layers. For example, from the imagenet-console sample in jetson-inference:
[TRT] layer res5c_branch2b + res5c_branch2b_relu - 0.911198 ms
[TRT] layer res5c_branch2c + res5c + res5c_relu - 0.709218 ms
[TRT] layer pool5 - 0.056250 ms
[TRT] layer fc1000 input reformatter 0 - 0.005157 ms
[TRT] layer fc1000 - 0.716250 ms
[TRT] layer prob input reformatter 0 - 0.008073 ms
[TRT] layer prob - 0.017812 ms
[TRT] layer network time - 35.391460 ms
class 0479 - 0.087574 (car wheel)
class 0581 - 0.076086 (grille, radiator grille)
class 0751 - 0.123500 (racer, race car, racing car)
class 0817 - 0.688826 (sports car, sport car)
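To sanity-check my understanding of "summation of time taken by all network layers", here is a small Python sketch (my own, not part of jetson-inference) that sums the per-layer timings from profiler lines like the ones above. Note the log only shows the tail of the network, so the sum of the visible layers is far less than the reported network time:

```python
import re

# Matches TensorRT builtin-profiler lines like:
#   [TRT] layer fc1000 - 0.716250 ms
# The negative lookahead skips the "layer network time" summary line.
LAYER_RE = re.compile(r"\[TRT\]\s+layer\s+(?!network time)(.+?)\s+-\s+([\d.]+)\s+ms")

def sum_layer_times(log_text):
    """Sum the per-layer times (in ms) reported by the builtin profiler."""
    return sum(float(ms) for _name, ms in LAYER_RE.findall(log_text))

log = """\
[TRT] layer pool5 - 0.056250 ms
[TRT] layer fc1000 input reformatter 0 - 0.005157 ms
[TRT] layer fc1000 - 0.716250 ms
[TRT] layer network time - 35.391460 ms
"""
print(round(sum_layer_times(log), 6))  # 0.777657 (partial sum, tail layers only)
```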
Is this time, 35.39 ms, what is used to compute the official fps/throughput number? And the time taken to load the serialized network and other miscellaneous overhead is not counted?
I have gone through the documentation (https://docs.nvidia.com/deeplearning/sdk/pdf/TensorRT-Best-Practices.pdf) and found it extremely helpful. It says that when calculating throughput, timing starts with the input data already present on the GPU and ends when all network outputs are available on the GPU.
So, am I getting 1000 / 35.391460 ≈ 28.26 fps?
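For reference, the arithmetic I'm doing (a minimal sketch; the 35.391460 ms figure is the "layer network time" from the log above):

```python
# "layer network time" reported by the TensorRT builtin profiler, in ms
network_time_ms = 35.391460

# Throughput assuming one frame per network execution and no other overhead
fps = 1000.0 / network_time_ms
print(round(fps, 2))  # 28.26
```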
Thanks in advance…