TensorRT-Nano reproducing benchmark numbers

This might be a dumb question, but I want to confirm.

It's about the setup needed to reproduce the numbers given on the Nano's homepage (https://devblogs.nvidia.com/wp-content/uploads/2019/03/imageLikeEmbed.png).
My questions are:

  1. Should the highest-performance mode (nvpmodel) be selected and jetson_clocks.sh be on?

  2. 'layer network time' from the built-in profiler gives the sum of the time taken by all network layers. For example, from the jetson-inference sample code, the imagenet-console program prints:

[TRT]  layer res5c_branch2b + res5c_branch2b_relu - 0.911198 ms
[TRT]  layer res5c_branch2c + res5c + res5c_relu - 0.709218 ms
[TRT]  layer pool5 - 0.056250 ms
[TRT]  layer fc1000 input reformatter 0 - 0.005157 ms
[TRT]  layer fc1000 - 0.716250 ms
[TRT]  layer prob input reformatter 0 - 0.008073 ms
[TRT]  layer prob - 0.017812 ms
[TRT]  layer network time - 35.391460 ms
class 0479 - 0.087574  (car wheel)
class 0581 - 0.076086  (grille, radiator grille)
class 0751 - 0.123500  (racer, race car, racing car)
class 0817 - 0.688826  (sports car, sport car)

Is this time, 35.39 ms, what is used to compute the final FPS or throughput number? Is the time taken to load the serialized network, along with other miscellaneous timings, excluded?

I have gone through the documentation (https://docs.nvidia.com/deeplearning/sdk/pdf/TensorRT-Best-Practices.pdf) and found it extremely helpful. It says that when calculating throughput, timing starts with the input data already present on the GPU and ends when all network outputs are available on the GPU.

So, am I getting 1000 / 35.39 ≈ 28.26 fps?
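That conversion can be done mechanically from the profiler output; a minimal sketch, assuming the imagenet-console log above was saved to run.log (a hypothetical filename):

```shell
# Pull the 'layer network time' value (in ms) out of the log and
# convert it to frames per second. $(NF-1) is the numeric field
# just before the trailing 'ms'.
awk '/layer network time/ {printf "%.2f fps\n", 1000 / $(NF-1)}' run.log
```

For the 35.391460 ms line above this prints 28.26 fps.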
Thanks in advance…


1. YES. And make sure your power supply is strong enough.
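For completeness, the usual way to do this on the Nano is sketched below; mode 0 is the highest-performance nvpmodel mode on the Nano, but the mode IDs can differ across Jetson boards and JetPack versions, so check `sudo nvpmodel -q` first.

```shell
# Select the highest-performance power mode (mode 0 on the Nano)
sudo nvpmodel -m 0

# Lock CPU/GPU/EMC clocks to their maximum frequencies
sudo jetson_clocks
```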

2. YES. The benchmark results only consider the TensorRT inference time.


Hi BMohit, see here for the instructions on running the SSD-MobileNet-v2 benchmark: https://devtalk.nvidia.com/default/topic/1049802/jetson-nano/object-detection-with-mobilenet-ssd-slower-than-mentioned-speed/post/5327974/#5327974

In general, the trtexec program should be used for benchmarking instead of jetson-inference: jetson-inference makes only one processing iteration, performs full pre-/post-processing (outside of the network), and uses the TensorRT synchronous API, whereas trtexec averages the timing over a number of runs, giving a more accurate figure to report. jetson-inference is instead meant for ease of use and understandability of the source.
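As an illustration, a Caffe classification model could be benchmarked along these lines. This is a sketch assuming the TensorRT 5.x trtexec shipped with JetPack (installed under /usr/src/tensorrt/bin); the prototxt path and output blob name are placeholders for your own model:

```shell
# Time the network in FP16 at batch size 1, averaging over many runs.
# With --deploy but no --model, trtexec benchmarks with random weights.
/usr/src/tensorrt/bin/trtexec \
    --deploy=resnet50.prototxt \
    --output=prob \
    --batch=1 \
    --fp16 \
    --iterations=100 \
    --avgRuns=10
```

The reported GPU compute time is the device-to-device figure described in the Best Practices guide, i.e. it excludes engine deserialization and host-side pre/post-processing.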