Performance statistics of Jetson Nano on deep learning inference

I would like to know how the deep learning inference speeds reported for the Jetson Nano in the Nvidia blog post (https://devblogs.nvidia.com/jetson-nano-ai-computing/) and shown in its benchmark chart (https://devblogs.nvidia.com/wp-content/uploads/2019/03/imageLikeEmbed-1024x510.png) can be reproduced.

For example, SSD Mobilenet-V2 (300x300) is reported at 39 fps on the Jetson Nano. That is faster than the TensorRT performance of the Jetson TX2 I have access to, which runs at about 20 fps (this is similar to the benchmarks (https://github.com/NVIDIA-AI-IOT/tf_trt_models#models-1) listed in the Nvidia tf_trt_models repository).

The review on Phoronix ranks Jetson Nano deep learning inference performance consistently below Jetson TX2 performance: https://www.phoronix.com/scan.php?page=article&item=nvidia-jetson-nano&num=3 . This seems about right, as the Nano has the older Maxwell architecture with half the number of CUDA cores.

Are there any new TensorRT optimizations on the Nano? How can the performance statistics from the Nvidia blog be reproduced?

Hi,

Please use pure TensorRT rather than TF-TRT for the benchmark.
The result should also be generated with the Caffe-based model.

Thanks.
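
When comparing fps figures from different setups, it helps to time them the same way: a few warm-up runs, then an average over many iterations. The following is only a minimal, framework-agnostic sketch. `measure_fps` and `dummy_infer` are hypothetical names that do not come from the blog post or from TensorRT, and the real inference call would have to block until the result is ready for the timing to be meaningful.

```python
import time


def measure_fps(infer, warmup=10, iterations=100):
    """Return the average frames per second of a zero-argument callable."""
    # Warm-up runs are excluded so one-time setup cost does not skew the result.
    for _ in range(warmup):
        infer()

    start = time.perf_counter()
    for _ in range(iterations):
        infer()
    elapsed = time.perf_counter() - start

    return iterations / elapsed


def dummy_infer():
    # Placeholder standing in for one real inference call (assumption).
    time.sleep(0.001)


if __name__ == "__main__":
    print(f"{measure_fps(dummy_infer):.1f} fps")
```

Replacing `dummy_infer` with a function that runs one 300x300 frame through the model under test gives a figure that can be compared across the TF-TRT and pure TensorRT paths.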

Thanks for your reply. I will try TensorRT. Is there any performance reason for using Caffe? I prefer Tensorflow because I have difficulties getting the Mobilenets to converge in Caffe.

Hi,

You should be able to get similar performance results with pure TensorRT.

TensorRT started from caffemodel, so we keep using it to compare against our previous scores.
This should not make much of a difference, since all models are converted to TensorRT in the end.

Another reason is that caffemodel uses the NCHW format, which is more GPU-friendly.
Thanks.
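
To make the layout remark concrete, here is a small pure-Python sketch of how the same tensor element is addressed in NCHW versus NHWC (the order TensorFlow commonly uses). The function names are made up for illustration. The sketch only shows the memory ordering and does not demonstrate any speed difference.

```python
def offset_nchw(n, c, h, w, C, H, W):
    """Flat index of element (n, c, h, w) when channels come before height/width."""
    return ((n * C + c) * H + h) * W + w


def offset_nhwc(n, c, h, w, C, H, W):
    """Flat index of the same element when channels are the innermost dimension."""
    return ((n * H + h) * W + w) * C + c


def nhwc_to_nchw(flat, N, C, H, W):
    """Reorder a flat NHWC buffer into NCHW order."""
    out = [None] * len(flat)
    for n in range(N):
        for h in range(H):
            for w in range(W):
                for c in range(C):
                    src = offset_nhwc(n, c, h, w, C, H, W)
                    dst = offset_nchw(n, c, h, w, C, H, W)
                    out[dst] = flat[src]
    return out


if __name__ == "__main__":
    # One image, one row, two pixels, three channels (r, g, b).
    nhwc = ["r0", "g0", "b0", "r1", "g1", "b1"]
    print(nhwc_to_nchw(nhwc, N=1, C=3, H=1, W=2))
    # prints ['r0', 'r1', 'g0', 'g1', 'b0', 'b1']
```

In NCHW, horizontally adjacent pixels of one channel sit next to each other in memory. In NHWC, they are separated by the number of channels.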

Hello,
Were the fps numbers in (https://devblogs.nvidia.com/wp-content/uploads/2019/03/imageLikeEmbed-1024x510.png) generated while running the Jetson Nano in maximum performance mode with jetson_clocks.sh?

Hi,

Yes. Please also set nvpmodel to the performance mode first:

sudo nvpmodel -m 0

Thanks.
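
The two steps discussed above (nvpmodel mode 0 first, then the clocks script) could be wrapped in a small helper so they are applied consistently before each benchmark run. This is a sketch under assumptions: the helper names are hypothetical, and the command name `jetson_clocks` is an assumption, since the thread refers to the script as jetson_clocks.sh and its name and location may differ on your system. By default the helper only prints the commands.

```python
import subprocess


def performance_mode_commands(clocks_cmd="jetson_clocks"):
    """Build the commands in the order given in the thread."""
    return [
        ["sudo", "nvpmodel", "-m", "0"],  # performance mode, as quoted in the thread
        ["sudo", clocks_cmd],             # clocks script; name/path is an assumption
    ]


def set_performance_mode(dry_run=True, clocks_cmd="jetson_clocks"):
    """Print the commands (dry run) or execute them on the device."""
    commands = performance_mode_commands(clocks_cmd)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if a command fails.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    set_performance_mode(dry_run=True)
```

Calling `set_performance_mode(dry_run=False, clocks_cmd="/path/to/jetson_clocks.sh")` on the device would run both steps, with the path adjusted to wherever the script actually lives.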

See here for the instructions on running SSD-Mobilenet-v2 with TensorRT:

https://devtalk.nvidia.com/default/topic/1049802/jetson-nano/object-detection-with-mobilenet-ssd-slower-than-mentioned-speed/post/5327974/#5327974