How to improve py-faster-caffe performance on JTX1?

I followed AastaLLL's instructions in the 'Caffe failed with py-faster-rcnn demo.py on TX1' post, and was able to build and run the py-faster-rcnn demo script on Jetson TX1.

https://devtalk.nvidia.com/default/topic/974063/jetson-tx1/caffe-failed-with-py-faster-rcnn-demo-py-on-tx1/post/5010194/#5010194

However, compared with a GeForce GPU card, the inference performance on JTX1 is lackluster. More specifically, it takes roughly 1.8s to process each image in the py-faster-rcnn demo on JTX1. In contrast, it takes only ~0.09s on my x64 PC with a GTX-1080 graphics card. I have tried to force JTX1 to always run at maximum clock speeds by running the ~/jetson_clocks.sh script, but that doesn't help much. For other DNN/CNN tasks, I typically see less than a 10X performance difference between JTX1 and the GTX-1080 PC. But in this py-faster-rcnn case, JTX1 falls far behind.

Are there any suggestions about how to improve py-faster-caffe inference performance on JTX1? Thanks.

Screenshot of JTX1 case: JTX1.jpg

Screenshot of GTX-1080 case: GTX1080.jpg

The Jetson eMMC disk I/O is much slower than a typical desktop SSD.
Maybe the problem is at least partly in the load-and-setup part of the code, rather than the inference part of the code?
If I were you, I’d instrument the code to time itself, once the model is fully loaded and ready to run.
The reason for this is that, in a finished embedded system, you’d load the model just once, but you’d repeatedly run the inference as a service to the rest of the system.
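The suggestion above can be sketched as a small timing harness. This is a minimal illustration, not py-faster-rcnn's actual code: `run_inference` is a hypothetical stand-in for the real forward pass (e.g. Caffe's `net.forward()`), and the warm-up count is an assumption:

```python
import time

def run_inference(image):
    # Hypothetical placeholder for the real forward pass
    # (e.g. net.forward() in Caffe); here it just does dummy work.
    return sum(image)

def benchmark(images, warmup=2):
    # Warm-up runs: the first few calls often pay one-time
    # CUDA/cuDNN initialization costs and should not be timed.
    for im in images[:warmup]:
        run_inference(im)
    # Timed runs: measure only steady-state inference.
    timings = []
    for im in images:
        start = time.time()
        run_inference(im)
        timings.append(time.time() - start)
    return sum(timings) / len(timings)

avg_seconds = benchmark([[1, 2, 3]] * 5)
```

Averaging over several images after warm-up separates steady-state inference cost from one-time model-loading and setup cost.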

The py-faster-rcnn demo.py script does load the model only once, and then runs inference on 5 test images consecutively. In fact, it runs inference on 2 dummy images before those 5. On the other hand, I doubt disk I/O is the problem in this case, since the demo script only loads the 5 test jpg files, each roughly 100kB in size.

Anyway, thanks for the suggestion. I might really need to profile the code then…

Hi,

Thanks for your question.

We will check this issue and update you later.

Where does the actual model come from then? I would assume it’s many megabytes of data, being loaded from disk, in addition to the test images.

The inference tasks (on the 5 test images) are timed after the model has been loaded and the 2 dummy images have been processed. So model loading time should not factor in.

Hi,

We have evaluated the performance of the official VGG-16 model.
(Since roi-pooling is not supported by TensorRT, we use the official VGG model instead.)

TensorRT: 97.64ms
Caffe (faster-RCNN branch): 400.62ms

As a result, please use TensorRT for better performance.
If you are interested in detection problems, we recommend DetectNet.

Samples and an introduction can be found here:
jetson-inference: https://github.com/dusty-nv/jetson-inference
DetectNet: https://devblogs.nvidia.com/parallelforall/detectnet-deep-neural-network-object-detection-digits/