The jetson-inference repository on GitHub recommends doing net surgery to translate a trained Caffe network to TensorRT for real-time inference on the Jetson TX2:
<img src="https://github.com/dusty-nv/jetson-inference/raw/master/docs/images/deep-vision-header.jpg" width="100%">
# Deploying Deep Learning
Welcome to our instructional guide to the inference and realtime [DNN vision](#api-reference) library for NVIDIA **[Jetson Nano/TX1/TX2/Xavier NX/AGX Xavier/AGX Orin](http://www.nvidia.com/object/embedded-systems.html)**.
This repo uses NVIDIA **[TensorRT](https://developer.nvidia.com/tensorrt)** for efficiently deploying neural networks onto the embedded Jetson platform, improving performance and power efficiency using graph optimizations, kernel fusion, and FP16/INT8 precision.
Vision primitives, such as [`imageNet`](docs/imagenet-console-2.md) for image recognition, [`detectNet`](docs/detectnet-console-2.md) for object detection, [`segNet`](docs/segnet-console-2.md) for semantic segmentation, and [`poseNet`](docs/posenet.md) for pose estimation, inherit from the shared [`tensorNet`](c/tensorNet.h) object. Examples are provided for streaming from a live camera feed and processing images. See the **[API Reference](#api-reference)** section for detailed reference documentation of the C++ and Python libraries.
<img src="https://github.com/dusty-nv/jetson-inference/raw/dev/docs/images/deep-vision-primitives.jpg">
Follow the [Hello AI World](#hello-ai-world) tutorial for running inference and transfer learning onboard your Jetson, including collecting your own datasets and training your own models. It covers image classification, object detection, semantic segmentation, pose estimation, and mono depth.
### Table of Contents
* [Hello AI World](#hello-ai-world)
* [Video Walkthroughs](#video-walkthroughs)
* [API Reference](#api-reference)
* [Code Examples](#code-examples)
* [Pre-Trained Models](#pre-trained-models)
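For anyone following along, the translation step the README describes essentially boils down to running the trained prototxt/caffemodel through TensorRT's Caffe parser and letting the builder emit an optimized engine. Below is a minimal sketch of that flow against the older TensorRT C++ API that shipped in JetPack at the time; the file names, the output blob name `"score"`, and the FP16 call (`setHalf2Mode` here, `setFp16Mode` in later releases) are placeholders that depend on your own network and TensorRT version.

```cpp
// Minimal sketch: importing a Caffe model into TensorRT (older TensorRT 2.x/3.x-era C++ API).
// "deploy.prototxt", "model.caffemodel", and the output blob name "score" are placeholders.
#include <iostream>
#include "NvInfer.h"
#include "NvCaffeParser.h"

using namespace nvinfer1;
using namespace nvcaffeparser1;

// TensorRT requires an ILogger implementation to be handed to the builder.
class Logger : public ILogger
{
    void log(Severity severity, const char* msg) override
    {
        if (severity != Severity::kINFO)
            std::cout << msg << std::endl;
    }
} gLogger;

int main()
{
    IBuilder* builder = createInferBuilder(gLogger);
    INetworkDefinition* network = builder->createNetwork();

    // Parse the deploy prototxt plus trained weights; request FP16 weights
    // when the platform has fast FP16 support (the TX2 does).
    ICaffeParser* parser = createCaffeParser();
    DataType modelDataType =
        builder->platformHasFastFp16() ? DataType::kHALF : DataType::kFLOAT;
    const IBlobNameToTensor* blobNameToTensor =
        parser->parse("deploy.prototxt", "model.caffemodel", *network, modelDataType);

    // Tell TensorRT which blob is the network output.
    network->markOutput(*blobNameToTensor->find("score"));

    // Build the optimized runtime engine (layer fusion, kernel autotuning, FP16).
    builder->setMaxBatchSize(1);
    builder->setMaxWorkspaceSize(16 << 20);
    builder->setHalf2Mode(builder->platformHasFastFp16());
    ICudaEngine* engine = builder->buildCudaEngine(*network);

    network->destroy();
    parser->destroy();

    // ... create an IExecutionContext from the engine and run inference ...

    engine->destroy();
    builder->destroy();
    return 0;
}
```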
The same repository also includes instructions for building nvcaffe with 16-bit cuDNN support:
<img src="https://github.com/dusty-nv/jetson-inference/raw/master/docs/images/deep-vision-header.jpg" width="100%">
# Building nvcaffe
A special branch of Caffe is used on TX1 which includes support for FP16.
The code is released in NVIDIA's caffe repo in the experimental/fp16 branch, located here:
> https://github.com/nvidia/caffe/tree/experimental/fp16
#### 1. Installing Dependencies
``` bash
$ sudo apt-get update -y
$ sudo apt-get install cmake -y

# General dependencies
$ sudo apt-get install libprotobuf-dev libleveldb-dev libsnappy-dev \
    libhdf5-serial-dev protobuf-compiler -y
$ sudo apt-get install --no-install-recommends libboost-all-dev -y

# BLAS
```
I’m not using one of the three pre-defined models in TensorRT; I’m using a custom network topology that I’ve already trained.
Would there be a performance benefit from translating this Caffe model to TensorRT for inference on the Jetson, or would I see approximately the same performance using nvcaffe?
(My network is fully convolutional and uses a deconvolution layer as part of inference, which may not even be supported in TensorRT yet?)
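Regarding the deconvolution question: TensorRT's network-definition API does expose a deconvolution layer (`IDeconvolutionLayer`, added via `INetworkDefinition::addDeconvolution`), and I believe the Caffe parser maps Caffe `"Deconvolution"` layers onto it, so FCN-style learned upsampling should at least have a TensorRT counterpart. A hypothetical sketch (the helper name, kernel size, stride, and weights are made up for illustration):

```cpp
// Sketch: adding a deconvolution (transposed convolution) layer directly through
// the TensorRT network-definition API. Dimensions and weights are hypothetical;
// the point is only that IDeconvolutionLayer exists, so a Caffe "Deconvolution"
// layer has something on the TensorRT side to map onto.
#include "NvInfer.h"

using namespace nvinfer1;

IDeconvolutionLayer* addUpsample(INetworkDefinition& network, ITensor& input,
                                 int nbOutputMaps, Weights kernel, Weights bias)
{
    // A 4x4 kernel with stride 2 gives a 2x learned upsampling, as in FCN-style nets.
    IDeconvolutionLayer* deconv =
        network.addDeconvolution(input, nbOutputMaps, DimsHW{4, 4}, kernel, bias);
    deconv->setStride(DimsHW{2, 2});
    deconv->setPadding(DimsHW{1, 1});
    return deconv;
}
```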
So I compiled nvcaffe and tried to load my model. Apparently, although Caffe has several years of maturity, nvcaffe doesn’t …

`Error parsing text-format caffe.NetParameter: 264:14: Message type "caffe.LayerParameter" has no field named "crop_param".`

(That field belongs to Caffe’s Crop layer, so presumably the caffe.proto in the experimental/fp16 branch is old enough to predate the Crop layer.)
Any updates on the performance comparison between Caffe and TensorRT?